Methodology
How we verify every festival fact.
Every claim on a FestivalMates festival page (date, venue, capacity, stages, ticket price, transport, camping) traces to a primary source. We re-fetch those sources with a weekly cron job, and a human reviews the result before anything is published. No AI-recalled facts. No "trust me bro."
What counts as a primary source
We treat exactly four categories as primary, in order of trust:
- The festival's official website.
- The festival's official ticketing page (when it lives on a separate domain, e.g. Paylogic or See Tickets).
- The festival's official lineup page or sub-domain.
- Official, verified social channels — Instagram, X, Facebook, TikTok, YouTube — when the festival uses them to announce sale waves, lineup phases, or schedule changes.
What we don't treat as primary: Wikipedia, fan blogs, ticket resellers, Resident Advisor event listings (often stale), DJ Mag and Mixmag (we cross-check against them but never cite them), and, critically, AI training data. LLMs on this site restructure verified facts; they never recall facts from memory.
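The trust order above can be sketched as a simple ranking. This is a minimal illustration, not the site's actual code: `SourceKind`, `best_source`, and the idea of tagging each candidate URL with a category (or `None` for a non-primary source such as a fan blog or an event listing) are hypothetical.

```python
from __future__ import annotations

from enum import IntEnum
from typing import Optional


class SourceKind(IntEnum):
    """The four primary-source categories; a lower value means more trusted."""
    OFFICIAL_SITE = 1
    TICKETING = 2
    LINEUP = 3
    SOCIAL = 4


def best_source(candidates: list[tuple[str, Optional[SourceKind]]]) -> Optional[str]:
    """Pick the most trusted primary URL; None-tagged (non-primary) URLs are never cited."""
    primary = sorted((kind, url) for url, kind in candidates if kind is not None)
    return primary[0][1] if primary else None
```

If only non-primary URLs are available, the function returns `None`, which matches the rule that such sources are cross-checked but never cited.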
The four-stage pipeline
Every fact moves through four stages before it's published:
Stage A — Scrape
A weekly cron job pulls each registered source URL, converts the HTML to markdown, and stores the raw content with a content hash. No LLM is involved at this stage: primary-source HTML in, markdown out. If a URL can't be fetched, the festival is flagged in the admin queue.
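A minimal sketch of the scrape step, assuming the stored hash is a SHA-256 of the converted markdown (the page doesn't say which hash function or which bytes are hashed). The HTML-to-markdown converter is passed in as a function because no particular library is named; `Snapshot`, `fetch_html`, and `scrape` are hypothetical names.

```python
import hashlib
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class Snapshot:
    url: str
    markdown: str
    sha256: str          # hex digest of the markdown (assumed hashing scheme)
    fetched_at: datetime


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download a page with the standard library and decode it as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def scrape(
    url: str,
    html_to_markdown: Callable[[str], str],
    fetch: Callable[[str], str] = fetch_html,
) -> Optional[Snapshot]:
    """Fetch one source URL; return None on failure so the caller can flag it."""
    try:
        html = fetch(url)
    except (OSError, ValueError, LookupError):
        # Network and HTTP errors (HTTPError is an OSError), malformed URLs,
        # or an unknown charset name reported by the server
        return None
    markdown = html_to_markdown(html)  # deterministic conversion, no LLM here
    digest = hashlib.sha256(markdown.encode("utf-8")).hexdigest()
    return Snapshot(url, markdown, digest, datetime.now(timezone.utc))
```

Returning `None` instead of raising keeps one broken URL from stopping the whole weekly run; the caller is assumed to push the festival onto the admin queue.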
Stage B — Extract
An LLM reads the scraped markdown and extracts structured facts (date, venue, capacity, stages, etc.) WITH per-fact source citations. Facts that can't be cited from the scrape are dropped — no recall, no inference.
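One way to enforce "facts that can't be cited from the scrape are dropped" is to require the extracting model to return a verbatim quote for each fact, then check that quote against the stored markdown. The quote-matching rule, `Fact`, and `keep_cited` are assumptions for illustration; the page doesn't describe how citations are checked.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    field: str       # e.g. "capacity"
    value: str       # normalised value proposed by the extracting LLM
    source_url: str  # URL the LLM cited
    quote: str       # verbatim passage the LLM cited as evidence


def _squash(text: str) -> str:
    """Collapse runs of whitespace so markdown line breaks don't block a match."""
    return " ".join(text.split())


def keep_cited(facts: list[Fact], scrapes: dict[str, str]) -> list[Fact]:
    """Keep only facts whose quote appears verbatim in the scrape of the cited URL."""
    kept = []
    for fact in facts:
        page = scrapes.get(fact.source_url)
        quote = _squash(fact.quote)
        if page is None or not quote:
            continue  # cited a URL we never scraped, or gave no evidence
        if quote in _squash(page):
            kept.append(fact)
    return kept
```

The check deliberately ignores `value`: a normalised date like `2026-07-17` won't appear literally in a page that says "17 July", so the evidence is the quote, and a human reviewer compares value and quote.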
Stage C — Write
A second LLM pass writes natural-language editorial copy (Pro Tips, transport, camping, schedule clashes) — but only from facts already extracted in Stage B. The editorial inherits the same source URLs as the underlying facts.
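The inheritance rule can be sketched as follows, assuming each editorial block lists the IDs of the Stage B facts it relies on. `EditorialBlock`, `attach_sources`, and the fact-ID scheme are hypothetical. The design point is that source URLs are derived from the facts and are never supplied by the writing model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fact:
    fact_id: str
    value: str
    source_url: str


@dataclass
class EditorialBlock:
    section: str                 # e.g. "Pro Tips"
    text: str                    # copy produced by the writing LLM
    fact_ids: list[str]          # Stage B facts the copy relies on
    source_urls: list[str] = field(default_factory=list)  # derived, never model-supplied


def attach_sources(block: EditorialBlock, extracted: dict[str, Fact]) -> EditorialBlock:
    """Reject copy built on unknown facts; otherwise inherit the facts' URLs."""
    unknown = [fid for fid in block.fact_ids if fid not in extracted]
    if unknown or not block.fact_ids:
        raise ValueError(f"editorial must cite extracted facts only; unknown: {unknown}")
    block.source_urls = sorted({extracted[fid].source_url for fid in block.fact_ids})
    return block
```

A block that cites no facts at all is rejected too, since copy with no underlying facts has nothing to inherit sources from.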
Stage D — Re-verify
A weekly cron job re-scrapes every source URL and compares its content hash against the previous snapshot. Any change flags the festival for human review. Editorial that hasn't been re-verified in 90 days is automatically marked stale and hidden from the public surface.
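A minimal sketch of the re-verify comparison and the 90-day staleness rule, assuming hashes are stored per source URL for each weekly snapshot. `changed_sources` and `is_stale` are hypothetical names, and treating a newly registered URL as "changed" is an assumption.

```python
from __future__ import annotations

from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=90)


def changed_sources(previous: dict[str, str], current: dict[str, str]) -> set[str]:
    """URLs whose content hash differs from the last snapshot (new URLs count as changed)."""
    return {url for url, digest in current.items() if previous.get(url) != digest}


def is_stale(last_verified: datetime, now: datetime) -> bool:
    """True once editorial has gone more than 90 days without re-verification."""
    return now - last_verified > STALE_AFTER
```

Under these assumptions, a festival is flagged for human review if any of its registered URLs appears in the set returned by `changed_sources`.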
Freshness signals on every festival page
Look for the panel labelled "Data & sources" or "Sources we track" at the bottom of any festival page. It shows one of three states:
- Verified today: editorial published, with the last human re-verification within 7 days. Gold standard.
- Sources we track: sources are registered and re-fetched weekly, but no human-reviewed editorial exists for this festival yet. Treat lineup and date info as tracked, not verified.
- Verified 3 months ago: editorial is older than the freshness threshold. We're re-running verification.
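The three states can be sketched as a small classifier. This assumes the freshness threshold is the 90 days used in Stage D. The page names no state for editorial last verified more than 7 but no more than 90 days ago, so the sketch returns a placeholder `"verified"` label for that window; all labels and names here are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Optional

GOLD_WINDOW = timedelta(days=7)
STALE_AFTER = timedelta(days=90)  # assumed to match the Stage D threshold


def freshness_state(has_editorial: bool, last_verified: Optional[datetime], now: datetime) -> str:
    """Map a festival's editorial status to a panel state (hypothetical labels)."""
    if not has_editorial or last_verified is None:
        return "sources-tracked"   # sources re-fetched weekly, no reviewed editorial yet
    age = now - last_verified
    if age <= GOLD_WINDOW:
        return "verified-recent"   # human re-verification within 7 days
    if age > STALE_AFTER:
        return "stale"             # older than the freshness threshold
    return "verified"              # placeholder: the page names no state for this window
```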
Known scope and bias (we'll be honest)
We're a small team, and the catalogue reflects that:
- Genre coverage tilts techno / house. The initial catalogue of roughly 170 festivals leans toward the European techno and house circuits. US bass and dubstep festivals (Lost Lands, Bass Canyon, Wakaan, Imagine, Forbidden Kingdom, EDC variants) are underrepresented as of May 2026. Catalogue expansion is our next active track.
- Geographic skew is European. Most festivals are in Belgium, the Netherlands, Germany, France, Spain, Italy, Portugal, Croatia, and the UK, with some in the US, Canada, and Australia. Latin America, Asia, and Africa are largely missing; flag a gap and we'll add it.
- Editorial coverage is partial. As of today, 12 of 169 festivals have full editorial copy published. The rest show only registered sources and official lineup data. We're working through the backlog; the section header on each festival page tells you which kind of page you're looking at.
Spot a wrong fact?
Email hi@festivalmates.com with the festival URL and the specific claim. We'll re-check against the primary source within 48 hours and either correct it, cite a stronger source, or flag the upstream source as unreliable. Correction logs go in the festival's editorial revision history.
We'd rather have no copy than wrong copy. A festival page without a Pro Tips section is fine. A festival page with wrong Pro Tips is reputational damage.