- Add changeTracking to scrapeOptions and Firecrawl flags every page on a site as new, same, changed, or removed — with git or JSON diffs included.
- Pricing pages, doc trees, and product catalogs become a one-call monitoring pipeline.
TL;DR
Firecrawl's /crawl endpoint can now monitor an entire website for changes by adding changeTracking to scrapeOptions.formats. Every URL returned is flagged new, same, changed, or removed, with an optional git-diff (free) or JSON-schema diff (5 credits/page). What used to be a custom pipeline of scrape → hash → state DB → diff renderer is now a single API flag.
What's new
Change tracking originally launched on /scrape in April 2025 (Launch Week III, Day 1). What's new is that the same primitive now works inside /crawl and /batch/scrape, so you can fire one job at a documentation tree, a competitor pricing site, or a product catalog and get back a per-page change verdict for the whole site. Version 2.6.0 (Nov 2025) made the comparison engine faster and more reliable, and recent 2026 releases added onlyCleanContent to strip nav and ads before diffing.
How it works
You tell Firecrawl to compute change tracking the same way you'd ask for markdown:
POST /v2/crawl
{
  "url": "https://example.com",
  "limit": 50,
  "scrapeOptions": {
    "formats": ["markdown", "changeTracking"]
  }
}
The markdown format must always accompany changeTracking: comparisons run on the markdown body, not raw HTML. Each page in the response gets a changeTracking object with these fields:
- changeStatus: new, same, changed, or removed
- previousScrapeAt: ISO timestamp of the last scrape, or null on first run
- visibility: visible (linked from nav) or hidden (URL works but unlinked)
- diff: line-level changes when git-diff mode is on
- json: structured field comparison when JSON mode is on
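As a rough illustration of consuming these fields, the Python sketch below groups pages by changeStatus. It assumes the crawl result has already been fetched and parsed into a list of page dicts, and that each page exposes its URL under `metadata.sourceURL`. That URL location, the helper name `group_by_status`, and the sample data are assumptions for illustration, not confirmed response details.

```python
from collections import defaultdict


def group_by_status(pages):
    """Group page URLs by their changeTracking.changeStatus value."""
    groups = defaultdict(list)
    for page in pages:
        tracking = page.get("changeTracking") or {}
        # Pages without a changeTracking object are filed under "unknown".
        status = tracking.get("changeStatus", "unknown")
        # Assumption: the page URL lives at metadata.sourceURL.
        url = (page.get("metadata") or {}).get("sourceURL", "<no url>")
        groups[status].append(url)
    return dict(groups)


# Hypothetical sample data shaped like the fields listed above.
sample_pages = [
    {
        "metadata": {"sourceURL": "https://example.com/"},
        "changeTracking": {
            "changeStatus": "same",
            "previousScrapeAt": "2025-11-01T00:00:00Z",
            "visibility": "visible",
        },
    },
    {
        "metadata": {"sourceURL": "https://example.com/pricing"},
        "changeTracking": {
            "changeStatus": "changed",
            "previousScrapeAt": "2025-11-01T00:00:00Z",
            "visibility": "visible",
        },
    },
    {
        "metadata": {"sourceURL": "https://example.com/beta"},
        "changeTracking": {
            "changeStatus": "new",
            "previousScrapeAt": None,
            "visibility": "hidden",
        },
    },
]

for status, urls in group_by_status(sample_pages).items():
    print(status, urls)
```

Grouping first keeps downstream steps simple: an alerting job can look only at the changed bucket, while a cleanup job looks only at removed.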
Two diff modes
Pick the mode that matches what you actually need to react to:
| Mode | What you get | Cost | Best for |
|---|---|---|---|
| git-diff | Line-by-line text diff with add/del/normal markers + structured chunks | Free | Docs, blog posts, ToS, policy pages |
| json | Schema-extracted fields compared old vs new | 5 credits/page | Pricing, stock, SKU, structured data |
Combine both by passing modes: ["git-diff", "json"] when you want the full text diff and a structured field-level summary in one call.
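To make the mode selection concrete, here is a minimal Python sketch that assembles a crawl request body. It assumes modes are passed through an object-form entry in `formats` (`{"type": "changeTracking", "modes": [...]}`) sitting next to the `markdown` string. The helper name `build_crawl_payload` is hypothetical, and requiring a schema whenever JSON mode is requested is a design choice of this sketch rather than a documented rule.

```python
import json


def build_crawl_payload(url, limit, modes, schema=None):
    """Build a crawl request body asking for markdown plus change tracking."""
    tracking = {"type": "changeTracking", "modes": list(modes)}
    if "json" in modes:
        # Sketch-level rule: JSON mode needs to know which fields to compare.
        if schema is None:
            raise ValueError("json mode needs a schema describing the fields to compare")
        tracking["schema"] = schema
    return {
        "url": url,
        "limit": limit,
        # markdown always accompanies changeTracking, as noted above.
        "scrapeOptions": {"formats": ["markdown", tracking]},
    }


price_schema = {"type": "object", "properties": {"price": {"type": "string"}}}
payload = build_crawl_payload("https://example.com", 50, ["git-diff", "json"], price_schema)
print(json.dumps(payload, indent=2))
```

The resulting dict can be sent as the JSON body of the POST request with whatever HTTP client the project already uses.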
Use cases that actually move the needle
- Competitor pricing intel: daily crawl of a competitor's pricing page, JSON mode against a {plan, price, billing_period} schema, alert on any field delta.
- Doc-driven RAG hygiene: weekly crawl of API docs, re-embed only pages flagged changed, a massive cost saver versus full re-indexing.
- Inventory and SKU tracking: monitor product feeds for in-stock / out-of-stock and price shifts at scale.
- Compliance watch: track ToS, privacy policy, and regulatory page edits with git-diff for an auditable record.
- News and market reports: financial tools that fetch only modified reports, not the whole archive.
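The pricing-intel case above boils down to "alert on any field delta". The Python sketch below shows one way that check might look. It assumes the JSON-mode comparison arrives as a mapping of field name to a `{"previous": ..., "current": ...}` pair; that shape, the helper name `changed_fields`, and the sample values are assumptions made for illustration.

```python
def changed_fields(json_comparison):
    """Return {field: (previous, current)} for every field whose value differs."""
    deltas = {}
    for field, values in json_comparison.items():
        previous = values.get("previous")
        current = values.get("current")
        if previous != current:
            deltas[field] = (previous, current)
    return deltas


# Hypothetical comparison for a {plan, price, billing_period} schema.
sample_comparison = {
    "plan": {"previous": "Pro", "current": "Pro"},
    "price": {"previous": "$49", "current": "$59"},
    "billing_period": {"previous": "monthly", "current": "monthly"},
}

for field, (old, new) in changed_fields(sample_comparison).items():
    print(f"ALERT {field}: {old} -> {new}")
```

The same filter works for the inventory case: swap the schema for stock and price fields and route the non-empty results to a notification channel.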
Tagging: track the same URL on multiple cadences
The tag parameter lets you keep parallel histories for the same URL. Run an hourly tag for the homepage and a weekly tag for the same page if you want both signal levels — comparisons stay scoped to matching tag, team, URL, and markdown config.
{
  "type": "changeTracking",
  "modes": ["git-diff", "json"],
  "tag": "hourly",
  "schema": { "type": "object", "properties": { "price": { "type": "string" } } }
}
How it compares
Plenty of tools watch web pages for changes — Visualping, Distill, Hexowatch — but they're built around a UI dashboard. Firecrawl's bet is different: change tracking is a format, not a product. You get the same primitive in the same call you already make for scraping, so it drops into existing RAG pipelines, agents, and ETL jobs without a separate vendor. Compared to rolling your own (markdown hash + state DB + diff renderer), you skip the snapshot store, the diff library, and the schema-extraction LLM call — Firecrawl already does all three.
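For a sense of what the roll-your-own route involves, here is a minimal Python sketch of the hash, state store, and diff steps using only the standard library. It is not how Firecrawl implements change tracking: the in-memory dict stands in for a real state DB, the function name `check_page` is hypothetical, and removed-page detection and schema extraction are left out entirely.

```python
import difflib
import hashlib


def check_page(store, url, markdown):
    """Compare markdown against the last snapshot in `store`, then update it.

    Returns (status, diff_text); status is "new", "same", or "changed".
    """
    digest = hashlib.sha256(markdown.encode("utf-8")).hexdigest()
    previous = store.get(url)
    # Always save the latest snapshot so the next run compares against it.
    store[url] = {"hash": digest, "markdown": markdown}
    if previous is None:
        return "new", ""
    if previous["hash"] == digest:
        return "same", ""
    diff = difflib.unified_diff(
        previous["markdown"].splitlines(),
        markdown.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
    return "changed", "\n".join(diff)


snapshots = {}  # stand-in for the state DB
url = "https://example.com/pricing"
print(check_page(snapshots, url, "Pro: $49")[0])
print(check_page(snapshots, url, "Pro: $49")[0])
status, diff_text = check_page(snapshots, url, "Pro: $59")
print(status)
print(diff_text)
```

Even this toy version needs durable storage, a scheduler, and a scraper feeding it clean markdown before it is useful, which is the plumbing the single flag replaces.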
Limitations & pricing
- Snapshots are persistent and never expire, so long gaps between scrapes still produce valid diffs.
- Different includeTags, excludeTags, or onlyMainContent values across runs produce unreliable comparisons. Keep your config stable per tag.
- Change tracking requests bypass index caching and ignore maxAge, so expect a fresh fetch every time.
- Pricing: basic status + git-diff are free with standard scrape credits; JSON mode is 5 credits per page because it runs LLM extraction.
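One way to guard against the config-drift problem above is to fingerprint the relevant options per tag on the client side before submitting a job. The Python sketch below is a hypothetical helper, not part of Firecrawl: the names `config_fingerprint` and `assert_stable` are invented here, and the in-memory dict stands in for wherever the fingerprints would be persisted.

```python
import hashlib
import json

# The options called out above as affecting comparison reliability.
TRACKED_KEYS = ("includeTags", "excludeTags", "onlyMainContent")


def config_fingerprint(scrape_options):
    """Hash only the options that shape the markdown being compared."""
    relevant = {key: scrape_options.get(key) for key in TRACKED_KEYS}
    canonical = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def assert_stable(known, tag, scrape_options):
    """Record the fingerprint on first use of a tag; raise if it later drifts."""
    fingerprint = config_fingerprint(scrape_options)
    if known.setdefault(tag, fingerprint) != fingerprint:
        raise ValueError(f"scrape options changed for tag {tag!r}")


known_configs = {}
assert_stable(known_configs, "hourly", {"onlyMainContent": True})
assert_stable(known_configs, "hourly", {"onlyMainContent": True, "formats": ["markdown"]})
print("config stable for tag 'hourly'")
```

Keys outside the tracked set, such as `formats` in the second call, do not affect the fingerprint, so unrelated option changes do not trigger a false alarm. List order inside includeTags or excludeTags does count as a change in this sketch.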
What's next
Pair the crawl with webhooks (already supported on async jobs) so you receive changed pages as they're processed instead of polling for the final result. Or skip the plumbing entirely and try Firecrawl Observer, the open-source monitoring dashboard built on top of this same API.
Sources: Firecrawl docs, Launch Week III announcement, Changelog, Firecrawl on X.

