
Firecrawl Integration


Overview

Firecrawl is a robust tool designed to transform websites into LLM-ready data by leveraging its Crawler and Scraper functionalities.

  • Crawler: Automatically extracts data from websites by crawling web pages and following links. It recursively traverses sites (starting from a URL, using sitemaps when available), handles dynamic content rendered with JavaScript, and supports sync or async modes with webhook notifications.
  • Scraper: Extracts targeted content from specific web pages using customizable rules. It supports single-URL or batch scraping, main-content-only extraction, tag inclusion/exclusion, TLS verification options, device emulation, and adjustable page-load timing.

Whether you need to map full site structures or extract specific data from pages, Firecrawl provides a seamless and customizable solution.

⚠️

Firecrawl nodes can now be used directly inside sync or async nodes. You no longer need to create a separate flow for crawling or scraping.

Features

Key Functionalities (Crawler + Scraper)

Crawler

  • Comprehensive Crawling: Recursively traverses websites, identifies and accesses subpages, uses sitemaps when available, and follows links for thorough data collection.
  • Dynamic Content Handling: Manages JavaScript-rendered content for full extraction from accessible subpages.
  • Sync & Async Modes: Run crawls synchronously or asynchronously with webhook callbacks for completion, page, started, and failed events.

Scraper

  • Customizable URL Scraping: Specify exact URLs to scrape for targeted data extraction.
  • Selective Content: Scrape only main content or include/exclude specific HTML tags.
  • TLS Verification: Option to skip TLS verification for broader website compatibility.
  • Device Emulation: Emulate mobile devices for mobile-optimized pages.
  • Adjustable Load Timing: Set wait time (ms) for dynamic content to load.

Shared

  • Agent Workflows: Async Agent, Sync Agent, and Check Agent for intelligent web data extraction.
  • Webhooks: Real-time updates for crawl/scrape completion and events.

Benefits

  • Reliability: Handles proxies, rate limits, and anti-scraping measures for consistent extraction (Crawler).
  • Efficiency: Manages requests to minimize bandwidth and avoid detection (Crawler).
  • Precision: Targeted scraping with tag and scope control (Scraper).
  • Flexibility: Mobile emulation and TLS options for diverse sites (Scraper).
  • LLM-Ready Output: Generate structured, LLM-compatible data; customize inclusion/exclusion; handle static and dynamic content.

Prerequisites

Before using Firecrawl, ensure the following:

  • A valid Firecrawl API Key.
  • Access to the Firecrawl service host URL.
  • Properly configured credentials for Firecrawl.
  • A webhook endpoint for receiving notifications (required for the crawler).

⚠️

For self-hosted deployments: if the connection fails, whitelist the Cloudflare IP ranges listed at https://www.cloudflare.com/ips/.

Setup

Step 1: Obtain API Credentials

  1. Register on Firecrawl.
  2. Generate an API key from your account dashboard.
  3. Note the Host URL and Webhook Endpoint.

Step 2: Configure Firecrawl Credentials

Use the following format to set up your credentials:

| Key Name | Description | Example Value |
| --- | --- | --- |
| Credential Name | Name to identify this set of credentials | my-firecrawl-creds |
| Firecrawl API Key | Authentication key for accessing Firecrawl services | fc_api_xxxxxxxxxxxxx |
| Host | Base URL where the Firecrawl service is hosted | https://api.firecrawl.dev |

Configuration Reference

Sync Mode Output Format

Batched Mode

{
  "success": true,
  "status": "completed",
  "completed": 48,
  "total": 50,
  "creditsUsed": 13,
  "expiresAt": "2025-08-01T12:30:00.000Z",
  "data": [
    {
      "url": "https://example.com/page-1",
      "content": "Lorem ipsum dolor sit amet...",
      "metadata": {
        "title": "Page 1 Title",
        "description": "This is a sample description.",
        "language": "en"
      }
    },
    {
      "url": "https://example.com/page-2",
      "content": "Second page scraped content...",
      "metadata": {
        "title": "Page 2 Title",
        "description": "Another sample description.",
        "language": "en"
      }
    }
    // ... more pages
  ]
}

Single Mode

{
  "success": true,
  "status": "completed",
  "completed": 1,
  "total": 2,
  "creditsUsed": 1,
  "expiresAt": "2025-08-02T12:30:00.000Z",
  "data": [
    {
      "url": "https://example.com/page-1",
      "content": "Lorem ipsum dolor sit amet...",
      "metadata": {
        "title": "Page 1 Title",
        "description": "This is a sample description.",
        "language": "en"
      }
    }
  ]
}

Async Mode Output Format

{
  "success": true,
  "id": "8***************************7",
  "url": "https://api.firecrawl.dev/v1/crawl/8***************************7"
}

Step 3: Set Up the Mode (Crawler)

For the Crawler, choose sync or async mode. In async mode, configure a webhook endpoint to receive crawl results (create a Webhook flow/URL in Lamatic to receive crawl updates and results). Async options include:

| Parameter | Description | Example Value |
| --- | --- | --- |
| Callback Webhook | URL to receive notifications about crawl completion | https://example.com/webhook |
| Webhook Headers | Headers sent to the webhook | {"Content-Type": "application/json"} |
| Webhook Metadata | Metadata sent to the webhook | {"status": "{{codeNode_540.status}}"} |
| Webhook Events | Events to send: completed, failed, page, started | ["completed", "failed", "page", "started"] |
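
A minimal sketch of how these async options map onto a crawler node's values is shown below; the webhookHeaders and webhookMetadata keys are assumptions, so confirm the exact field names in the node editor:

nodes:
  - nodeId: crawlerNode_881
    nodeType: crawlerNode
    nodeName: Crawler
    values:
      credentials: my-firecrawl-creds
      url: https://example.com
      crawlMode: async
      webhook: https://example.com/webhook                      # Callback Webhook
      webhookHeaders: '{"Content-Type": "application/json"}'    # assumed key name
      webhookMetadata: '{"status": "{{codeNode_540.status}}"}'  # assumed key name
      webhookEvents: [completed, failed, page, started]
    needs:
      - triggerNode_1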

Crawler Configuration (Single)

| Parameter | Description | Example Value |
| --- | --- | --- |
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL | Starting point URL for the crawler | https://example.com |
| Exclude Path | URL patterns to exclude from the crawl | "admin/*", "private/*" |
| Include Path | URL patterns to include in the crawl | "blog/*", "products/*" |
| Crawl Depth | Maximum depth to crawl relative to the entered URL | 3 |
| Crawl Limit | Maximum number of pages to crawl | 1000 |
| Crawl Sub Pages | Toggle to enable or disable crawling sub pages | true |
| Max Discovery Depth | Max depth for discovering new URLs during the crawl | 5 |
| Ignore Sitemap | Ignore the sitemap.xml file for crawling | false |
| Allow Backward Links | Allow crawling backward links (e.g., blog post → homepage) | true |
| Allow External Links | Allow crawling external links (e.g., links to other domains) | false |
| Ignore Query Parameters | Ignore specific query parameters in URLs | false |
| Delay | Delay between requests to avoid overloading the server (in seconds) | 2 |

Batch Crawler Configuration (Async / Sync)

| Parameter | Description | Example Value |
| --- | --- | --- |
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL List | List of starting URLs to crawl | [ "https://x.com", "https://y.com" ] |
| Include Path | Paths to include during the crawl | "blog/*" |
| Exclude Path | Paths to exclude during the crawl | "admin/*" |
| Crawl Depth | Depth to crawl for each URL | 3 |
| Crawl Limit | Max pages per domain | 500 |
| Max Discovery Depth | How far discovered links can go | 4 |
| Allow External Links | Whether to crawl external domains | false |
| Allow Backward Links | Whether to revisit previous pages | true |
| Crawl Sub Pages | Enable sub-page traversal | true |
| Ignore Sitemap | Skip sitemap.xml | false |
| Delay | Throttle request delay in seconds | 1 |
| Callback Webhook | URL to receive notifications about crawl completion | https://example.com/webhook |
| Webhook Headers | Headers to be sent to the webhook | {"Content-Type": "application/json"} |
| Webhook Metadata | Metadata to be sent to the webhook | {"status": "{{codeNode_540.status}}"} |
| Webhook Events | A multiselect list of events to be sent to the webhook | ["completed", "failed", "page", "started"] |
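
For a batch crawl, a node definition could look roughly like the sketch below; the urls list key is an assumption (the single-URL node uses url), so verify the exact field name in the node editor:

nodes:
  - nodeId: crawlerNode_882
    nodeType: crawlerNode
    nodeName: Batch Crawler
    values:
      credentials: my-firecrawl-creds
      urls:                                 # assumed key for the URL List parameter
        - https://x.com
        - https://y.com
      crawlMode: async
      crawlSubPages: true
      crawlDepth: 3
      crawlLimit: 500
      maxDiscoveryDepth: 4
      ignoreSitemap: false
      allowBackwardLinks: true
      allowExternalLinks: false
      delay: 1
      webhook: https://example.com/webhook
      webhookEvents: [completed, failed, page, started]
    needs:
      - triggerNode_1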

Scraper Configuration (Single)

| Parameter | Description | Example Value |
| --- | --- | --- |
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL | Target URL to scrape | https://example.com/page |
| Main Content | Extract only main content (exclude header/footer/nav) | true |
| Skip TLS Verification | Bypass SSL certificate validation | false |
| Include Tags | HTML tags to include in extraction | p, h1, h2, article |
| Exclude Tags | HTML tags to exclude from extraction | nav, footer, aside |
| Emulate Mobile Device | Simulate mobile browser access | true |
| Wait for Page Load | Time to wait for dynamic content (in ms) | 123 |

Batch Scraper Configuration (Async)

| Parameter | Description | Example Value |
| --- | --- | --- |
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL List | List of URLs to scrape in batch | [ "https://a.com", "https://b.com" ] |
| Main Content | Extract only main content from each page | true |
| Skip TLS Verification | Ignore SSL certificate errors | false |
| Include Tags | HTML tags to extract | p, h1, h2 |
| Exclude Tags | HTML tags to exclude from extraction | aside, footer |
| Emulate Mobile Device | Use mobile browser viewport | true |
| Wait for Page Load | Delay for dynamic content to appear (in ms) | 200 |
| Callback Webhook | URL to receive notifications about scrape completion | https://example.com/webhook |
| Webhook Headers | Headers to be sent to the webhook | {"Content-Type": "application/json"} |
| Webhook Metadata | Metadata to be sent to the webhook | {"status": "{{codeNode_540.status}}"} |
| Webhook Events | A multiselect list of events to be sent to the webhook | ["completed", "failed", "page", "started"] |
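
A batch scrape can be sketched the same way; the urls list key and the webhook fields on the scraper node are assumptions carried over from the crawler node, so confirm them in the node editor:

nodes:
  - nodeId: scraperNode_681
    nodeType: scraperNode
    nodeName: Batch Scraper
    values:
      credentials: my-firecrawl-creds
      urls:                                 # assumed key for the URL List parameter
        - https://a.com
        - https://b.com
      onlyMainContent: true
      skipTLsVerification: false
      mobile: true
      waitFor: 200
      includeTags: [p, h1, h2]
      excludeTags: [aside, footer]
      webhook: https://example.com/webhook  # assumed; same callback fields as the crawler
      webhookEvents: [completed, failed, page, started]
    needs:
      - triggerNode_1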

Low-Code Examples

Crawler Node

nodes:
  - nodeId: crawlerNode_880
    nodeType: crawlerNode
    nodeName: Crawler
    values:
      credentials: my-firecrawl-creds
      url: https://example.com
      crawlMode: async
      webhook: https://example.com/webhook
      webhookEvents: [completed, failed, page, started]
      crawlSubPages: true
      crawlLimit: 10
      crawlDepth: 1
      excludePath: []
      includePath: []
      maxDiscoveryDepth: 1
      ignoreSitemap: false
      allowBackwardLinks: false
      allowExternalLinks: false
      delay: 0
    needs:
      - triggerNode_1

Scraper Node

nodes:
  - nodeId: scraperNode_680
    nodeType: scraperNode
    nodeName: Scraper
    values:
      credentials: my-firecrawl-creds
      url: https://example.com/page
      onlyMainContent: false
      skipTLsVerification: false
      mobile: false
      waitFor: 123
      includeTags: []
      excludeTags: []
    needs:
      - triggerNode_1

Crawler Event Output (Async Mode)

When using async crawler mode, the node can emit these events to your webhook:

  1. Started: When the crawl begins (job ID, URL).
  2. Page: For each crawled page (URL, extracted data).
  3. Completed: When the crawl finishes (summary, page count, errors).
  4. Failed: When the crawl fails (error message, job ID).

Example – Started:

{
  "success": true,
  "type": "crawl.started",
  "id": "82***********************4",
  "data": [],
  "metadata": {}
}
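
Example – Completed (illustrative payload; the objects in data mirror the scraped page entries shown in the sync output above and will vary by crawl):

{
  "success": true,
  "type": "crawl.completed",
  "id": "82***********************4",
  "data": [
    {
      "url": "https://example.com/page-1",
      "content": "Lorem ipsum dolor sit amet...",
      "metadata": {
        "title": "Page 1 Title",
        "language": "en"
      }
    }
  ],
  "metadata": {}
}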

Scraper Output Schema

  • markdown: Scraped content as Markdown.
  • language: Detected language of the content.
  • referrer: Referrer URL, if applicable.
  • title: Title of the scraped page.
  • scrapeId: Unique identifier for the scrape.
  • sourceURL: URL from which content was scraped.
  • url: Resolved URL of the resource.
  • statusCode: HTTP status code of the scrape.

Example:

{
  "markdown": "# Page content...",
  "language": "en",
  "referrer": "",
  "title": "Example Page",
  "scrapeId": "uuid-here",
  "sourceURL": "https://example.com/page",
  "url": "https://example.com/page",
  "statusCode": 200
}

Map URL Configuration

| Parameter | Description | Example Value |
| --- | --- | --- |
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL | Starting URL to map the structure | https://example.com |
| Main Content | Extract only main content from each page | true |
| Skip TLS Verification | Ignore SSL certificate errors | false |
| Include Tags | HTML tags to extract | p, h1, h2 |
| Exclude Tags | HTML tags to exclude from extraction | aside, footer |
| Emulate Mobile Device | Use mobile browser viewport | true |
| Wait for Page Load | Delay for dynamic content to appear (in ms) | 200 |
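
A node-level sketch for mapping a site might look like the following; the mapNode type name is a placeholder for whichever Map URL node appears in your flow editor, and the value keys mirror the scraper options above:

nodes:
  - nodeId: mapNode_210
    nodeType: mapNode                       # placeholder type name
    nodeName: Map URL
    values:
      credentials: my-firecrawl-creds
      url: https://example.com
      onlyMainContent: true
      skipTLsVerification: false
      mobile: true
      waitFor: 200
      includeTags: [p, h1, h2]
      excludeTags: [aside, footer]
    needs:
      - triggerNode_1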

Map URL Output Example

{
  "success": true,
  "links": [
    "https://lamatic.ai/docs",
    "https://lamatic.ai/docs/architecture",
    "https://lamatic.ai/docs/career",
    "https://lamatic.ai/docs/context",
    "https://lamatic.ai/docs/context/vectordb",
    "https://lamatic.ai/docs/context/vectordb/adding-data",
    "https://lamatic.ai/docs/contributing",
    "https://lamatic.ai/docs/demo",
    "https://lamatic.ai/docs/deployments",
    "https://lamatic.ai/docs/deployments/cache"
  ]
}

Agent Workflow Configuration

Firecrawl now supports agent workflows with three execution modes for submitting and monitoring agent tasks.

Agent Modes

  1. Async Agent: Submit agent tasks asynchronously and receive a task ID for later status checking.
  2. Sync Agent: Execute agent tasks synchronously and receive results immediately upon completion.
  3. Check Agent: Check the status of a previously submitted async agent task.

Agent Configuration (Async / Sync)

| Parameter | Description | Example Value |
| --- | --- | --- |
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| Prompt | The prompt or instruction for the agent to execute | "Extract all product prices from the page" |
| URLs | Optional list of URLs to limit the agent's scope; provide either an array of URLs or a comma-separated list | [ "https://example.com/products" ] |
| Schema | JSON schema defining the expected output structure | {"price": "string", "name": "string"} |
| Max Credit | Maximum credits to use for the agent task | 100 |
| Strict URL Constraints | Enforce strict URL matching rules | true |
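
As a rough sketch, an agent node could be configured as follows; the agentNode type and the camelCase value keys are assumptions derived from the parameter table above, so check the node editor for the exact names:

nodes:
  - nodeId: agentNode_310
    nodeType: agentNode                     # placeholder type name for the Async/Sync Agent node
    nodeName: Firecrawl Agent
    values:
      credentials: my-firecrawl-creds
      agentMode: sync                       # async returns a task ID for later status checks
      prompt: Extract all product prices from the page
      urls:
        - https://example.com/products
      schema: '{"price": "string", "name": "string"}'
      maxCredit: 100
      strictUrlConstraints: true
    needs:
      - triggerNode_1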

Check Agent Configuration

| Parameter | Description | Example Value |
| --- | --- | --- |
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| Agent Job ID | The task ID returned from an Async Agent submission | "8***************************7" |
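
To poll a previously submitted task, a Check Agent node can be sketched as below; the checkAgentNode type, the agentJobId key, and the templated reference to the async node's output are assumptions, so confirm them in the node editor:

nodes:
  - nodeId: checkAgentNode_311
    nodeType: checkAgentNode                # placeholder type name for the Check Agent node
    nodeName: Check Agent
    values:
      credentials: my-firecrawl-creds
      agentJobId: '{{agentNode_310.output.id}}'   # task ID returned by the Async Agent submission
    needs:
      - agentNode_310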

Troubleshooting

Common Issues

| Problem | Solution |
| --- | --- |
| Invalid API Key | Ensure the API key is correct and has not expired. |
| Connection Issues | Verify that the host URL is correct and reachable. |
| Webhook Errors | Check if the webhook endpoint is active and correctly configured. |
| Crawling Errors | Review the inclusion/exclusion paths for accuracy. |
| Dynamic Content Not Loaded | Increase the Wait for Page Load time in the configuration. |

Debugging
