Firecrawl Integration
Overview
Firecrawl is a robust tool that transforms websites into LLM-ready data through its Crawler and Scraper functionalities.
- Crawler: Automatically extracts data from websites by crawling web pages and following links. It recursively traverses sites (starting from a URL, using sitemaps when available), handles dynamic content rendered with JavaScript, and supports sync or async modes with webhook notifications.
- Scraper: Extracts targeted content from specific web pages using customizable rules. It supports single-URL or batch scraping, main-content-only extraction, tag inclusion/exclusion, TLS verification options, device emulation, and adjustable page-load timing.
Whether you need to map full site structures or extract specific data from pages, Firecrawl provides a seamless and customizable solution.
Firecrawl nodes can now be used directly inside flows in both sync and async modes. You no longer need to create a separate flow for crawling or scraping.
Features
Key Functionalities (Crawler + Scraper)
Crawler
- Comprehensive Crawling: Recursively traverses websites, identifies and accesses subpages, uses sitemaps when available, and follows links for thorough data collection.
- Dynamic Content Handling: Manages JavaScript-rendered content for full extraction from accessible subpages.
- Sync & Async Modes: Run crawls synchronously or asynchronously with webhook callbacks for completion, page, started, and failed events.
Scraper
- Customizable URL Scraping: Specify exact URLs to scrape for targeted data extraction.
- Selective Content: Scrape only main content or include/exclude specific HTML tags.
- TLS Verification: Option to skip TLS verification for broader website compatibility.
- Device Emulation: Emulate mobile devices for mobile-optimized pages.
- Adjustable Load Timing: Set wait time (ms) for dynamic content to load.
Shared
- Agent Workflows: Async Agent, Sync Agent, and Check Agent for intelligent web data extraction.
- Webhooks: Real-time updates for crawl/scrape completion and events.
Benefits
- Reliability: Handles proxies, rate limits, and anti-scraping measures for consistent extraction (Crawler).
- Efficiency: Manages requests to minimize bandwidth and avoid detection (Crawler).
- Precision: Targeted scraping with tag and scope control (Scraper).
- Flexibility: Mobile emulation and TLS options for diverse sites (Scraper).
- Structured Output: Generates structured, LLM-compatible data; customizable inclusion/exclusion; handles static and dynamic content.
Prerequisites
Before using Firecrawl, ensure the following:
- A valid Firecrawl API key.
- Access to the Firecrawl service host URL.
- Properly configured credentials for Firecrawl.
- A webhook endpoint for receiving notifications (required for the crawler).
For self-hosting: if the connection fails, whitelist the Cloudflare IP ranges listed at https://www.cloudflare.com/ips/.
Setup
Step 1: Obtain API Credentials
- Register on Firecrawl.
- Generate an API key from your account dashboard.
- Note the Host URL and Webhook Endpoint.
Step 2: Configure Firecrawl Credentials
Use the following format to set up your credentials:
| Key Name | Description | Example Value |
|---|---|---|
| Credential Name | Name to identify this set of credentials | my-firecrawl-creds |
| Firecrawl API Key | Authentication key for accessing Firecrawl services | fc_api_xxxxxxxxxxxxx |
| Host | Base URL where Firecrawl service is hosted | https://api.firecrawl.dev |
Configuration Reference
Sync Mode Output Format
Batched Mode

```json
{
"success": true,
"status": "completed",
"completed": 48,
"total": 50,
"creditsUsed": 13,
"expiresAt": "2025-08-01T12:30:00.000Z",
"data": [
{
"url": "https://example.com/page-1",
"content": "Lorem ipsum dolor sit amet...",
"metadata": {
"title": "Page 1 Title",
"description": "This is a sample description.",
"language": "en"
}
},
{
"url": "https://example.com/page-2",
"content": "Second page scraped content...",
"metadata": {
"title": "Page 2 Title",
"description": "Another sample description.",
"language": "en"
}
}
// ... more pages
]
}
```

Single Mode

```json
{
"success": true,
"status": "completed",
"completed": 1,
"total": 2,
"creditsUsed": 1,
"expiresAt": "2025-08-02T12:30:00.000Z",
"data": [
{
"url": "https://example.com/page-1",
"content": "Lorem ipsum dolor sit amet...",
"metadata": {
"title": "Page 1 Title",
"description": "This is a sample description.",
"language": "en"
}
}
]
}
```

Async Mode Output Format

```json
{
"success": true,
"id": "8***************************7",
"url": "https://api.firecrawl.dev/v1/crawl/8***************************7"
}
```

Step 3: Set Up the Mode (Crawler)
For the Crawler, choose sync or async mode. In async mode, configure a webhook endpoint (for example, a Webhook flow/URL in Lamatic) to receive crawl updates and results. Async mode adds the following options:
| Parameter | Description | Example Value |
|---|---|---|
| Callback Webhook | URL to receive notifications about crawl completion | https://example.com/webhook |
| Webhook Headers | Headers sent to the webhook | {"Content-Type": "application/json"} |
| Webhook Metadata | Metadata sent to the webhook | {"status": "{{codeNode_540.status}}"} |
| Webhook Events | Events to send: completed, failed, page, started | ["completed", "failed", "page", "started"] |
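For reference, the fragment below sketches how these async options could appear inside a crawler node's `values` block in low-code YAML (a complete node appears under Low-Code Examples). The `webhookHeaders` and `webhookMetadata` key names are assumptions and may differ in the flow editor.

```yaml
# Sketch of the webhook-related values on an async crawler node.
# `webhookHeaders` and `webhookMetadata` are assumed key names.
values:
  crawlMode: async
  webhook: https://example.com/webhook
  webhookHeaders: '{"Content-Type": "application/json"}'
  webhookMetadata: '{"status": "{{codeNode_540.status}}"}'
  webhookEvents: [completed, failed, page, started]
```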
Crawler Configuration (Single)
| Parameter | Description | Example Value |
|---|---|---|
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL | Starting point URL for the crawler | https://example.com |
| Exclude Path | URL patterns to exclude from the crawl | "admin/*", "private/*" |
| Include Path | URL patterns to include in the crawl | "blog/*", "products/*" |
| Crawl Depth | Maximum depth to crawl relative to the entered URL | 3 |
| Crawl Limit | Maximum number of pages to crawl | 1000 |
| Crawl Sub Pages | Toggle to enable or disable crawling sub pages | true |
| Max Discovery Depth | Max depth for discovering new URLs during the crawl | 5 |
| Ignore Sitemap | Ignore the sitemap.xml file for crawling | false |
| Allow Backward Links | Allow crawling backward links (e.g., blog post → homepage) | true |
| Allow External Links | Allow crawling external links (e.g., links to other domains) | false |
| Ignore Query Parameters | Ignore specific query parameters in URLs | false |
| Delay | Delay between requests to avoid overloading server (in seconds) | 2 |
Batch Crawler Configuration (Async / Sync)
| Parameter | Description | Example Value |
|---|---|---|
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL List | List of starting URLs to crawl | [ "https://x.com", "https://y.com" ] |
| Include Path | Paths to include during crawl | "blog/*" |
| Exclude Path | Paths to exclude during crawl | "admin/*" |
| Crawl Depth | Depth to crawl for each URL | 3 |
| Crawl Limit | Max pages per domain | 500 |
| Max Discovery Depth | How far discovered links can go | 4 |
| Allow External Links | Whether to crawl external domains | false |
| Allow Backward Links | Whether to revisit previous pages | true |
| Crawl Sub Pages | Enable sub-page traversal | true |
| Ignore Sitemap | Skip sitemap.xml | false |
| Delay | Throttle request delay in seconds | 1 |
| Callback Webhook | URL to receive notifications about crawl completion | https://example.com/webhook |
| Webhook Headers | Headers to be sent to the webhook | {"Content-Type": "application/json"} |
| Webhook Metadata | Metadata to be sent to the webhook | {"status": "{{codeNode_540.status}}"} |
| Webhook Events | A multiselect list of events to be sent to the webhook | ["completed", "failed", "page", "started"] |
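The Low-Code Examples section below covers only the single-URL nodes, so the following is a hedged sketch of how the batch crawler parameters above might map onto a node definition. The `batchCrawlerNode` type and the `urlList` key are assumed names; confirm the exact identifiers in the flow editor.

```yaml
# Illustrative batch crawler node; nodeType and key names are assumptions.
nodes:
  - nodeId: batchCrawlerNode_101
    nodeType: batchCrawlerNode
    nodeName: Batch Crawler
    values:
      credentials: my-firecrawl-creds
      urlList: ["https://x.com", "https://y.com"]
      includePath: ["blog/*"]
      excludePath: ["admin/*"]
      crawlDepth: 3
      crawlLimit: 500
      maxDiscoveryDepth: 4
      allowExternalLinks: false
      allowBackwardLinks: true
      crawlSubPages: true
      ignoreSitemap: false
      delay: 1
      webhook: https://example.com/webhook
      webhookEvents: [completed, failed, page, started]
    needs:
      - triggerNode_1
```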
Scraper Configuration (Single)
| Parameter | Description | Example Value |
|---|---|---|
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL | Target URL to scrape | https://example.com/page |
| Main Content | Extract only main content (exclude header/footer/nav) | true |
| Skip TLS Verification | Bypass SSL certificate validation | false |
| Include Tags | HTML tags to include in extraction | p, h1, h2, article |
| Exclude Tags | HTML tags to exclude from extraction | nav, footer, aside |
| Emulate Mobile Device | Simulate mobile browser access | true |
| Wait for Page Load | Time to wait for dynamic content (in ms) | 123 |
Batch Scraper Configuration (Async)
| Parameter | Description | Example Value |
|---|---|---|
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL List | List of URLs to scrape in batch | [ "https://a.com", "https://b.com" ] |
| Main Content | Extract only main content from each page | true |
| Skip TLS Verification | Ignore SSL certificate errors | false |
| Include Tags | HTML tags to extract | p, h1, h2 |
| Exclude Tags | HTML tags to exclude from extraction | aside, footer |
| Emulate Mobile Device | Use mobile browser viewport | true |
| Wait for Page Load | Delay for dynamic content to appear (in ms) | 200 |
| Callback Webhook | URL to receive notifications about batch scrape completion | https://example.com/webhook |
| Webhook Headers | Headers to be sent to the webhook | {"Content-Type": "application/json"} |
| Webhook Metadata | Metadata to be sent to the webhook | {"status": "{{codeNode_540.status}}"} |
| Webhook Events | A multiselect list of events to be sent to the webhook | ["completed", "failed", "page", "started"] |
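Similarly, here is a hedged sketch of a batch scraper node based on the parameters above. The `batchScraperNode` type and `urlList` key are assumed names; the remaining keys mirror the single scraper node shown under Low-Code Examples.

```yaml
# Illustrative batch scraper node; nodeType and key names are assumptions.
nodes:
  - nodeId: batchScraperNode_201
    nodeType: batchScraperNode
    nodeName: Batch Scraper
    values:
      credentials: my-firecrawl-creds
      urlList: ["https://a.com", "https://b.com"]
      onlyMainContent: true
      skipTLsVerification: false
      includeTags: [p, h1, h2]
      excludeTags: [aside, footer]
      mobile: true
      waitFor: 200
      webhook: https://example.com/webhook
      webhookEvents: [completed, failed, page, started]
    needs:
      - triggerNode_1
```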
Low-Code Examples
Crawler Node

```yaml
nodes:
  - nodeId: crawlerNode_880
    nodeType: crawlerNode
    nodeName: Crawler
    values:
      credentials: my-firecrawl-creds
      url: https://example.com
      crawlMode: async
      webhook: https://example.com/webhook
      webhookEvents: [completed, failed, page, started]
      crawlSubPages: true
      crawlLimit: 10
      crawlDepth: 1
      excludePath: []
      includePath: []
      maxDiscoveryDepth: 1
      ignoreSitemap: false
      allowBackwardLinks: false
      allowExternalLinks: false
      delay: 0
    needs:
      - triggerNode_1
```

Scraper Node
```yaml
nodes:
  - nodeId: scraperNode_680
    nodeType: scraperNode
    nodeName: Scraper
    values:
      credentials: my-firecrawl-creds
      url: https://example.com/page
      onlyMainContent: false
      skipTLsVerification: false
      mobile: false
      waitFor: 123
      includeTags: []
      excludeTags: []
    needs:
      - triggerNode_1
```

Crawler Event Output (Async Mode)
When using async crawler mode, the node can emit these events to your webhook:
- Started: When the crawl begins (job ID, URL).
- Page: For each crawled page (URL, extracted data).
- Completed: When the crawl finishes (summary, page count, errors).
- Failed: When the crawl fails (error message, job ID).
Example – Started:

```json
{
"success": true,
"type": "crawl.started",
"id": "82***********************4",
"data": [],
"metadata": {}
}
```

Scraper Output Schema
- markdown: Scraped content as Markdown.
- language: Detected language of the content.
- referrer: Referrer URL, if applicable.
- title: Title of the scraped page.
- scrapeId: Unique identifier for the scrape.
- sourceURL: URL from which content was scraped.
- url: Resolved URL of the resource.
- statusCode: HTTP status code of the scrape.
Example:

```json
{
"markdown": "# Page content...",
"language": "en",
"referrer": "",
"title": "Example Page",
"scrapeId": "uuid-here",
"sourceURL": "https://example.com/page",
"url": "https://example.com/page",
"statusCode": 200
}
```

Map URL Configuration
| Parameter | Description | Example Value |
|---|---|---|
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL | Starting URL to map the structure | https://example.com |
| Main Content | Extract only main content from each page | true |
| Skip TLS Verification | Ignore SSL certificate errors | false |
| Include Tags | HTML tags to extract | p, h1, h2 |
| Exclude Tags | HTML tags to exclude from extraction | aside, footer |
| Emulate Mobile Device | Use mobile browser viewport | true |
| Wait for Page Load | Delay for dynamic content to appear (in ms) | 200 |
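The Map URL node is not covered under Low-Code Examples, so the sketch below simply maps the table above onto a node definition. The `mapUrlNode` type is an assumed name; the value keys mirror the scraper node.

```yaml
# Illustrative Map URL node; nodeType is an assumed name.
nodes:
  - nodeId: mapUrlNode_301
    nodeType: mapUrlNode
    nodeName: Map URL
    values:
      credentials: my-firecrawl-creds
      url: https://example.com
      onlyMainContent: true
      skipTLsVerification: false
      includeTags: [p, h1, h2]
      excludeTags: [aside, footer]
      mobile: true
      waitFor: 200
    needs:
      - triggerNode_1
```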
Map URL Output Example

```json
{
"success": true,
"links": [
"https://lamatic.ai/docs",
"https://lamatic.ai/docs/architecture",
"https://lamatic.ai/docs/career",
"https://lamatic.ai/docs/context",
"https://lamatic.ai/docs/context/vectordb",
"https://lamatic.ai/docs/context/vectordb/adding-data",
"https://lamatic.ai/docs/contributing",
"https://lamatic.ai/docs/demo",
"https://lamatic.ai/docs/deployments",
"https://lamatic.ai/docs/deployments/cache"
]
}
```

Agent Workflow Configuration
Firecrawl now supports agent workflows with three execution modes for submitting and monitoring agent tasks.
Agent Modes
- Async Agent: Submit agent tasks asynchronously and receive a task ID for later status checking.
- Sync Agent: Execute agent tasks synchronously and receive results immediately upon completion.
- Check Agent: Check the status of a previously submitted async agent task.
Agent Configuration (Async / Sync)
| Parameter | Description | Example Value |
|---|---|---|
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| Prompt | The prompt or instruction for the agent to execute | "Extract all product prices from the page" |
| URLs | Optional list of URLs to limit the agent’s scope. You can provide this as an array of URLs or as a comma-separated list. | [ "https://example.com/products" ] |
| Schema | JSON schema defining the expected output structure | {"price": "string", "name": "string"} |
| Max Credit | Maximum credits to use for the agent task | 100 |
| Strict URL Constraints | Enforce strict URL matching rules | true |
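As a rough guide, an agent node configuration could look like the sketch below. The `firecrawlAgentNode` type and the value keys (`prompt`, `urls`, `schema`, `maxCredit`, `strictUrlConstraints`) are illustrative names derived from the table above, not confirmed identifiers.

```yaml
# Illustrative sync agent node; nodeType and key names are assumptions.
nodes:
  - nodeId: agentNode_401
    nodeType: firecrawlAgentNode
    nodeName: Sync Agent
    values:
      credentials: my-firecrawl-creds
      prompt: "Extract all product prices from the page"
      urls: ["https://example.com/products"]
      schema: '{"price": "string", "name": "string"}'
      maxCredit: 100
      strictUrlConstraints: true
    needs:
      - triggerNode_1
```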
Check Agent Configuration
| Parameter | Description | Example Value |
|---|---|---|
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| Agent Job ID | The task ID returned from an Async Agent submission | "8***************************7" |
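A Check Agent step typically consumes the task ID returned by a previous Async Agent submission. The sketch below is illustrative only; the `firecrawlCheckAgentNode` type, the `agentJobId` key, and the templated reference to the async agent's output are all assumed.

```yaml
# Illustrative check-agent node; nodeType, key names, and the templated
# job-ID reference are assumptions.
nodes:
  - nodeId: checkAgentNode_402
    nodeType: firecrawlCheckAgentNode
    nodeName: Check Agent
    values:
      credentials: my-firecrawl-creds
      agentJobId: "{{agentNode_401.output.id}}"
    needs:
      - agentNode_401
```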
Troubleshooting
Common Issues
| Problem | Solution |
|---|---|
| Invalid API Key | Ensure the API key is correct and has not expired. |
| Connection Issues | Verify that the host URL is correct and reachable. |
| Webhook Errors | Check if the webhook endpoint is active and correctly configured. |
| Crawling Errors | Review the inclusion/exclusion paths for accuracy. |
| Dynamic Content Not Loaded | Increase the Wait for Page Load time in the configuration. |
Debugging
- Check Firecrawl logs for detailed error information.
- Test the webhook endpoint to confirm it is receiving updates.
- If the connection fails, whitelist the Cloudflare IP ranges listed at https://www.cloudflare.com/ips/.