
Firecrawl

1. Overview

Firecrawl is an advanced API service designed to turn entire websites into clean, LLM-ready markdown or structured data. It handles complex web scraping tasks, including JavaScript rendering, bypassing anti-bot protections, and managing concurrent crawling operations.

Through the Firecrawl node in GoInsight, you can seamlessly integrate powerful web data extraction capabilities into your automated workflows. You can achieve comprehensive web data gathering and processing, including:

  • Scraping and Crawling: Extract content from single URLs, batch process multiple URLs, or crawl entire websites with customizable depth and path filtering.
  • AI-Powered Extraction: Utilize AI agents to autonomously navigate websites and extract structured data based on natural language prompts or JSON schemas.
  • Browser Session Management: Create, manage, and execute code within isolated browser sandbox sessions for advanced web automation.
  • Account Management: Monitor your team's credit usage, token consumption, and active job queues in real-time.

2. Prerequisites

Before using this node, make sure you meet the following conditions:

  • A valid Firecrawl account with an active API Key.
  • Sufficient credits or tokens in your Firecrawl account to perform the desired scraping, crawling, or extraction operations.

3. Credentials

For detailed instructions on how to obtain and configure credentials, please refer to our official documentation: Credentials.

4. Supported Operations

Summary

This node primarily operates on resources such as Browser Session, Agent, Batch Scrape, Crawl, Extract, Team Usage, Search, Map, and Scrape.

Resource | Operation | Description
Browser Session | Create Browser Session | Creates a new isolated browser sandbox session for web automation. Each session runs in a secure environment with Playwright pre-installed and Chrome DevTools Protocol access. Supports persistent profiles to save browser state across sessions. Sessions auto-close after inactivity or when TTL expires. Maximum 20 concurrent sessions per account.
Browser Session | Delete Browser Session | WARNING: This operation permanently deletes the browser session and cannot be undone. Closes and deletes a browser sandbox session. If the session was configured with a persistent profile and saveChanges was enabled, profile changes will be saved. Use this to properly clean up sessions when done to free up resources. Sessions also auto-close when TTL expires or after inactivity timeout.
Browser Session | List Browser Sessions | Lists all browser sandbox sessions for your account. Can filter by status to show only active sessions. Returns session metadata including ID, status, creation time, and other details. Useful for monitoring active sessions and managing session lifecycle. ⚠️ WARNING: This action returns ALL sessions in one request (Firecrawl API does not support pagination). For accounts with many historical sessions, the response may be large. Maximum 20 concurrent active sessions (running at the same time) allowed per account.
Browser Session | Execute Browser Code | Executes code within an active browser sandbox session. Supports Python, Node.js (JavaScript), and Bash scripts. Playwright is pre-installed for browser automation tasks like navigation, clicking, form filling, and screenshot capture. The code runs in a secure isolated environment with access to the browser instance.
Agent | Agent | AI-powered web data extraction agent that autonomously searches, navigates and gathers data from websites. No URLs required - just describe what data you need via natural language prompt. The agent finds and extracts data from hard-to-reach places across the web. Supports optional URL constraints, JSON schema for structured output, and model selection (spark-1-mini for cost efficiency or spark-1-pro for accuracy). Waits for job completion before returning results.
Agent | Agent Async | Starts an AI-powered web data extraction agent task asynchronously and returns immediately with a job ID. The agent autonomously searches, navigates and gathers data from websites based on natural language prompts. Use Get Agent Status to poll for task completion and retrieve results. Ideal for long-running extraction tasks or when you need to process multiple agent jobs in parallel.
Agent | Get Agent Status | Retrieves the current status and results of a Firecrawl Agent job. Use this to poll for completion of async agent tasks started with Agent Async. Returns the job status (processing/completed/failed), extracted data when completed, credits used, and expiration time. Poll at reasonable intervals (e.g., every 5-10 seconds) until status is completed or failed.
Agent | Cancel Agent | WARNING: Cancels a running Firecrawl Agent job. Once cancelled, the job cannot be resumed or restarted. Use this to stop an agent task that is still processing. Useful for stopping long-running extraction tasks or when results are no longer needed.
Batch Scrape | Get Batch Scrape Status | Retrieves the current status of a Firecrawl batch scrape job using its batch ID. Returns progress information including total URLs, completed count, failed count, and scraped data for completed items. Use this to monitor batch scraping operations and retrieve results as they complete.
Batch Scrape | Get Batch Scrape Errors | Retrieves error information for a Firecrawl batch scrape job using its batch ID. Returns an array of errors encountered during the batch scraping process, including URLs that failed and their corresponding error messages. Use this to diagnose issues with batch scrape operations.
Batch Scrape | Cancel Batch Scrape Job | Cancels an ongoing Firecrawl batch scrape job using its batch ID. Stops the scraping operation and returns the final status. Use this when you need to stop a batch scrape job before it completes naturally.
Crawl | Crawl a Website | Crawls a website starting from the specified URL and scrapes all discovered pages. Initiates a crawl job then polls for completion (up to 5 minutes), returning all scraped page data including content and metadata. WARNING: This operation consumes API credits — one credit per page crawled. If the crawl does not finish within the polling timeout, partial results are returned with Status=scraping. Supports configurable crawl depth (recommended 1–5), page limits, wildcard path filtering, and per-page scrape options.
Crawl | Get Crawl Status | Retrieves the current status of a Firecrawl crawl job using its job ID. Returns job status, progress information (total and completed pages), credits used, and any scraped data available. Use this to check on ongoing crawl operations initiated by the Crawl a Website action.
Crawl | Get Crawl Errors | Retrieves error information for a Firecrawl crawl job using its job ID. Returns an array of errors encountered during the crawling process, including URLs that failed and their corresponding error messages. Use this to diagnose issues with crawl operations.
Crawl | List Active Crawls | Retrieves a list of all currently active Firecrawl crawl jobs for the authenticated account. Returns an array of crawl job objects containing job IDs, statuses, URLs, and other metadata. Use this to monitor all ongoing crawl operations.
Crawl | Cancel a Crawl Job | WARNING: Cancels an ongoing Firecrawl crawl job using its job ID. Cancellation is permanent and cannot be undone. Any data already crawled before cancellation will be preserved. Use this when you need to terminate a long-running crawl job before it completes.
Crawl | Preview Crawl Params from Prompt | Uses AI to analyze a natural language prompt and generate suggested crawl parameters. Provide a description of what you want to crawl (e.g., crawl all blog posts from 2024 on example.com), and Firecrawl will return optimized crawl configuration including URL, paths, depth, and filters. This helps you quickly set up crawl jobs without manually configuring all parameters.
Extract | Extract Web Data with AI | Uses AI to extract structured data from web pages based on natural language prompts or JSON schemas. Supports both simple instructions (e.g., extract all product names and prices) and complex schema definitions. Automatically waits for extraction to complete with configurable polling. Perfect for automated data extraction, web scraping with specific requirements, and turning unstructured web content into structured data.
Extract | Extract Web Data with AI Async | Asynchronously starts an AI-powered web data extraction task and returns immediately with a job ID. Supports both natural language prompts and JSON schema definitions. Use Get Extract Status to poll for task completion and retrieve results. Ideal for long-running extraction tasks or when you need to initiate multiple extraction jobs in parallel.
Extract | Get Extract Status | Retrieves the current status and results of a Firecrawl Extract job. Use this to poll for completion of extract tasks that were started asynchronously. Returns the job status (processing/completed/failed) and extracted data when completed. Poll at reasonable intervals until status is completed or failed.
Team Usage | Get Team Credit Usage | Retrieves the current credit usage information for the authenticated Firecrawl team, including remaining credits, plan credits, and billing period details.
Team Usage | Get Team Token Usage | Retrieves the current token usage information for the authenticated Firecrawl team, including remaining tokens, plan tokens, and billing period details.
Team Usage | Get Team Queue Status | Retrieves the current queue status for the authenticated Firecrawl team, including jobs in queue, active jobs, waiting jobs, max concurrency, and most recent success timestamp.
Team Usage | Get Historical Credit Usage | Retrieves the historical credit usage information for the authenticated Firecrawl team on a month-by-month basis. Optionally, the data can be broken down by API key.
Team Usage | Get Historical Token Usage | Retrieves the historical token usage information for the authenticated Firecrawl team on a month-by-month basis. Optionally, the data can be broken down by API key.
Search | Search the Web and Scrape Results | Searches the web using a query and automatically scrapes content from the search results. Returns an array of scraped pages with titles, URLs, content, and metadata. Supports language and country localization, result limiting, and custom scraping options. Perfect for research and data gathering from web search results.
Map | Map a Website | Quickly maps a website to discover all accessible URLs without scraping content. This is faster than crawling and useful for understanding site structure. Supports filtering by search terms, including subdomains, and respecting or ignoring sitemaps. Returns a complete list of discovered URLs.
Scrape | Scrape a URL | Scrapes content from exactly one URL and returns a flat response (Markdown, Html, Metadata directly accessible — no array unwrapping needed). Use this tool when you have a single URL to scrape. For scraping 2 or more URLs at once, use Scrape URLs instead, which calls a batch API and returns a Results[] array with per-URL Success/Error status. Supports markdown, HTML, raw HTML, links extraction, and screenshots, with optional HTML tag filtering and configurable wait times for JavaScript-heavy pages.
Scrape | Scrape URLs | Scrapes content from 2 or more URLs in a single batch API call and returns a Results[] array, where each item has its own Url, Success, Markdown, and Error fields — allowing partial failures without losing successful results. For a single URL, use Scrape a URL instead, which calls a simpler API and returns a flat response (Markdown, Html, Metadata directly). Accepts multiple URLs separated by commas, semicolons, or newlines. Supports markdown, HTML, raw HTML, links extraction, and screenshots.

Operation Details

Create Browser Session

Creates a new isolated browser sandbox session for web automation. Each session runs in a secure environment with Playwright pre-installed and Chrome DevTools Protocol access. Supports persistent profiles to save browser state across sessions. Sessions auto-close after inactivity or when TTL expires. Maximum 20 concurrent sessions per account.

Options:

  • Ttl: Session lifetime in seconds (30-3600). Default is 600 (10 minutes)
  • ActivityTtl: Auto-close after inactivity in seconds (10-3600). Default is 300 (5 minutes)
  • ProfileName: Optional persistent profile name to save browser state across sessions. Naming rules: alphanumeric characters, hyphens, and underscores only, no spaces or special characters, maximum 64 characters, case-sensitive. Example: my-automation-profile
  • SaveChanges: Whether to save profile changes when session ends. Default is false. Only takes effect when ProfileName is provided. If ProfileName is empty, this parameter will be ignored and no state will be saved.
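The ProfileName rules above can be checked locally before creating a session; a minimal sketch (the helper name is hypothetical, not part of the node):

```python
import re

# Validate a ProfileName against the stated rules: alphanumeric characters,
# hyphens, and underscores only; no spaces; maximum 64 characters; case-sensitive.
PROFILE_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")

def is_valid_profile_name(name: str) -> bool:
    return PROFILE_NAME_RE.fullmatch(name) is not None

print(is_valid_profile_name("my-automation-profile"))  # True
print(is_valid_profile_name("bad name!"))              # False
```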

Output:

  • SessionId (string): The unique identifier of the browser session
  • CdpUrl (string): Chrome DevTools Protocol WebSocket URL for connecting to the browser. Use with automation tools like Playwright or Puppeteer. Example (Playwright): browser = await playwright.chromium.connect_over_cdp(cdp_url). If unfamiliar with browser automation, use LiveViewUrl to view the session instead.
  • LiveViewUrl (string): URL for embedding a live view stream of the browser session
  • InteractiveLiveViewUrl (string): URL for an interactive live view stream that allows user interaction
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Delete Browser Session

WARNING: This operation permanently deletes the browser session and cannot be undone. Closes and deletes a browser sandbox session. If the session was configured with a persistent profile and saveChanges was enabled, profile changes will be saved. Use this to properly clean up sessions when done to free up resources. Sessions also auto-close when TTL expires or after inactivity timeout.

Input Parameters:

  • SessionId: The browser session ID to delete/close. Obtained from Create Browser Session action response. Format: sess_xxxxxxxxx (e.g., sess_abc123xyz).

Output:

  • SessionId (string): The ID of the deleted browser session
  • Success (boolean): Whether the session was successfully deleted
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

List Browser Sessions

Lists all browser sandbox sessions for your account. Can filter by status to show only active sessions. Returns session metadata including ID, status, creation time, and other details. Useful for monitoring active sessions and managing session lifecycle. ⚠️ WARNING: This action returns ALL sessions in one request (Firecrawl API does not support pagination). For accounts with many historical sessions, the response may be large. Maximum 20 concurrent active sessions (running at the same time) allowed per account.

Options:

  • Status: Optional status filter to list sessions by their current state. Supported values: active (only running sessions), inactive (only terminated sessions), or leave empty to return all sessions. Example: active

Output:

  • Sessions (object-array): Array of browser session objects. Each session contains: id (string) - unique session identifier (e.g., sess_abc123xyz), status (string) - session status (active/inactive), createdAt (string) - creation time in ISO 8601 format, expiresAt (string) - expiration time in ISO 8601 format, url (string) - the URL currently being browsed, metadata (object) - browser configuration including userAgent and viewport settings.
  • TotalCount (number): Total number of sessions returned
  • IsComplete (boolean): Indicates whether all sessions are returned. Always true since Firecrawl API does not support pagination.
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.
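Because all sessions come back in one response, further filtering can be done client-side over the documented session fields; a sketch with illustrative data:

```python
# Filter the Sessions array by status locally (session shape follows the
# documented fields: id, status; values here are illustrative).
sessions = [
    {"id": "sess_abc123xyz", "status": "active"},
    {"id": "sess_def456uvw", "status": "inactive"},
]
active = [s for s in sessions if s["status"] == "active"]
print([s["id"] for s in active])  # ['sess_abc123xyz']
```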

Execute Browser Code

Executes code within an active browser sandbox session. Supports Python, Node.js (JavaScript), and Bash scripts. Playwright is pre-installed for browser automation tasks like navigation, clicking, form filling, and screenshot capture. The code runs in a secure isolated environment with access to the browser instance.

Input Parameters:

  • SessionId: The browser session ID to execute code in. Obtain this by first calling Create Browser Session action, which returns a SessionId. Session IDs start with 'sess_'. Example: sess_abc123xyz.
  • Code: The code to execute in the browser session. Multi-line code uses newline characters as line separators. The 'page' object is pre-initialized. Common operations: page.goto('URL'), page.click('selector'), page.fill('selector','value'), page.wait_for_selector('selector'), page.inner_text('selector'), page.screenshot(path='file.png'). Example: page.goto('https://example.com') followed by print(page.title())
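A multi-line Code value can be assembled as a newline-joined string before it is passed to this action; a minimal sketch:

```python
# Build the Code parameter from individual statements, joined by newlines
# (the statements themselves run inside the sandbox, where 'page' exists).
lines = [
    "page.goto('https://example.com')",
    "page.wait_for_selector('h1')",
    "print(page.inner_text('h1'))",
]
code_param = "\n".join(lines)
print(code_param)
```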

Options:

  • Language: Code language: python, node, or bash. Default is python

Output:

  • Success (boolean): Whether the code executed successfully
  • Result (string): The output result from code execution
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Agent

AI-powered web data extraction agent that autonomously searches, navigates and gathers data from websites. No URLs required - just describe what data you need via natural language prompt. The agent finds and extracts data from hard-to-reach places across the web. Supports optional URL constraints, JSON schema for structured output, and model selection (spark-1-mini for cost efficiency or spark-1-pro for accuracy). Waits for job completion before returning results.

Options:

  • Prompt: Natural language description of data to extract. Max 10000 characters. Example: Find all product prices and specifications from this e-commerce site
  • Urls: Optional comma-separated list of URLs to constrain agent search scope. Example: https://example.com,https://example.com/products
  • Schema: JSON schema defining the structure of extracted data. Pass as a JSON object.
  • Model: Model to use: spark-1-mini (default, 60% cheaper) or spark-1-pro (higher accuracy)
  • MaxCredits: Maximum credits to spend on this agent task. Default is 2500
  • StrictConstrainToUrls: If true, strictly restrict agent to provided URLs only. Default is false
  • PollInterval: Seconds between status checks when waiting for completion. Default is 5
  • MaxPollTime: Maximum seconds to wait for agent job completion. Default is 600 (10 minutes)

Output:

  • JobId (string): The unique identifier of the agent job
  • Status (string): The final status of the agent job (completed/failed/timeout)
  • Data (object): The extracted data object returned by the agent. Structure is determined by the input Schema or Prompt. When Schema is provided, keys match the schema properties. When only Prompt is used, keys are inferred by the AI.
  • Model (string): The model used for extraction (spark-1-mini or spark-1-pro)
  • CreditsUsed (number): Number of credits consumed by this agent task
  • ExpiresAt (string): ISO 8601 timestamp when the extracted data expires (24 hours from completion)
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Agent Async

Starts an AI-powered web data extraction agent task asynchronously and returns immediately with a job ID. The agent autonomously searches, navigates and gathers data from websites based on natural language prompts. Use Get Agent Status to poll for task completion and retrieve results. Ideal for long-running extraction tasks or when you need to process multiple agent jobs in parallel.

Options:

  • Prompt: Natural language description of data to extract. Max 10000 characters. Example: Find all product prices and specifications from this e-commerce site
  • Urls: Optional list of URLs to constrain agent search scope. Provide as an array of strings. Leave empty to allow the agent to search freely. Example: ["https://example.com", "https://example.com/products"]
  • Schema: JSON schema defining the structure of extracted data. Pass as a JSON object. Use this to enforce a specific output format. Common fields: type (string/number/boolean/object/array), properties (object field definitions), required (required field list). Example: {"type":"object","properties":{"products":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}},"required":["name","price"]}}}}. If omitted, the agent returns data in a flexible format based on the prompt.
  • Model: Model to use for data extraction. spark-1-mini (default): 60% cheaper, suitable for simple data extraction (product lists, prices, contact info). spark-1-pro: Higher accuracy, recommended for complex structured data or when mini model fails to extract correctly. Start with mini, upgrade to pro if results are unsatisfactory.
  • MaxCredits: Maximum credits to spend on this agent task. Default is 2500. Approximately 1 credit = 1 web page request. For example, 2500 credits can process ~2500 pages. Actual usage depends on website complexity and agent navigation depth.
  • StrictConstrainToUrls: If true, strictly restrict agent to provided URLs only. Default is false

Output:

  • JobId (string): The unique identifier of the agent job. Use this with Get Agent Status to poll for results
  • Success (boolean): Whether the agent task was successfully started. true: Task started, use JobId to poll for results. false: Task failed to start, check ErrorMessage for details.
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Get Agent Status

Retrieves the current status and results of a Firecrawl Agent job. Use this to poll for completion of async agent tasks started with Agent Async. Returns the job status (processing/completed/failed), extracted data when completed, credits used, and expiration time. Poll at reasonable intervals (e.g., every 5-10 seconds) until status is completed or failed.

Input Parameters:

  • JobId: The unique identifier of the agent job to check status for. This JobId is returned by the Agent Async action when you start an asynchronous agent task. Format: string with 'agnt_' prefix followed by a UUID (e.g., agnt_550e8400-e29b-41d4-a716-446655440000).

Output:

  • JobId (string): The unique identifier of the agent job
  • Status (string): The current status of the agent job: processing, completed, or failed
  • Data (object): The extracted data object returned by the agent. Structure is determined by the input Schema or Prompt. When Schema is provided, keys match the schema properties. When only Prompt is used, keys are inferred by the AI.
  • Model (string): The model used for extraction (spark-1-mini or spark-1-pro)
  • CreditsUsed (number): Number of credits consumed by this agent task
  • ExpiresAt (string): ISO 8601 timestamp when the extracted data expires (24 hours from completion)
  • Error (string): Error message from the agent task. Only present when status is failed
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.
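The recommended polling pattern can be sketched generically; `get_status` below is a stand-in for whatever invokes the Get Agent Status action in your workflow:

```python
import time

def poll_until_done(get_status, interval=5, max_wait=600):
    """Poll get_status() until it reports 'completed' or 'failed'.

    get_status must return a dict with a 'Status' key, mirroring the
    documented output of Get Agent Status."""
    waited = 0
    while waited <= max_wait:
        job = get_status()
        if job["Status"] in ("completed", "failed"):
            return job
        time.sleep(interval)
        waited += interval
    return {"Status": "timeout"}

# Simulated status sequence for illustration only
states = iter([{"Status": "processing"}, {"Status": "completed", "Data": {}}])
result = poll_until_done(lambda: next(states), interval=0)
print(result["Status"])  # completed
```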

Cancel Agent

WARNING: Cancels a running Firecrawl Agent job. Once cancelled, the job cannot be resumed or restarted. Use this to stop an agent task that is still processing. Useful for stopping long-running extraction tasks or when results are no longer needed.

Input Parameters:

  • JobId: The unique identifier of the Firecrawl Agent job to cancel. Obtained from the Agent Async action response. Format: string with 'agnt_' prefix (e.g., agnt_550e8400-e29b-41d4-a716-446655440000).

Output:

  • JobId (string): The unique identifier of the cancelled agent job
  • Success (boolean): Whether the agent job was successfully cancelled
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Get Batch Scrape Status

Retrieves the current status of a Firecrawl batch scrape job using its batch ID. Returns progress information including total URLs, completed count, failed count, and scraped data for completed items. Use this to monitor batch scraping operations and retrieve results as they complete.

Options:

  • BatchId: The unique identifier of the batch scrape job to get status for

Output:

  • BatchId (string): The unique identifier of the batch scrape job
  • Status (string): Current status of the batch scrape job (e.g., processing, completed, failed)
  • Total (number): Total number of URLs in the batch scrape job
  • Completed (number): Number of URLs successfully scraped
  • Failed (number): Number of URLs that failed to scrape
  • Data (object-array): Array of scraped results for completed URLs, each containing URL, content, and metadata
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.
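The progress fields can be combined into a simple completion summary; a sketch with illustrative numbers:

```python
# Derive progress from the documented Get Batch Scrape Status fields
# (Total, Completed, Failed); values here are illustrative.
status = {"Total": 40, "Completed": 30, "Failed": 2}
processed = status["Completed"] + status["Failed"]
remaining = status["Total"] - processed
print(f"{processed}/{status['Total']} processed, {remaining} remaining")
```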

Get Batch Scrape Errors

Retrieves error information for a Firecrawl batch scrape job using its batch ID. Returns an array of errors encountered during the batch scraping process, including URLs that failed and their corresponding error messages. Use this to diagnose issues with batch scrape operations.

Options:

  • BatchId: The unique identifier of the batch scrape job to retrieve errors for. Obtained from the Start Batch Scrape action response. Format: string starting with 'batch_' followed by alphanumeric characters. Example: batch_abc123xyz.

Output:

  • BatchId (string): The unique identifier of the batch scrape job
  • Errors (object-array): Array of error objects encountered during the batch scrape. Each error object contains: url (string) - the URL that failed to scrape, error (string) - detailed error message. Example: [{"url":"https://example.com/page","error":"404 Not Found"}]
  • TotalErrors (number): Total number of errors that occurred during the batch scrape operation
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.
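When diagnosing a large batch, the Errors array lends itself to quick aggregation; a sketch with illustrative data in the documented error shape:

```python
from collections import Counter

# Tally the Errors array from Get Batch Scrape Errors by error message
# (each entry has url and error fields; values here are illustrative).
errors = [
    {"url": "https://example.com/a", "error": "404 Not Found"},
    {"url": "https://example.com/b", "error": "Timeout"},
    {"url": "https://example.com/c", "error": "404 Not Found"},
]
by_message = Counter(e["error"] for e in errors)
print(by_message.most_common(1))  # [('404 Not Found', 2)]
```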

Cancel Batch Scrape Job

Cancels an ongoing Firecrawl batch scrape job using its batch ID. Stops the scraping operation and returns the final status. Use this when you need to stop a batch scrape job before it completes naturally.

Options:

  • BatchId: The unique identifier of the batch scrape job to cancel

Output:

  • BatchId (string): The unique identifier of the cancelled batch scrape job
  • Status (string): Final status of the batch scrape job after cancellation (typically cancelled)
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Crawl a Website

Crawls a website starting from the specified URL and scrapes all discovered pages. Initiates a crawl job then polls for completion (up to 5 minutes), returning all scraped page data including content and metadata. WARNING: This operation consumes API credits — one credit per page crawled. If the crawl does not finish within the polling timeout, partial results are returned with Status=scraping. Supports configurable crawl depth (recommended 1–5), page limits, wildcard path filtering, and per-page scrape options.

Options:

  • Url: The starting URL to crawl and scrape all pages from. Must include protocol (http:// or https://). Example: https://docs.example.com
  • MaxDepth: Maximum crawl depth — number of link hops from the starting URL. Recommended range: 1–5. Default is 2. Higher values discover more pages but consume more credits.
  • Limit: Maximum number of pages to crawl. Recommended range: 1–100. Default is 10. Each page consumes one API credit.
  • IncludePaths: Comma-separated list of URL path patterns to include. Supports wildcards (*). Case-sensitive. Only pages whose paths match at least one pattern will be crawled. Example: /blog/*,/docs/* will only crawl pages under /blog/ or /docs/
  • ExcludePaths: Comma-separated list of URL path patterns to exclude. Supports wildcards (*). Case-sensitive. Pages whose paths match any pattern will be skipped. Example: /admin/*,/login/* will skip all admin and login pages
  • IgnoreSitemap: Whether to ignore the website's sitemap.xml file (a machine-readable map of all pages). If false, the crawler uses the sitemap for faster and more complete page discovery. Default is false.
  • AllowBackwardLinks: Whether to crawl links that navigate up in the URL path hierarchy (e.g., from /blog/post to /blog or /). If false, only forward/deeper links are followed. Default is false.
  • AllowExternalLinks: Whether to follow and crawl links that point to external domains (different from the starting URL's domain). If false, only pages within the same domain are crawled. Default is false.
  • ScrapeOptions: Additional per-page scrape configuration as a JSON object. Supported fields: formats (array) — output formats, e.g. ["markdown","html"]; onlyMainContent (boolean) — extract only main content, default true; includeTags (array) — HTML tags to include, e.g. ["article","main"]; excludeTags (array) — HTML tags to exclude, e.g. ["nav","footer"]; waitFor (number) — milliseconds to wait before scraping. Example: {"formats":["markdown"],"onlyMainContent":true,"waitFor":1000}
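
The IncludePaths/ExcludePaths rules above can be sketched locally. This is only an illustration of the documented behavior (case-sensitive wildcard matching against the URL path, exclude wins over include); Firecrawl's server-side matching may differ in edge cases, and `should_crawl` is a hypothetical helper, not part of the tool.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

def should_crawl(url: str, include: list[str], exclude: list[str]) -> bool:
    """Sketch of the documented IncludePaths/ExcludePaths semantics:
    a page is crawled only if its path matches no exclude pattern and,
    when include patterns are given, matches at least one of them."""
    path = urlparse(url).path
    # fnmatchcase keeps matching case-sensitive, as documented.
    if any(fnmatchcase(path, pat) for pat in exclude):
        return False
    if include and not any(fnmatchcase(path, pat) for pat in include):
        return False
    return True
```

For example, with include `/docs/*` and exclude `/admin/*`, a page at `/docs/api` is crawled while `/admin/login` is skipped.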

Output:

  • JobId (string): The unique identifier for the crawl job
  • Status (string): Final status of the crawl job: completed (all pages scraped successfully), failed (job encountered a fatal error), scraping (still in progress — returned when polling timeout occurs), queued (job is waiting to start).
  • TotalPages (number): Total number of pages discovered during crawling
  • CompletedPages (number): Number of pages successfully scraped
  • CreditsUsed (number): Number of credits consumed by this crawl operation
  • Data (object-array): Array of scraped page objects. Each object contains: url (string) — the page URL; markdown (string) — page content in Markdown; html (string) — cleaned HTML (if requested); rawHtml (string) — raw HTML; links (array) — links found on the page; screenshot (string) — base64 screenshot; metadata (object) — page metadata including title, description, statusCode.
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Get Crawl Status

Retrieves the current status of a Firecrawl crawl job using its job ID. Returns job status, progress information (total and completed pages), credits used, and any scraped data available. Use this to check on ongoing crawl operations initiated by the Crawl a Website action.

Input Parameters:

  • JobId: The unique identifier of the crawl job to check status for. Obtained from the Crawl a Website action response. Example: crawl_abc123xyz.

Output:

  • JobId (string): The unique identifier of the crawl job
  • Status (string): Current status of the crawl job (scraping, completed, failed)
  • TotalPages (number): Total number of pages discovered during crawling
  • CompletedPages (number): Number of pages successfully scraped so far
  • CreditsUsed (number): Number of credits consumed by this crawl operation
  • ExpiresAt (string): Timestamp when the crawl job results will expire (ISO 8601 format)
  • NextUrl (string): URL to fetch the next batch of results if pagination is available
  • Data (object-array): Array of scraped page data available so far, each containing URL, content, and metadata
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.
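
A typical client-side polling loop over this action can be sketched as follows. `get_crawl_status` here is a placeholder callable standing in for however your workflow invokes the action; the status strings match the values documented above.

```python
import time

def wait_for_crawl(get_crawl_status, job_id: str,
                   poll_interval: float = 5.0, max_wait: float = 300.0) -> dict:
    """Poll a Get Crawl Status callable until the job reaches a
    terminal state (completed or failed) or the time budget runs out.
    On timeout the last status is returned, so a job still in the
    "scraping" state yields whatever partial data is available."""
    deadline = time.monotonic() + max_wait
    while True:
        status = get_crawl_status(job_id)
        if status["Status"] in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            return status
        time.sleep(poll_interval)
```

The same pattern applies to Get Extract Status for asynchronous extract jobs, with "processing" in place of "scraping".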

Get Crawl Errors

Retrieves error information for a Firecrawl crawl job using its job ID. Returns an array of errors encountered during the crawling process, including URLs that failed and their corresponding error messages. Use this to diagnose issues with crawl operations.

Input Parameters:

  • JobId: The unique identifier of the crawl job to get errors for. Obtained from the Crawl a Website action response. Example: crawl_abc123xyz.

Output:

  • JobId (string): The unique identifier of the crawl job
  • Errors (object-array): Array of error objects encountered during the crawl. Each error object contains: url (string) - the URL that failed to crawl, error (string) - error message describing why the crawl failed. Example: [{"url":"https://example.com/page","error":"404 Not Found"}]
  • TotalErrors (number): Total number of errors that occurred during the crawl operation
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

List Active Crawls

Retrieves a list of all currently active Firecrawl crawl jobs for the authenticated account. Returns an array of crawl job objects containing job IDs, statuses, URLs, and other metadata. Use this to monitor all ongoing crawl operations.

Output:

  • ActiveCrawls (object-array): Array of active crawl job objects. Each object contains: id (string) - job ID, status (string) - current status, total (number) - total pages discovered, completed (number) - pages crawled so far, creditsUsed (number) - credits consumed, expiresAt (string) - ISO 8601 expiry timestamp.
  • TotalCount (number): Total number of active crawl jobs
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Cancel a Crawl Job

WARNING: Cancels an ongoing Firecrawl crawl job using its job ID. Cancellation is permanent and cannot be undone. Any data already crawled before cancellation will be preserved. Use this when you need to terminate a long-running crawl job before it completes.

Input Parameters:

  • JobId: The unique identifier of the crawl job to cancel. Obtained from the Crawl a Website action response. Example format: crawl_abc123xyz.

Output:

  • JobId (string): The unique identifier of the cancelled crawl job
  • Status (string): Status of the job after cancellation. Possible values: cancelled (successfully cancelled), failed (cancellation failed), completed (job already finished before cancellation).
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Preview Crawl Params from Prompt

Uses AI to analyze a natural language prompt and generate suggested crawl parameters. Provide a description of what you want to crawl (e.g., crawl all blog posts from 2024 on example.com), and Firecrawl will return optimized crawl configuration including URL, paths, depth, and filters. This helps you quickly set up crawl jobs without manually configuring all parameters.

Options:

  • Prompt: Natural language description of what you want to crawl. Example: Crawl the blog section of example.com and get all posts from 2024

Output:

  • SuggestedParams (object): AI-generated crawl configuration object. May include: url (string) - starting URL, limit (number) - max pages, maxDepth (number) - crawl depth, includePaths (array) - path filters, excludePaths (array) - excluded paths, scrapeOptions (object) - per-page scrape config.
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Extract Web Data with AI

Uses AI to extract structured data from web pages based on natural language prompts or JSON schemas. Supports both simple instructions (e.g., extract all product names and prices) and complex schema definitions. Automatically waits for extraction to complete with configurable polling. Perfect for automated data extraction, web scraping with specific requirements, and turning unstructured web content into structured data.

Input Parameters:

  • Url: The URL of the webpage to extract data from

Options:

  • Prompt: Natural language instruction for what data to extract from the webpage. Be specific about field names and data types. Either Prompt or Schema must be provided (both can be used together). Examples: "Extract all product names and prices" | "Get contact info: email, phone, address" | "Extract job postings with title, company, location, and salary"
  • Schema: JSON schema defining the structure of extracted data. Pass as a JSON object. Use this to enforce strict types and output format. Common fields: type (string/number/boolean/object/array), properties (object field definitions), required (required field list). Example: {"type":"object","properties":{"products":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"},"inStock":{"type":"boolean"}},"required":["name","price"]}}}}. Either Prompt or Schema must be provided (both can be used together for maximum precision).
  • EnableJavascript: Whether to enable JavaScript rendering for dynamic content. Default is true. Disable only for static HTML pages with no dynamic loading — disabling can improve extraction speed.
  • Timeout: Maximum time in milliseconds (1 second = 1000 ms) to wait for page load before giving up. Default is 30000 (30 seconds). Increase for slow-loading pages (e.g., 60000 for 60 seconds); decrease for fast static pages.
  • PollInterval: Seconds between status checks while waiting for extraction to complete. Extraction runs asynchronously — the system polls periodically until done. Default is 2 seconds. Increase to 5 for complex pages to reduce API call frequency.
  • MaxPollTime: Maximum seconds to wait before giving up if extraction takes too long. When timeout occurs, extraction is cancelled and an error is returned. Default is 300 seconds (5 minutes). Increase to 600 for complex pages; reduce to 120 for simple extraction.
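
The Schema option is easiest to build as a plain object and serialize once. The sketch below constructs the product-list schema from the example above and shows the worst-case number of status checks implied by the default PollInterval and MaxPollTime; the variable names are illustrative only.

```python
import json

# The example schema from the Schema option, built as a dict and
# serialized for the JSON field.
schema = {
    "type": "object",
    "properties": {
        "products": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                    "inStock": {"type": "boolean"},
                },
                "required": ["name", "price"],
            },
        }
    },
}
payload = json.dumps(schema)

# With the defaults documented above (PollInterval=2 s, MaxPollTime=300 s),
# the action performs at most 300 / 2 = 150 status checks before giving up.
max_polls = 300 // 2
```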

Output:

  • Url (string): The URL from which data was extracted
  • ExtractedData (object): The extracted data object. Structure is determined by the input Schema or Prompt. When Schema is provided, keys match the schema properties exactly (e.g., {"products": [{"name": "Widget", "price": 29.99}]}). When only Prompt is used, keys are inferred by the AI based on the prompt (e.g., products, contacts, articles, jobs). For consistent output structure, using Schema is strongly recommended.
  • Success (boolean): Whether the extraction completed successfully
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Extract Web Data with AI Async

Asynchronously starts an AI-powered web data extraction task and returns immediately with a job ID. Supports both natural language prompts and JSON schema definitions. Use Get Extract Status to poll for task completion and retrieve results. Ideal for long-running extraction tasks or when you need to initiate multiple extraction jobs in parallel.

Options:

  • Url: The URL of the webpage to extract data from
  • Prompt: Natural language instruction for what data to extract from the webpage. Be specific about field names and data types. Either Prompt or Schema must be provided (both can be used together). Examples: "Extract all product names and prices" | "Get contact info: email, phone, address" | "Extract job postings with title, company, location, and salary"
  • Schema: JSON schema defining the structure of extracted data. Pass as a JSON object. Use this to enforce a specific output format. Common fields: type (string/number/boolean/object/array), properties (object field definitions), required (required field list). Example: {"type":"object","properties":{"products":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}},"required":["name","price"]}}}}. Either Prompt or Schema must be provided (both can be used together for maximum precision).

Output:

  • JobId (string): The unique identifier of the extract job. Use this with Get Extract Status to poll for results
  • Success (boolean): Whether the extract task was successfully started
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Get Extract Status

Retrieves the current status and results of a Firecrawl Extract job. Use this to poll for completion of extract tasks that were started asynchronously. Returns the job status (processing/completed/failed) and extracted data when completed. Poll at reasonable intervals until status is completed or failed.

Input Parameters:

  • JobId: The Extract job ID to check status for. Obtained from the Extract Web Data with AI Async action response. Example: extract_abc123xyz.

Output:

  • JobId (string): The unique identifier of the extract job
  • Status (string): The current status of the extract job: processing, completed, or failed
  • Data (object): The extracted data object. Structure is determined by the input Schema or Prompt. When Schema is provided, keys match the schema properties. When only Prompt is used, keys are inferred by the AI.
  • ExpiresAt (string): ISO 8601 timestamp when the extracted data expires
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Get Team Credit Usage

Retrieves the current credit usage information for the authenticated Firecrawl team, including remaining credits, plan credits, and billing period details.

Output:

  • RemainingCredits (number): Number of credits remaining for the team.
  • PlanCredits (number): Number of credits in the plan (excluding coupon credits, credit packs, or auto recharge credits).
  • BillingPeriodStart (string): Billing period start date in ISO 8601 format. Empty string for free plans.
  • BillingPeriodEnd (string): Billing period end date in ISO 8601 format. Empty string for free plans.
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Get Team Token Usage

Retrieves the current token usage information for the authenticated Firecrawl team, including remaining tokens, plan tokens, and billing period details.

Output:

  • RemainingTokens (number): Number of tokens remaining for the team in the current billing period. Tokens are Firecrawl's billing units — each API operation (scraping, crawling, etc.) consumes a certain number of tokens. This value includes both plan tokens and any coupon tokens.
  • PlanTokens (number): Number of tokens in the plan (excluding coupon tokens).
  • BillingPeriodStart (string): Billing period start date in ISO 8601 format (e.g., 2026-03-01T00:00:00Z). Returns empty string for free plans because they do not have billing cycles. Paid plans have monthly billing periods.
  • BillingPeriodEnd (string): Billing period end date in ISO 8601 format (e.g., 2026-03-31T23:59:59Z). Returns empty string for free plans because they do not have billing cycles. Paid plans have monthly billing periods.
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.
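
Because the billing-period fields return an empty string on free plans, they should be parsed defensively. This is a minimal sketch; `parse_billing_ts` is an illustrative helper, and the trailing-`Z` replacement is only needed on Python versions whose `datetime.fromisoformat` does not accept it.

```python
from datetime import datetime

def parse_billing_ts(value: str):
    """Parse an ISO 8601 billing-period timestamp, returning None for
    the empty string that free plans report (no billing cycle)."""
    if not value:
        return None
    # Older Python versions reject a trailing 'Z'; normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

start = parse_billing_ts("2026-03-01T00:00:00Z")
```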

Get Team Queue Status

Retrieves the current queue status for the authenticated Firecrawl team, including jobs in queue, active jobs, waiting jobs, max concurrency, and most recent success timestamp.

Output:

  • JobsInQueue (number): Total number of jobs currently in the queue (= ActiveJobsInQueue + WaitingJobsInQueue).
  • ActiveJobsInQueue (number): Number of jobs currently being processed.
  • WaitingJobsInQueue (number): Number of jobs waiting to be processed.
  • MaxConcurrency (number): Maximum number of concurrent active jobs allowed based on your plan. If ActiveJobsInQueue reaches MaxConcurrency, new jobs will be queued and wait until a slot becomes available.
  • MostRecentSuccess (string): Timestamp of the most recent successful job in ISO 8601 format. Empty if no record.
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success, check ErrorMessage for business errors), -1 (parameter validation error), 500 (network timeout or connection failure).
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.
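
The relationship between these fields can be captured in two one-liners: total queue size is active plus waiting, and a new job runs immediately only while active jobs are below MaxConcurrency. The helper names below are illustrative.

```python
def available_slots(active_jobs: int, max_concurrency: int) -> int:
    """Slots left before new jobs start queueing; zero means any new
    job waits until a running job finishes (per MaxConcurrency)."""
    return max(max_concurrency - active_jobs, 0)

def jobs_in_queue(active: int, waiting: int) -> int:
    """Mirrors JobsInQueue = ActiveJobsInQueue + WaitingJobsInQueue."""
    return active + waiting
```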

Get Historical Credit Usage

Retrieves the historical credit usage information for the authenticated Firecrawl team on a month-by-month basis. Optionally, the data can be broken down by API key.

Options:

  • ByApiKey: Whether to break down historical credit usage by API key

Output:

  • Periods (object-array): List of billing period records with StartDate, EndDate, ApiKey, and TotalCredits fields.
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Get Historical Token Usage

Retrieves the historical token usage information for the authenticated Firecrawl team on a month-by-month basis. Optionally, the data can be broken down by API key.

Options:

  • ByApiKey: Whether to break down historical token usage by API key

Output:

  • Periods (object-array): List of billing period records with StartDate, EndDate, ApiKey, and TotalTokens fields.
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Search the Web and Scrape Results

Searches the web using a query and automatically scrapes content from the search results. Returns an array of scraped pages with titles, URLs, content, and metadata. Supports language and country localization, result limiting, and custom scraping options. Perfect for research and data gathering from web search results.

Input Parameters:

  • Query: The search query to look for on the web. Required — calls will fail if empty. Example: "latest AI research papers 2024"

Options:

  • Limit: Maximum number of search results to scrape. Default is 5
  • Lang: Language code (ISO 639-1) for search results. Leave empty for default. Common values: en (English), es (Spanish), fr (French), de (German), zh (Chinese), ja (Japanese), ko (Korean). Example: "en"
  • Country: Country code (ISO 3166-1 alpha-2) for localized search results. Leave empty for default. Common values: us (United States), uk (United Kingdom), ca (Canada), au (Australia), de (Germany), fr (France), jp (Japan). Example: "us"
  • ScrapeOptions: Additional per-page scrape configuration for each search result. Pass as a JSON object. Supported fields: formats (array) — output formats, e.g. ["markdown","html"]; onlyMainContent (boolean) — extract only main content, default true; includeTags (array) — HTML tags to include, e.g. ["article","main"]; excludeTags (array) — HTML tags to exclude, e.g. ["nav","footer"]; waitFor (number) — milliseconds to wait before scraping. Example: {"formats":["markdown"],"onlyMainContent":true,"waitFor":1000}
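
The ScrapeOptions value is easiest to assemble as a plain object and serialize once. The field names below follow the examples documented above; anything beyond those is an assumption.

```python
import json

# Per-result scrape configuration, assembled as a dict and serialized
# for the ScrapeOptions JSON option.
scrape_options = {
    "formats": ["markdown", "html"],     # requested output formats
    "onlyMainContent": True,             # strip navigation/boilerplate
    "includeTags": ["article", "main"],  # HTML tags to keep
    "excludeTags": ["nav", "footer"],    # HTML tags to drop
    "waitFor": 1000,                     # ms to wait before scraping
}
option_value = json.dumps(scrape_options)
```

The same object shape applies to the ScrapeOptions field of Crawl a Website.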

Output:

  • Results (object-array): Array of scraped search result objects. Each object contains: url (string) — the page URL; title (string) — page title; markdown (string) — page content in Markdown format; html (string, optional) — cleaned HTML; rawHtml (string, optional) — raw HTML; metadata (object) — includes title, description, language, statusCode, and ogImage (if available).
  • TotalResults (number): Total number of search results scraped
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Map a Website

Quickly maps a website to discover all accessible URLs without scraping content. This is faster than crawling and useful for understanding site structure. Supports filtering by search terms, including subdomains, and respecting or ignoring sitemaps. Returns a complete list of discovered URLs.

Options:

  • Url: The website URL to map and discover all accessible URLs. Must include protocol (http:// or https://). Supports domain-level (https://example.com) and path-level (https://example.com/docs) URLs. Example: https://docs.firecrawl.dev
  • Search: Optional search query to filter discovered URLs. Uses case-insensitive substring matching against the full URL path. Only URLs containing this term will be returned. Example: api will match https://example.com/api/reference
  • IgnoreSitemap: Whether to ignore the website's sitemap. Default is false
  • IncludeSubdomains: Whether to include URLs from subdomains. Default is false
  • Limit: Maximum number of URLs to discover. Default is 5000
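
The Search filter's documented behavior (case-insensitive substring matching) can be sketched locally; `filter_urls` is an illustrative helper, and Firecrawl's server-side matching may differ in details.

```python
def filter_urls(urls: list[str], search: str) -> list[str]:
    """Sketch of the Search option: keep only URLs containing the
    term, compared case-insensitively; an empty term keeps everything."""
    if not search:
        return urls
    needle = search.lower()
    return [u for u in urls if needle in u.lower()]
```

For example, the term `api` matches `https://example.com/API/reference` but not `https://example.com/blog`.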

Output:

  • Urls (string-array): Array of all discovered URLs from the website. URLs are returned as absolute URLs (e.g., https://example.com/page). Automatically deduplicated. Order is not guaranteed. May be empty if no URLs match the search criteria.
  • TotalUrls (number): Total number of URLs discovered
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Scrape a URL

Scrapes content from exactly one URL and returns a flat response (Markdown, Html, Metadata directly accessible — no array unwrapping needed). Use this tool when you have a single URL to scrape. For scraping 2 or more URLs at once, use Scrape URLs instead, which calls a batch API and returns a Results[] array with per-URL Success/Error status. Supports markdown, HTML, raw HTML, links extraction, and screenshots, with optional HTML tag filtering and configurable wait times for JavaScript-heavy pages.

Options:

  • Url: The URL of the webpage to scrape
  • Formats: Comma-separated list of output formats. Options: markdown, html, rawHtml, links, screenshot. Default is markdown
  • OnlyMainContent: Whether to extract only the main content, removing navigation and footers. Default is true
  • IncludeTags: Comma-separated list of HTML tags to include. Example: article,main,div
  • ExcludeTags: Comma-separated list of HTML tags to exclude. Example: nav,footer,aside
  • WaitFor: Milliseconds to wait for page load before scraping. Useful for JavaScript-heavy sites. Default is 0
  • Timeout: Maximum time in milliseconds to wait for the page to load. Default is 30000 (30 seconds)

Output:

  • Url (string): The URL that was scraped
  • Markdown (string): Scraped content in Markdown format (populated when markdown format is requested)
  • Html (string): Scraped content in cleaned HTML format (populated when html format is requested)
  • RawHtml (string): Raw HTML content as returned by the page (populated when rawHtml format is requested)
  • Links (string-array): Array of links found on the page (populated when links format is requested)
  • Screenshot (string): Base64-encoded screenshot of the page (populated when screenshot format is requested)
  • Metadata (object): Page metadata including title, description, status code, and other meta tags
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

Scrape URLs

Scrapes content from 2 or more URLs in a single batch API call and returns a Results[] array, where each item has its own Url, Success, Markdown, and Error fields — allowing partial failures without losing successful results. For a single URL, use Scrape a URL instead, which calls a simpler API and returns a flat response (Markdown, Html, Metadata directly). Accepts multiple URLs separated by commas, semicolons, or newlines. Supports markdown, HTML, raw HTML, links extraction, and screenshots.

Options:

  • Urls: Single URL or multiple URLs to scrape. For multiple URLs, separate by commas, semicolons, or newlines. Examples: Single: https://example.com | Multiple: https://example.com, https://example.org
  • Formats: Comma-separated list of output formats. Options: markdown (formatted text), html (cleaned HTML with boilerplate removed), rawHtml (original unmodified HTML source), links (list of all hyperlinks), screenshot (base64-encoded page image). Default is markdown
  • OnlyMainContent: Whether to extract only the main content, removing navigation and footers. Default is true
  • IncludeTags: Comma-separated list of HTML tags to include. Example: article,main,div
  • ExcludeTags: Comma-separated list of HTML tags to exclude. Example: nav,footer,aside
  • WaitFor: Milliseconds to wait for page load before scraping. Useful for JavaScript-heavy sites. Default is 0
  • Timeout: Maximum time in milliseconds to wait for the page to load. Default is 30000 (30 seconds)
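
Splitting the Urls option on its documented delimiters (commas, semicolons, or newlines) can be sketched as below; `split_urls` is an illustrative helper, not part of the tool.

```python
import re

def split_urls(raw: str) -> list[str]:
    """Split a Urls value on the documented delimiters - commas,
    semicolons, or newlines - trimming whitespace and dropping blanks."""
    return [u.strip() for u in re.split(r"[,;\n]+", raw) if u.strip()]
```

For example, `"https://example.com, https://example.org;\nhttps://example.net"` yields three URLs.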

Output:

  • Results (object-array): Array of scraped result objects. Each object contains: Url (string) - the scraped URL, Success (boolean) - whether scraping succeeded, Markdown (string) - content in Markdown format, Html (string) - content in HTML format, RawHtml (string) - raw HTML source, Links (string-array) - extracted hyperlinks found on the page, Screenshot (string) - base64-encoded screenshot if requested, Metadata (object) - page metadata including title, description, statusCode, language, Error (string) - error message if scraping failed
  • TotalUrls (number): Total number of URLs submitted for scraping
  • SuccessCount (number): Number of URLs successfully scraped
  • FailureCount (number): Number of URLs that failed to scrape
  • OriginalStatusCode (number): HTTP status code returned by the upstream API. 0 if the API was not reached.
  • StatusCode (number): Operation status code: 200 (success), -1 (parameter validation error), or other HTTP status codes for errors.
  • ErrorMessage (string): Error description if the operation failed, empty string if successful.

5. Example Usage

This section will guide you through creating a simple workflow to scrape the content of a single webpage and convert it into clean Markdown text.

Workflow Overview: Start -> Firecrawl -> Answer

Step-by-Step Guide:

  1. Add the Tool Node:
    • On the workflow canvas, click the "+" icon to add a new node.
    • Select the "Tools" tab in the popup panel.
    • Find and select Firecrawl from the list of available tools.
    • In the list of supported operations for Firecrawl, click on Scrape a URL. This will add the corresponding node to your canvas.
  2. Configure the Node:
    • Click on the newly added Scrape a URL node to open its configuration panel on the right.
    • Credentials: At the top of the panel, locate the credentials field. Click the dropdown menu and select your pre-configured Firecrawl API credentials.
    • Parameter Configuration: Fill in the required details to specify what you want to scrape.
    • Url: Enter the full URL of the webpage you want to scrape (e.g., https://example.com/blog/article-1).
    • Formats: Leave it as markdown (the default) to get clean, LLM-ready text.
    • OnlyMainContent: Leave it as true to automatically strip out headers, footers, and navigation menus.
  3. Run and Verify:
    • Once the parameters are configured, any error indicators on the node will disappear.
    • Click the "Test Run" button in the top right corner of the canvas to execute the workflow.
    • After a successful run, click the log icon to view the detailed output. You will see the Markdown field populated with the extracted text from the webpage.

After completing these steps, your workflow is fully configured. Upon execution, Firecrawl will visit the specified URL, extract the main content, and return it as clean Markdown ready for further processing by an LLM or other nodes.

6. FAQs

Q: Why is the scraped content missing data that I can see in my browser?

A: This usually happens with JavaScript-heavy websites (like React or Vue apps) where content loads dynamically after the initial page request. Try the following:

  • Increase WaitFor: Set the WaitFor parameter to 5000 (5 seconds) or higher to give the page time to render before Firecrawl extracts the content.

Q: What is the difference between "Scrape a URL" and "Scrape URLs"?

A: Choose based on your needs:

  • Scrape a URL: Best for single pages. It returns a flat, easy-to-use response object where you can directly access the Markdown or Html fields.
  • Scrape URLs: Best for batch processing multiple links at once. It returns an array of results, allowing some URLs to fail while others succeed without breaking the entire operation.

Q: I'm getting a 401 Unauthorized or 403 Forbidden error. What should I do?

A: Please check the following points:

  • API Key Validity: Ensure your Firecrawl API key is correct and has not expired.
  • Account Balance: Check if your team has sufficient credits or tokens remaining to perform the operation. You can use the Get Team Credit Usage action to verify this.

7. Official Documentation

Firecrawl Official API Documentation

Updated on: Mar 27, 2026