# Check Balance

Source: https://docs.browser-use.com/api-reference/api-v1/check-balance

https://api.browser-use.com/api/v1/openapi.json get /balance

Returns the user's current API credit balance, which includes both monthly subscription credits and any additional purchased credits. Useful for monitoring usage and ensuring sufficient credits for task execution.

# Create Browser Profile

Source: https://docs.browser-use.com/api-reference/api-v1/create-browser-profile

https://api.browser-use.com/api/v1/openapi.json post /browser-profiles

Create a new browser profile with custom settings for ad blocking, proxy usage, and viewport dimensions. Pay-as-you-go users can only have one profile; subscription users can create multiple profiles.

# Create Scheduled Task

Source: https://docs.browser-use.com/api-reference/api-v1/create-scheduled-task

https://api.browser-use.com/api/v1/openapi.json post /scheduled-task

Create a scheduled task that runs at regular intervals or according to a cron expression. Requires an active subscription. Returns the scheduled task ID.

# Delete Browser Profile

Source: https://docs.browser-use.com/api-reference/api-v1/delete-browser-profile

https://api.browser-use.com/api/v1/openapi.json delete /browser-profiles/{profile_id}

Deletes a browser profile. This removes the profile and all associated browser data.

# Delete Scheduled Task

Source: https://docs.browser-use.com/api-reference/api-v1/delete-scheduled-task

https://api.browser-use.com/api/v1/openapi.json delete /scheduled-task/{task_id}

Deletes a scheduled task, preventing any future runs. Any currently running instances of the task are allowed to complete.

# Get Browser Profile

Source: https://docs.browser-use.com/api-reference/api-v1/get-browser-profile

https://api.browser-use.com/api/v1/openapi.json get /browser-profiles/{profile_id}

Returns information about a specific browser profile and its configuration settings.
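To make the profile endpoints above concrete, here is a minimal request-building sketch. The settings keys (`ad_blocker`, `proxy`, `viewport_width`, `viewport_height`) are assumed names for the options the descriptions mention, not confirmed schema fields; consult the OpenAPI spec for the actual payload shape.

```python
# Illustrative request builder for the browser-profile endpoints.
# NOTE: the payload keys below are assumptions; check the OpenAPI spec.
BASE_URL = "https://api.browser-use.com/api/v1"


def create_profile_request(ad_blocker=True, proxy=False,
                           viewport_width=1280, viewport_height=960):
    """POST /browser-profiles with assumed setting names."""
    payload = {
        "ad_blocker": ad_blocker,
        "proxy": proxy,
        "viewport_width": viewport_width,
        "viewport_height": viewport_height,
    }
    return ("POST", f"{BASE_URL}/browser-profiles", payload)


def delete_profile_request(profile_id):
    """DELETE /browser-profiles/{profile_id}"""
    return ("DELETE", f"{BASE_URL}/browser-profiles/{profile_id}", None)


method, url, payload = create_profile_request(proxy=True)
print(method, url)  # POST https://api.browser-use.com/api/v1/browser-profiles
```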
# Get Browser Use Version

Source: https://docs.browser-use.com/api-reference/api-v1/get-browser-use-version

https://api.browser-use.com/api/v1/openapi.json get /browser-use-version

Returns the browser-use Python library version used by the backend.

# Get Scheduled Task

Source: https://docs.browser-use.com/api-reference/api-v1/get-scheduled-task

https://api.browser-use.com/api/v1/openapi.json get /scheduled-task/{task_id}

Returns detailed information about a specific scheduled task, including its schedule configuration and current status.

# Get Task

Source: https://docs.browser-use.com/api-reference/api-v1/get-task

https://api.browser-use.com/api/v1/openapi.json get /task/{task_id}

Returns comprehensive information about a task, including its current status, completed steps, output (if finished), and other metadata.

# Get Task Gif

Source: https://docs.browser-use.com/api-reference/api-v1/get-task-gif

https://api.browser-use.com/api/v1/openapi.json get /task/{task_id}/gif

Returns a GIF URL generated from the screenshots of the task execution. Only available for completed tasks that have screenshots.

# Get Task Media

Source: https://docs.browser-use.com/api-reference/api-v1/get-task-media

https://api.browser-use.com/api/v1/openapi.json get /task/{task_id}/media

Returns links to any recordings or media generated during task execution, such as browser session recordings. Only available for completed tasks.

# Get Task Output File

Source: https://docs.browser-use.com/api-reference/api-v1/get-task-output-file

https://api.browser-use.com/api/v1/openapi.json get /task/{task_id}/output-file/{file_name}

Returns a presigned URL for downloading a file from the task's output files.

# Get Task Screenshots

Source: https://docs.browser-use.com/api-reference/api-v1/get-task-screenshots

https://api.browser-use.com/api/v1/openapi.json get /task/{task_id}/screenshots

Returns any screenshot URLs generated during task execution.
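The task artifact endpoints above all share the `/task/{task_id}/…` prefix, so a small helper can gather them in one pass. This is a sketch: the per-endpoint response shapes are not documented here, so the code simply returns each endpoint's raw JSON.

```python
def artifact_urls(task_id: str,
                  base_url: str = "https://api.browser-use.com/api/v1") -> dict:
    """URLs for the artifact endpoints of a completed task."""
    return {
        "gif": f"{base_url}/task/{task_id}/gif",
        "media": f"{base_url}/task/{task_id}/media",
        "screenshots": f"{base_url}/task/{task_id}/screenshots",
    }


def fetch_artifacts(task_id: str, api_key: str) -> dict:
    """Fetch each artifact endpoint; response shapes vary (see the spec)."""
    import requests  # third-party: pip install requests
    headers = {"Authorization": f"Bearer {api_key}"}
    return {
        name: requests.get(url, headers=headers).json()
        for name, url in artifact_urls(task_id).items()
    }


print(artifact_urls("abc123")["gif"])
# https://api.browser-use.com/api/v1/task/abc123/gif
```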
# Get Task Status

Source: https://docs.browser-use.com/api-reference/api-v1/get-task-status

https://api.browser-use.com/api/v1/openapi.json get /task/{task_id}/status

Returns just the current status of a task (created, running, finished, stopped, or paused). More lightweight than the full task details endpoint.

# List Browser Profiles

Source: https://docs.browser-use.com/api-reference/api-v1/list-browser-profiles

https://api.browser-use.com/api/v1/openapi.json get /browser-profiles

Returns a paginated list of all browser profiles belonging to the user, ordered by creation date. Each profile includes configuration like ad blocker settings, proxy settings, and viewport dimensions.

# List Scheduled Tasks

Source: https://docs.browser-use.com/api-reference/api-v1/list-scheduled-tasks

https://api.browser-use.com/api/v1/openapi.json get /scheduled-tasks

Returns a paginated list of all scheduled tasks belonging to the user, ordered by creation date. Each task includes basic information like schedule type, next run time, and status.

# List Tasks

Source: https://docs.browser-use.com/api-reference/api-v1/list-tasks

https://api.browser-use.com/api/v1/openapi.json get /tasks

Returns a paginated list of all tasks belonging to the user, ordered by creation date. Each task includes basic information like status and creation time. For detailed task info, use the get task endpoint.

# Me

Source: https://docs.browser-use.com/api-reference/api-v1/me

https://api.browser-use.com/api/v1/openapi.json get /me

Returns a boolean value indicating if the API key is valid and the user is authenticated.

# Pause Task

Source: https://docs.browser-use.com/api-reference/api-v1/pause-task

https://api.browser-use.com/api/v1/openapi.json put /pause-task

Pauses execution of a running task. The task can be resumed later using the `/resume-task` endpoint. Useful for manual intervention or inspection.
# Ping

Source: https://docs.browser-use.com/api-reference/api-v1/ping

https://api.browser-use.com/api/v1/openapi.json get /ping

Use this endpoint to check whether the server is running and responding.

# Resume Task

Source: https://docs.browser-use.com/api-reference/api-v1/resume-task

https://api.browser-use.com/api/v1/openapi.json put /resume-task

Resumes execution of a previously paused task. The task continues from where it was paused. You cannot resume a stopped task.

# Run Task

Source: https://docs.browser-use.com/api-reference/api-v1/run-task

https://api.browser-use.com/api/v1/openapi.json post /run-task

Requires an active subscription. Returns a task ID that can be used to track progress.

# Search Url

Source: https://docs.browser-use.com/api-reference/api-v1/search-url

https://api.browser-use.com/api/v1/openapi.json post /search-url

Search a single URL using Browser Use.

# Simple Search

Source: https://docs.browser-use.com/api-reference/api-v1/simple-search

https://api.browser-use.com/api/v1/openapi.json post /simple-search

Search the internet using Browser Use.

# Stop Task

Source: https://docs.browser-use.com/api-reference/api-v1/stop-task

https://api.browser-use.com/api/v1/openapi.json put /stop-task

Stops a running browser automation task immediately. The task cannot be resumed after being stopped. Use the `/pause-task` endpoint instead if you want to temporarily halt execution.

# Update Browser Profile

Source: https://docs.browser-use.com/api-reference/api-v1/update-browser-profile

https://api.browser-use.com/api/v1/openapi.json put /browser-profiles/{profile_id}

Update a browser profile with partial updates. Only the fields you want to change need to be included.

# Update Scheduled Task

Source: https://docs.browser-use.com/api-reference/api-v1/update-scheduled-task

https://api.browser-use.com/api/v1/openapi.json put /scheduled-task/{task_id}

Update a scheduled task with partial updates.
# Upload File Presigned Url

Source: https://docs.browser-use.com/api-reference/api-v1/upload-file-presigned-url

https://api.browser-use.com/api/v1/openapi.json post /uploads/presigned-url

Returns a presigned URL for uploading a file to the user's files bucket. After uploading a file, use the `included_file_names` field in the `RunTaskRequest` to include it in a task.

# Authentication

Source: https://docs.browser-use.com/cloud/v1/authentication

Learn how to authenticate with the Browser Use Cloud API

The Browser Use Cloud API uses API keys to authenticate requests. You can obtain an API key from your [Browser Use Cloud dashboard](https://cloud.browser-use.com/settings/api-keys).

## API Keys

All API requests must include your API key in the `Authorization` header:

```bash
Authorization: Bearer YOUR_API_KEY
```

Keep your API keys secure and do not share them in publicly accessible areas such as GitHub, client-side code, or your browser's developer tools. API keys should be stored securely in environment variables or a secure key management system.

## Example Request

Here's an example of how to include your API key in a request using Python:

```python
import requests

API_KEY = 'your_api_key_here'
BASE_URL = 'https://api.browser-use.com/api/v1'
HEADERS = {'Authorization': f'Bearer {API_KEY}'}

response = requests.get(f'{BASE_URL}/me', headers=HEADERS)
print(response.json())
```

## Verifying Authentication

You can verify that your API key is valid by making a request to the `/api/v1/me` endpoint. See the [Me endpoint documentation](/api-reference/api-v1/me) for more details.

## API Key Security

To ensure the security of your API keys:

1. **Never share your API key** in publicly accessible areas
2. **Rotate your API keys** periodically
3. **Use environment variables** to store API keys in your applications
4. **Implement proper access controls** for your API keys
5. **Monitor API key usage** for suspicious activity

If you believe your API key has been compromised, revoke it immediately and generate a new one from your Browser Use Cloud dashboard.

# Cloud SDK

Source: https://docs.browser-use.com/cloud/v1/custom-sdk

Learn how to set up your own Browser Use Cloud SDK

This guide walks you through setting up your own Browser Use Cloud SDK.

## Building your own client (OpenAPI)

This approach is recommended **only** if you need to run simple tasks and **don't require fine-grained control**.

The best way to build your own client is to use our [OpenAPI specification](https://api.browser-use.com/openapi.json) to generate a type-safe client library.

### Python

Use [openapi-python-client](https://github.com/openapi-generators/openapi-python-client) to generate a modern Python client:

```bash
# Install the generator
pipx install openapi-python-client --include-deps

# Generate the client
openapi-python-client generate --url https://api.browser-use.com/openapi.json
```

This creates a Python package with full type hints, modern dataclasses, and async support.

### TypeScript/JavaScript

Use the [OpenAPI TS](https://openapi-ts.dev/) library to generate a type-safe TypeScript client for the Browser Use API. The following guide shows how to create a simple type-safe `fetch` client, but you can also use other generators.
* React Query - [https://openapi-ts.dev/openapi-react-query/](https://openapi-ts.dev/openapi-react-query/)
* SWR - [https://openapi-ts.dev/swr-openapi/](https://openapi-ts.dev/swr-openapi/)

```bash npm
npm install openapi-fetch
npm install -D openapi-typescript typescript
```

```bash yarn
yarn add openapi-fetch
yarn add -D openapi-typescript typescript
```

```bash pnpm
pnpm add openapi-fetch
pnpm add -D openapi-typescript typescript
```

```json title="package.json"
{
  "scripts": {
    "openapi:gen": "openapi-typescript https://api.browser-use.com/openapi.json -o ./src/lib/api/v1.d.ts"
  }
}
```

```bash
pnpm openapi:gen
```

```ts
// client.ts
'use client'

import createClient from 'openapi-fetch'
import type { paths } from '@/lib/api/v1'

const apiKey = 'your_api_key_here'

export type Client = ReturnType<typeof createClient<paths>>

export const client = createClient<paths>({
  baseUrl: 'https://api.browser-use.com/',
  // NOTE: You can get your API key from https://cloud.browser-use.com/billing!
  headers: { Authorization: `Bearer ${apiKey}` },
})
```

Need help? Contact our support team at [support@browser-use.com](mailto:support@browser-use.com) or join our [Discord community](https://link.browser-use.com/discord)

# V1 Implementation

Source: https://docs.browser-use.com/cloud/v1/implementation

Learn how to implement the Browser Use API in Python

This guide shows how to implement common API patterns using Python. We'll create a complete example that creates and monitors a browser automation task.

## Basic Implementation

For all settings, see [Run Task](/api-reference/api-v1/run-task).
Here's a simple implementation using Python's `requests` library to stream the task steps:

```python
import json
import time

import requests

API_KEY = 'your_api_key_here'
BASE_URL = 'https://api.browser-use.com/api/v1'
HEADERS = {'Authorization': f'Bearer {API_KEY}'}


def create_task(instructions: str):
    """Create a new browser automation task"""
    response = requests.post(f'{BASE_URL}/run-task', headers=HEADERS, json={'task': instructions})
    return response.json()['id']


def get_task_status(task_id: str):
    """Get current task status"""
    response = requests.get(f'{BASE_URL}/task/{task_id}/status', headers=HEADERS)
    return response.json()


def get_task_details(task_id: str):
    """Get full task details including output"""
    response = requests.get(f'{BASE_URL}/task/{task_id}', headers=HEADERS)
    return response.json()


def wait_for_completion(task_id: str, poll_interval: int = 2):
    """Poll task details until completion, printing each new step as it appears"""
    unique_steps = []
    while True:
        details = get_task_details(task_id)
        new_steps = details['steps']
        # Print only the steps we haven't seen yet
        if new_steps != unique_steps:
            for step in new_steps:
                if step not in unique_steps:
                    print(json.dumps(step, indent=4))
            unique_steps = new_steps
        status = details['status']
        if status in ['finished', 'failed', 'stopped']:
            return details
        time.sleep(poll_interval)


def main():
    task_id = create_task('Open https://www.google.com and search for openai')
    print(f'Task created with ID: {task_id}')
    task_details = wait_for_completion(task_id)
    print(f"Final output: {task_details['output']}")


if __name__ == '__main__':
    main()
```

## Task Control Example

Here's how to implement task control with pause/resume functionality:

```python
def control_task():
    # Create a new task
    task_id = create_task("Go to google.com and search for Browser Use")

    # Wait for 5 seconds
    time.sleep(5)

    # Pause the task
    requests.put(f"{BASE_URL}/pause-task?task_id={task_id}", headers=HEADERS)
    print("Task paused! Check the live preview.")

    # Wait for user input
    input("Press Enter to resume...")

    # Resume the task
    requests.put(f"{BASE_URL}/resume-task?task_id={task_id}", headers=HEADERS)

    # Wait for completion
    result = wait_for_completion(task_id)
    print(f"Task completed with output: {result['output']}")
```

## Structured Output Example

Here's how to implement a task with structured JSON output:

```python
import json
import os
import time
from typing import List

import requests
from pydantic import BaseModel

API_KEY = os.getenv("API_KEY")
BASE_URL = 'https://api.browser-use.com/api/v1'
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}


# Define the output schema using Pydantic
class SocialMediaCompany(BaseModel):
    name: str
    market_cap: float
    headquarters: str
    founded_year: int


class SocialMediaCompanies(BaseModel):
    companies: List[SocialMediaCompany]


def create_structured_task(instructions: str, schema: dict):
    """Create a task that expects structured output"""
    payload = {
        "task": instructions,
        "structured_output_json": json.dumps(schema)
    }
    response = requests.post(f"{BASE_URL}/run-task", headers=HEADERS, json=payload)
    response.raise_for_status()
    return response.json()["id"]


def wait_for_task_completion(task_id: str, poll_interval: int = 5):
    """Poll task status until it completes"""
    while True:
        response = requests.get(f"{BASE_URL}/task/{task_id}/status", headers=HEADERS)
        response.raise_for_status()
        status = response.json()
        if status == "finished":
            break
        elif status in ["failed", "stopped"]:
            raise RuntimeError(f"Task {task_id} ended with status: {status}")
        print("Waiting for task to finish...")
        time.sleep(poll_interval)


def fetch_task_output(task_id: str):
    """Retrieve the final task result"""
    response = requests.get(f"{BASE_URL}/task/{task_id}", headers=HEADERS)
    response.raise_for_status()
    return response.json()["output"]


def main():
    schema = SocialMediaCompanies.model_json_schema()
    task_id = create_structured_task(
        "Get me the top social media companies by market cap",
        schema
    )
    print(f"Task created with ID: {task_id}")
    wait_for_task_completion(task_id)
    print("Task completed!")
    output = fetch_task_output(task_id)
    print("Raw output:", output)
    try:
        parsed = SocialMediaCompanies.model_validate_json(output)
        print("Parsed output:")
        print(parsed)
    except Exception as e:
        print(f"Failed to parse structured output: {e}")


if __name__ == "__main__":
    main()
```

Remember to handle your API key securely and implement proper error handling in production code.

# N8N + Browser Use Cloud

Source: https://docs.browser-use.com/cloud/v1/n8n-browser-use-integration

Learn how to integrate Browser Use Cloud API with n8n using a practical workflow example (competitor research).

> **TL;DR** – In **3 minutes** you can have an n8n workflow that:
>
> 1. Shows a form asking for a competitor's name
> 2. Starts a Browser Use task that crawls the web and extracts **pricing, jobs, new features & announcements**
> 3. Waits for the task to finish via a **webhook**
> 4. Formats the output and drops a rich message into Slack

You can grab the workflow JSON below – copy it, import it into n8n, plug in your API keys, and hit *Execute* 🚀.

***

## Why use Browser Use in n8n?

• **Autonomous browsing** – Browser Use opens pages like a real user, follows links, clicks buttons and reads DOM content.
• **Structured output** – You tell the agent *exactly* which fields you need. No brittle regex or XPaths.
• **Scales effortlessly** – Kick off hundreds of tasks and monitor them through the Cloud API.

n8n glues everything together so your team gets the data instantly – no Python scripts or CRON jobs needed.

***

## Prerequisites

1. **Browser Use Cloud API key** – grab one from your [Billing page](https://cloud.browser-use.com/billing).
2. **n8n instance** – self-hosted or n8n.cloud. (The screenshots below use n8n 1.45+.)
3. **Slack Incoming Webhook URL** – create one in your Slack workspace.
Add both secrets to n8n's credential manager:

```env title=".env example"
BROWSER_USE_API_KEY="sk-…"
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/…"
```

***

## Import the template

1. Copy the [workflow JSON](#workflow-json) below to your clipboard.
2. In n8n, create a new workflow and paste the JSON.
3. Replace the *Browser-Use API Key* credential and *Slack Incoming Webhook URL* with yours.

***

## How the workflow works

### 1. `Form Trigger` – collect the competitor's name

A public n8n form with a single required field. When a user submits, the workflow fires instantly.

### 2. `HTTP Request – Browser Use Run Task`

We POST to `/api/v1/run-task` with the following body:

```json title="run-task payload"
{
  "task": "Do exhaustive research on {{ $json[\"Competitor Name\"] }} and extract all pricing information, job postings, new features and announcements",
  "save_browser_data": true,
  "structured_output_json": {
    "pricing": {
      "plans": ["string"],
      "prices": ["string"],
      "features": ["string"]
    },
    "jobs": {
      "titles": ["string"],
      "departments": ["string"],
      "locations": ["string"]
    },
    "new_features": {
      "titles": ["string"],
      "description": ["string"]
    },
    "announcements": {
      "titles": ["string"],
      "description": ["string"]
    }
  },
  "metadata": {
    "source": "n8n-competitor-demo"
  }
}
```

Important bits:

• `structured_output_json` tells the agent which keys to return – no post-processing required.
• We tag the task with `metadata.source` so the webhook can filter only *our* jobs.

### 3. `Webhook` + `IF` – wait for task completion

Browser Use sends a webhook when anything happens to a task (see our [Webhooks guide](/cloud/v1/webhooks) for setup details). We expose an n8n Webhook node at `/get-research-data` and let the agent call it.

We only proceed when **both** conditions are true:

* `payload.status == "finished"`
* `payload.metadata.source == "n8n-competitor-demo"`

### 4. `Get Task Details`

The webhook body includes the `session_id`.
We fetch the full task record so we get the `output` field containing the structured JSON from step 2.

### 5. `Code – Generate Slack message`

A short JS snippet turns the JSON into a nicely formatted Slack block with emojis and bullet points. Feel free to tweak the formatting.

### 6. `HTTP Request – Send to Slack`

Finally, we POST the message to your incoming webhook and celebrate 🎉.

***

## Customize as you want

This workflow is just the starting point – Browser Use + n8n gives you endless possibilities. Here are some ideas:

| Want to... | How to do it |
| --- | --- |
| **Extract different data** | Edit `structured_output_json` to specify exactly what fields you need (pricing, reviews, contact info, etc.) and adjust the JS formatter. |
| **Send to Teams/Email/Notion** | Swap the last Slack node for Teams, Gmail, or any of n8n's 400+ connectors. |
| **Run automatically** | Replace the Form trigger with a Cron trigger for daily/weekly competitor monitoring. |
| **Monitor multiple competitors** | Use a Google Sheets trigger with a list of companies and loop through them. |
| **Add AI analysis** | Pipe the extracted data through OpenAI/Claude to generate insights and summaries. |
| **Create alerts** | Set up conditional logic to only notify when competitors announce new features or price changes. |
| **Build a dashboard** | Send data to Airtable, Notion, or Google Sheets to build a real-time competitor intelligence dashboard. |

The beauty of Browser Use is that it handles the complex web browsing while you focus on building the perfect workflow for your needs.
***

## Workflow JSON

```json id="workflow-json"
{
  "name": "Competitor Intelligence Workflow with webhooks",
  "nodes": [
    { "parameters": { "httpMethod": "POST", "path": "get-research-data", "options": {} }, "type": "n8n-nodes-base.webhook", "typeVersion": 2, "position": [-480, 176], "id": "81166dab-eb91-4627-b773-1aa7f7bd86ee", "name": "Webhook", "webhookId": "025bc4bf-00c0-47d4-bd5f-79046674d017" },
    { "parameters": { "conditions": { "options": { "caseSensitive": true, "leftValue": "", "typeValidation": "strict", "version": 2 }, "conditions": [ { "id": "8d9701b6-1dc2-4e55-9fe4-ef1735ff1ebc", "leftValue": "={{ $json.body.payload.status }}", "rightValue": "finished", "operator": { "type": "string", "operation": "equals", "name": "filter.operator.equals" } }, { "id": "7cf18a23-f3d8-4a70-a77c-c286a231fc7f", "leftValue": "={{ $json.body.payload.metadata.source }}", "rightValue": "n8n-competitor-demo", "operator": { "type": "string", "operation": "equals", "name": "filter.operator.equals" } } ], "combinator": "and" }, "options": {} }, "type": "n8n-nodes-base.if", "typeVersion": 2.2, "position": [-256, 176], "id": "b38737cc-0b8a-4a76-930f-362eb5de9ef9", "name": "If" },
    { "parameters": { "formTitle": "Run Competitor Analysis", "formFields": { "values": [ { "fieldLabel": "Competitor Name", "placeholder": "(e.g. OpenAI)", "requiredField": true } ] }, "options": {} }, "type": "n8n-nodes-base.formTrigger", "typeVersion": 2.2, "position": [-336, -64], "id": "fcfc33dd-7d8a-460b-838d-955c65416aea", "name": "On form submission", "webhookId": "b2712d5b-14ae-424b-8733-fe6e77cebd43" },
    { "parameters": { "method": "POST", "url": "https://api.browser-use.com/api/v1/run-task", "authentication": "genericCredentialType", "genericAuthType": "httpBearerAuth", "sendHeaders": true, "headerParameters": { "parameters": [ {} ] }, "sendBody": true, "specifyBody": "json", "jsonBody": "={\n \"task\": \"Do exhaustive research on {{ $json['Competitor Name'] }} and extract all pricing information, job postings, new features and announcements\",\n \"save_browser_data\": true,\n \"structured_output_json\": \"{\\n \\\"pricing\\\": {\\n \\\"plans\\\": [\\\"string\\\"],\\n \\\"prices\\\": [\\\"string\\\"],\\n \\\"features\\\": [\\\"string\\\"]\\n },\\n \\\"jobs\\\": {\\n \\\"titles\\\": [\\\"string\\\"],\\n \\\"departments\\\": [\\\"string\\\"],\\n \\\"locations\\\": [\\\"string\\\"]\\n },\\n \\\"new_features\\\": {\\n \\\"titles\\\": [\\\"string\\\"],\\n \\\"description\\\": [\\\"string\\\"]\\n },\\n \\\"announcements\\\": {\\n \\\"titles\\\": [\\\"string\\\"],\\n \\\"description\\\": [\\\"string\\\"]\\n }\\n}\",\n\"metadata\": {\"source\": \"n8n-competitor-demo\"}\n} ", "options": {} }, "type": "n8n-nodes-base.httpRequest", "typeVersion": 4.2, "position": [-112, -64], "id": "d10bef40-e2a3-41ff-a507-4f365c13dc52", "name": "BrowserUse Run Task", "credentials": { "httpBearerAuth": { "id": "peg6MzgmJNRMCMnT", "name": "Browser-Use API Key" } } },
    { "parameters": { "url": "=https://api.browser-use.com/api/v1/task/{{ $('Webhook').item.json.body.payload.session_id }}", "authentication": "genericCredentialType", "genericAuthType": "httpBearerAuth", "options": {} }, "type": "n8n-nodes-base.httpRequest", "typeVersion": 4.2, "position": [0, 144], "id": "e49c28ff-11a2-4195-94ab-ca5796572c34", "name": "Get Task details", "credentials": { "httpBearerAuth": { "id": "peg6MzgmJNRMCMnT", "name": "Browser-Use API Key" } } },
    { "parameters": { "jsCode": "const output_data = $input.first().json.output;\nconst data = JSON.parse(output_data);\n\nconst pricing = data?.pricing;\nconst jobs = data?.jobs;\nconst newFeatures = data?.new_features;\nconst announcements = data?.announcements;\n\n// Helper function to format arrays as bullet points\nconst formatAsBullets = (arr, prefix = \"• \") => {\n if (!arr || arr.length === 0) return \"• N/A\";\n return arr.map(item => `${prefix}${item}`).join(\"\\n\");\n};\n\nreturn {\n text: `🏷️ *Pricing*\\nPlans:\\n${formatAsBullets(pricing?.plans)}\\n\\nPrices:\\n${formatAsBullets(pricing?.prices)}\\n\\nFeatures:\\n${formatAsBullets(pricing?.features)}\\n\\n💼 *Jobs*\\nTitles:\\n${formatAsBullets(jobs?.titles)}\\n\\nDepartments:\\n${formatAsBullets(jobs?.departments)}\\n\\nLocations:\\n${formatAsBullets(jobs?.locations)}\\n\\n✨ *New Features*\\nTitles:\\n${formatAsBullets(newFeatures?.titles)}\\n\\nDescription:\\n${formatAsBullets(newFeatures?.description)}\\n\\n📢 *Announcements*\\n${formatAsBullets(announcements?.description)}`\n};" }, "type": "n8n-nodes-base.code", "typeVersion": 2, "position": [208, 144], "id": "54bc087d-237d-438a-b688-bcbec25d9c45", "name": "Generate Slack message" },
    { "parameters": { "method": "POST", "url": "", "sendBody": true, "bodyParameters": { "parameters": [ { "name": "text", "value": "={{ $json.text }}" } ] }, "options": {} }, "type": "n8n-nodes-base.httpRequest", "typeVersion": 4.2, "position": [432, 144], "id": "969a16f0-677b-4e46-a8bb-57a80b5daf07", "name": "Send to Slack" }
  ],
  "pinData": {},
  "connections": {
    "Webhook": { "main": [ [ { "node": "If", "type": "main", "index": 0 } ] ] },
    "If": { "main": [ [ { "node": "Get Task details", "type": "main", "index": 0 } ] ] },
    "On form submission": { "main": [ [ { "node": "BrowserUse Run Task", "type": "main", "index": 0 } ] ] },
    "Get Task details": { "main": [ [ { "node": "Generate Slack message", "type": "main", "index": 0 } ] ] },
    "Generate Slack message": { "main": [ [ { "node": "Send to Slack", "type": "main", "index": 0 } ] ] }
  },
  "active": true,
  "settings": { "executionOrder": "v1" },
  "versionId": "f3b38678-4821-41ad-952c-df9bbba40fc8",
  "meta": { "templateCredsSetupCompleted": true, "instanceId": "7a1d1fd830bae2a00010153cf810fd67e0c87b8ae64ceb62273c87183efda365" },
  "id": "qmhqkZH8DhISWMmc",
  "tags": []
}
```

Copy everything between the braces, import into n8n, and you're good to go.

Having trouble? Ping us in the #integrations channel on [Discord](https://link.browser-use.com/discord) – we're happy to help.

# Pricing

Source: https://docs.browser-use.com/cloud/v1/pricing

Browser Use Cloud API pricing structure and cost breakdown

The Browser Use Cloud API pricing consists of two components:

1. **Task Initialization Cost**: \$0.01 per started task
2. **Task Step Cost**: Additional cost based on the specific model used for each step

## LLM Model Step Pricing

The following table shows the total cost per step for each available LLM model:

| Model | Cost per Step |
| --- | --- |
| GPT-4o | \$0.03 |
| GPT-4o mini | \$0.01 |
| GPT-4.1 | \$0.03 |
| GPT-4.1 mini | \$0.01 |
| O4 mini | \$0.02 |
| O3 | \$0.03 |
| Gemini 2.0 Flash | \$0.01 |
| Gemini 2.0 Flash Lite | \$0.01 |
| Gemini 2.5 Flash Preview (04/17) | \$0.01 |
| Gemini 2.5 Flash | \$0.01 |
| Gemini 2.5 Pro | \$0.03 |
| Claude 3.7 Sonnet (2025-02-19) | \$0.03 |
| Claude Sonnet 4 (2025-05-14) | \$0.03 |
| Llama 4 Maverick 17B Instruct | \$0.01 |

## Example Cost Calculation

For example, using GPT-4.1 for a 10-step task:

* Task initialization: \$0.01
* 10 steps × \$0.03 per step = \$0.30
* **Total cost: \$0.31**

# Quickstart

Source: https://docs.browser-use.com/cloud/v1/quickstart

Learn how to get started with the Browser Use Cloud API

You need an active subscription and an API key from
[cloud.browser-use.com/billing](https://cloud.browser-use.com/billing). For detailed pricing information, see our [pricing page](/cloud/v1/pricing).

## Creating Your First Agent

To understand how the API works, visit the [Run Task](/api-reference/api-v1/run-task?playground=open) page.

```bash
curl -X POST https://api.browser-use.com/api/v1/run-task \
  -H "Authorization: Bearer your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Go to google.com and search for Browser Use"
  }'
```

The `run-task` API returns a task ID, which you can query to get the task status, live preview URL, and the result output.

To play around with the API, you can use the [Browser Use Cloud Playground](https://cloud.browser-use.com/playground).

For the full implementation guide, see the [Implementation](/cloud/v1/implementation) page.

# Search API

Source: https://docs.browser-use.com/cloud/v1/search

Get started with Browser Use's search endpoints to extract content from websites

**🧪 BETA – This API is in beta; it may change and might not be available at all times.**

## Why Browser Use Over Traditional Search?

**Browser Use actually browses websites like a human**, while other tools return cached data from landing pages. Browser Use navigates deep into sites in real time:

* 🔍 **Deep navigation**: Clicks through menus, forms, and multiple pages to find buried content
* 🚀 **Always current**: Live prices, breaking news, real-time analytics – not cached results
* 🎯 **No stale data**: See exactly what's on the page right now
* 🌐 **Dynamic content**: Handles JavaScript, forms, and interactive elements
* 🏠 **No surface limitations**: Gets data from pages that require navigation or interaction

**Other tools see yesterday's front door. Browser Use explores today's whole house.**

## Quick Start

The Search API allows you to quickly extract relevant content from websites using AI.
There are two main endpoints:

💡 **Complete working examples** are available in the [examples/search](https://github.com/browser-use/browser-use/tree/main/examples/search) folder.

### Simple Search

Search Google and extract content from multiple top results:

```python
import asyncio

import aiohttp


async def simple_search():
    payload = {
        "query": "latest AI news",
        "max_websites": 5,
        "depth": 2
    }
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://api.browser-use.com/api/v1/simple-search",
            json=payload,
            headers=headers
        ) as response:
            result = await response.json()
            return result

asyncio.run(simple_search())
```

### Search URL

Extract content from a specific URL:

```python
async def search_url():
    payload = {
        "url": "https://browser-use.com/#pricing",
        "query": "Find pricing information for Browser Use",
        "depth": 2
    }
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://api.browser-use.com/api/v1/search-url",
            json=payload,
            headers=headers
        ) as response:
            result = await response.json()
            return result

asyncio.run(search_url())
```

## Parameters

* **query**: Search query or content to extract
* **depth**: How deep to navigate within each website (2-5, default: 2)
  * `depth=2`: Checks the main page + 1 click deeper
  * `depth=3`: Checks the main page + 2 clicks deeper
  * `depth=5`: Thoroughly explores multiple navigation levels
* **max\_websites**: Number of websites to process (simple-search only, default: 5)
* **url**: Target URL to extract from (search-url only)

## Pricing

### Simple Search

**Cost per request**: `1 cent × depth × max_websites`

Example: depth=2, max\_websites=3 = 6 cents per request

### Search URL

**Cost per request**: `1 cent × depth`

Example: depth=2 = 2 cents per request

# Webhooks

Source: https://docs.browser-use.com/cloud/v1/webhooks
Learn how to integrate webhooks with Browser Use Cloud API Webhooks allow you to receive real-time notifications about events in your Browser Use tasks. This guide will show you how to set up and verify webhook endpoints. ## Prerequisites You need an active subscription to create webhooks. See your billing page [cloud.browser-use.com/billing](https://cloud.browser-use.com/billing) ## Setting Up Webhooks To receive webhook notifications, you need to: 1. Create an endpoint that can receive HTTPS POST requests 2. Configure your webhook URL in the Browser Use dashboard 3. Implement signature verification to ensure webhook authenticity When adding a webhook URL in the dashboard, it must be a valid HTTPS URL that can receive POST requests. On creation, we will send a test payload `{"type": "test", "timestamp": "2024-03-21T12:00:00Z", "payload": {"test": "ok"}}` to verify the endpoint is working correctly before creating the actual webhook! ## Webhook Events Browser Use sends various types of events. Each event has a specific type and payload structure. 
### Event Types

Currently supported events:

| Event Type                 | Description                      |
| -------------------------- | -------------------------------- |
| `agent.task.status_update` | Status updates for running tasks |

### Task Status Updates

The `agent.task.status_update` event includes the following statuses:

| Status         | Description                            |
| -------------- | -------------------------------------- |
| `initializing` | A task is initializing                 |
| `started`      | A task has started (browser available) |
| `paused`       | A task has been paused mid-execution   |
| `stopped`      | A task has been stopped mid-execution  |
| `finished`     | A task has finished                    |

## Webhook Payload Structure

Each webhook call includes:

* A JSON payload with event details
* `X-Browser-Use-Timestamp` header with the current timestamp
* `X-Browser-Use-Signature` header for verification

The payload follows this structure:

```json
{
  "type": "agent.task.status_update",
  "timestamp": "2025-05-25T09:22:22.269116+00:00",
  "payload": {
    "session_id": "cd9cc7bf-e3af-4181-80a2-73f083bc94b4",
    "task_id": "5b73fb3f-a3cb-4912-be40-17ce9e9e1a45",
    "status": "finished",
    "metadata": {
      "campaign": "q4-automation",
      "team": "marketing"
    }
  }
}
```

The webhook payload includes a `metadata` field containing any custom key-value pairs that were provided when the task was created. This allows you to correlate webhook events with your internal tracking systems.

## Implementing Webhook Verification

To ensure webhook authenticity, you must verify the signature.
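The signed message is the timestamp, a dot, and the request body serialized as compact, key-sorted JSON, with an HMAC-SHA256 over it. A minimal sender-side sketch using only the standard library (the secret and timestamp below are placeholders; the webhook creation test payload is used as the body):

```python
import hashlib
import hmac
import json

def sign(secret: str, timestamp: str, body: dict) -> str:
    # message = "<timestamp>.<compact, key-sorted JSON body>"
    message = f'{timestamp}.{json.dumps(body, separators=(",", ":"), sort_keys=True)}'
    return hmac.new(secret.encode(), message.encode(), hashlib.sha256).hexdigest()

# Placeholder secret; use the secret from your Browser Use dashboard
signature = sign(
    'your-webhook-secret',
    '2024-03-21T12:00:00Z',
    {'type': 'test', 'timestamp': '2024-03-21T12:00:00Z', 'payload': {'test': 'ok'}},
)
print(signature)
```

Because the JSON is serialized with `sort_keys=True` and compact separators, both sides produce the same bytes regardless of key order in the original request.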
Here's an example implementation in Python using FastAPI: ```python import uvicorn import hmac import hashlib import json import os from fastapi import FastAPI, Request, HTTPException app = FastAPI() SECRET_KEY = os.environ['SECRET_KEY'] def verify_signature(payload: dict, timestamp: str, received_signature: str) -> bool: message = f'{timestamp}.{json.dumps(payload, separators=(",", ":"), sort_keys=True)}' expected_signature = hmac.new(SECRET_KEY.encode(), message.encode(), hashlib.sha256).hexdigest() return hmac.compare_digest(expected_signature, received_signature) @app.post('/webhook') async def webhook(request: Request): body = await request.json() timestamp = request.headers.get('X-Browser-Use-Timestamp') signature = request.headers.get('X-Browser-Use-Signature') if not timestamp or not signature: raise HTTPException(status_code=400, detail='Missing timestamp or signature') if not verify_signature(body, timestamp, signature): raise HTTPException(status_code=403, detail='Invalid signature') # Handle different event types event_type = body.get('type') if event_type == 'agent.task.status_update': # Handle task status update print('Task status update received:', body['payload']) elif event_type == 'test': # Handle test webhook print('Test webhook received:', body['payload']) else: print('Unknown event type:', event_type) return {'status': 'success', 'message': 'Webhook received'} if __name__ == '__main__': uvicorn.run(app, host='0.0.0.0', port=8080) ``` ## Best Practices 1. **Always verify signatures**: Never process webhook payloads without verifying the signature 2. **Handle retries**: Browser Use will retry failed webhook deliveries up to 5 times 3. **Respond quickly**: Return a 200 response as soon as you've verified the signature 4. **Process asynchronously**: Handle the webhook payload processing in a background task 5. **Monitor failures**: Set up monitoring for webhook delivery failures 6. 
**Handle unknown events**: Implement graceful handling of new event types that may be added in the future Need help? Contact our support team at [support@browser-use.com](mailto:support@browser-use.com) or join our [Discord community](https://link.browser-use.com/discord) # All Parameters Source: https://docs.browser-use.com/customize/agent/all-parameters Complete reference for all agent configuration options ## Available Parameters ### Core Settings * `tools`: Registry of [our tools](https://github.com/browser-use/browser-use/blob/main/browser_use/tools/service.py) the agent can call. [Example for custom tools](https://github.com/browser-use/browser-use/tree/main/examples/custom-functions) * `browser`: Browser object where you can specify the browser settings. * `output_model_schema`: Pydantic model class for structured output validation. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/custom_output.py) ### Vision & Processing * `use_vision` (default: `True`): Enable/disable vision capabilities for processing screenshots * `vision_detail_level` (default: `'auto'`): Screenshot detail level - `'low'`, `'high'`, or `'auto'` * `page_extraction_llm`: Separate LLM model for page content extraction. You can choose a small & fast model because it only needs to extract text from the page (default: same as `llm`) ### Actions & Behavior * `initial_actions`: List of actions to run before the main task without LLM. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/initial_actions.py) * `max_actions_per_step` (default: `10`): Maximum actions per step, e.g. for form filling the agent can output 10 fields at once. We execute the actions until the page changes. * `max_failures` (default: `3`): Maximum retries for steps with errors * `use_thinking` (default: `True`): Controls whether the agent uses its internal "thinking" field for explicit reasoning steps. 
* `flash_mode` (default: `False`): Fast mode that skips evaluation, next goal and thinking and only uses memory. If `flash_mode` is enabled, it overrides `use_thinking` and disables the thinking process entirely. [Example](https://github.com/browser-use/browser-use/blob/main/examples/getting_started/05_fast_agent.py) ### System Messages * `override_system_message`: Completely replace the default system prompt. * `extend_system_message`: Add additional instructions to the default system prompt. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/custom_system_prompt.py) ### File & Data Management * `save_conversation_path`: Path to save complete conversation history * `save_conversation_path_encoding` (default: `'utf-8'`): Encoding for saved conversations * `available_file_paths`: List of file paths the agent can access * `sensitive_data`: Dictionary of sensitive data to handle carefully. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/sensitive_data.py) ### Visual Output * `generate_gif` (default: `False`): Generate GIF of agent actions. Set to `True` or string path * `include_attributes`: List of HTML attributes to include in page analysis ### Performance & Limits * `max_history_items`: Maximum number of last steps to keep in the LLM memory. If `None`, we keep all steps. * `llm_timeout` (default: `90`): Timeout in seconds for LLM calls * `step_timeout` (default: `120`): Timeout in seconds for each step * `directly_open_url` (default: `True`): If we detect a url in the task, we directly open it. ### Advanced Options * `calculate_cost` (default: `False`): Calculate and track API costs * `display_files_in_done_text` (default: `True`): Show file information in completion messages ### Backwards Compatibility * `controller`: Alias for `tools` for backwards compatibility. * `browser_session`: Alias for `browser` for backwards compatibility. 
# Basics Source: https://docs.browser-use.com/customize/agent/basics ```python from browser_use import Agent, ChatOpenAI agent = Agent( task="Search for latest news about AI", llm=ChatOpenAI(model="gpt-4.1-mini"), ) async def main(): history = await agent.run(max_steps=100) ``` * `task`: The task you want to automate. * `llm`: Your favorite LLM. See Supported Models. The agent is executed using the async `run()` method: * `max_steps` (default: `100`): Maximum number of steps the agent can take # Output Format Source: https://docs.browser-use.com/customize/agent/output-format ## Agent History The `run()` method returns an `AgentHistoryList` object with the complete execution history: ```python history = await agent.run() # Access useful information history.urls() # List of visited URLs history.screenshot_paths() # List of screenshot paths history.screenshots() # List of screenshots as base64 strings history.action_names() # Names of executed actions history.extracted_content() # List of extracted content from all actions history.errors() # List of errors (with None for steps without errors) history.model_actions() # All actions with their parameters history.model_outputs() # All model outputs from history history.last_action() # Last action in history # Analysis methods history.final_result() # Get the final extracted content (last step) history.is_done() # Check if agent completed successfully history.is_successful() # Check if agent completed successfully (returns None if not done) history.has_errors() # Check if any errors occurred history.model_thoughts() # Get the agent's reasoning process (AgentBrain objects) history.action_results() # Get all ActionResult objects from history history.action_history() # Get truncated action history with essential fields history.number_of_steps() # Get the number of steps in the history history.total_duration_seconds() # Get total duration of all steps in seconds # Structured output (when using output_model_schema) 
history.structured_output  # Property that returns parsed structured output
```

See all helper methods in the [AgentHistoryList source code](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/views.py#L301).

## Structured Output

For structured output, use the `output_model_schema` parameter with a Pydantic model. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/custom_output.py).

# Supported Models
Source: https://docs.browser-use.com/customize/agent/supported-models

Choose your favorite LLM

### Recommendations

* Best accuracy: `o3`
* Fastest: `llama4` on Groq
* Balanced: fast + cheap + clever: `gemini-2.5-flash` or `gpt-4.1-mini`

### OpenAI

[example](https://github.com/browser-use/browser-use/blob/main/examples/models/gpt-4.1.py)

The `o3` model is recommended for best performance.

```python
from browser_use import Agent, ChatOpenAI

# Initialize the model
llm = ChatOpenAI(
    model="o3",
)

# Create agent with the model
agent = Agent(
    task="...",  # Your task here
    llm=llm
)
```

Required environment variables:

```bash .env
OPENAI_API_KEY=
```

You can use any OpenAI-compatible model by passing the model name to the `ChatOpenAI` class using a custom URL (or any other parameter that would go into the normal OpenAI API call).
### Anthropic

[example](https://github.com/browser-use/browser-use/blob/main/examples/models/claude-4-sonnet.py)

```python
from browser_use import Agent, ChatAnthropic

# Initialize the model
llm = ChatAnthropic(
    model="claude-sonnet-4-0",
)

# Create agent with the model
agent = Agent(
    task="...",  # Your task here
    llm=llm
)
```

Required environment variables:

```bash .env
ANTHROPIC_API_KEY=
```

### Azure OpenAI

[example](https://github.com/browser-use/browser-use/blob/main/examples/models/azure_openai.py)

```python
from browser_use import Agent, ChatAzureOpenAI

# Initialize the model (reads the endpoint and key from the environment)
llm = ChatAzureOpenAI(
    model="o4-mini",
)

# Create agent with the model
agent = Agent(
    task="...",  # Your task here
    llm=llm
)
```

Required environment variables:

```bash .env
AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
AZURE_OPENAI_API_KEY=
```

### Gemini

[example](https://github.com/browser-use/browser-use/blob/main/examples/models/gemini.py)

> **Important:** `GEMINI_API_KEY` was the old environment variable name; as of 2025-05 it should be called `GOOGLE_API_KEY`.

```python
from browser_use import Agent, ChatGoogle
from dotenv import load_dotenv

# Read GOOGLE_API_KEY into env
load_dotenv()

# Initialize the model
llm = ChatGoogle(model='gemini-2.5-flash')

# Create agent with the model
agent = Agent(
    task="Your task here",
    llm=llm
)
```

Required environment variables:

```bash .env
GOOGLE_API_KEY=
```

### AWS Bedrock

[example](https://github.com/browser-use/browser-use/blob/main/examples/models/aws.py)

AWS Bedrock provides access to multiple model providers through a single API. We support both a general AWS Bedrock client and provider-specific convenience classes.

#### General AWS Bedrock (supports all providers)

```python
from browser_use import Agent, ChatAWSBedrock

# Works with any Bedrock model (Anthropic, Meta, AI21, etc.)
llm = ChatAWSBedrock(
    model="anthropic.claude-3-5-sonnet-20240620-v1:0",  # or any Bedrock model
    aws_region="us-east-1",
)

# Create agent with the model
agent = Agent(
    task="Your task here",
    llm=llm
)
```

#### Anthropic Claude via AWS Bedrock (convenience class)

```python
from browser_use import Agent, ChatAnthropicBedrock

# Anthropic-specific class with Claude defaults
llm = ChatAnthropicBedrock(
    model="anthropic.claude-3-5-sonnet-20240620-v1:0",
    aws_region="us-east-1",
)

# Create agent with the model
agent = Agent(
    task="Your task here",
    llm=llm
)
```

#### AWS Authentication

Required environment variables:

```bash .env
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DEFAULT_REGION=us-east-1
```

You can also use AWS profiles or IAM roles instead of environment variables. The implementation supports:

* Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`)
* AWS profiles and credential files
* IAM roles (when running on EC2)
* Session tokens for temporary credentials
* AWS SSO authentication (`aws_sso_auth=True`)

### Groq

[example](https://github.com/browser-use/browser-use/blob/main/examples/models/llama4-groq.py)

```python
from browser_use import Agent, ChatGroq

llm = ChatGroq(model="meta-llama/llama-4-maverick-17b-128e-instruct")

agent = Agent(
    task="Your task here",
    llm=llm
)
```

Required environment variables:

```bash .env
GROQ_API_KEY=
```

### Ollama

```python
from browser_use import Agent, ChatOllama

llm = ChatOllama(model="llama3.1:8b")
```

### Langchain

[Example](https://github.com/browser-use/browser-use/blob/main/examples/models/langchain) on how to use Langchain with Browser Use.

### Other models (DeepSeek, Novita, X, Qwen...)

We support all other models that can be called via an OpenAI-compatible API. We are open to PRs for more providers.
**Examples available:** * [DeepSeek](https://github.com/browser-use/browser-use/blob/main/examples/models/deepseek-chat.py) * [Novita](https://github.com/browser-use/browser-use/blob/main/examples/models/novita.py) * [OpenRouter](https://github.com/browser-use/browser-use/blob/main/examples/models/openrouter.py) # All Parameters Source: https://docs.browser-use.com/customize/browser/all-parameters Complete reference for all browser configuration options ## Core Settings * `cdp_url`: CDP URL for connecting to existing browser instance (e.g., `"http://localhost:9222"`) ## Display & Appearance * `headless` (default: `None`): Run browser without UI. Auto-detects based on display availability (`True`/`False`/`None`) * `window_size`: Browser window size for headful mode. Use dict `{'width': 1920, 'height': 1080}` or `ViewportSize` object * `window_position` (default: `{'width': 0, 'height': 0}`): Window position from top-left corner in pixels * `viewport`: Content area size, same format as `window_size`. Use `{'width': 1280, 'height': 720}` or `ViewportSize` object * `no_viewport` (default: `None`): Disable viewport emulation, content fits to window size * `device_scale_factor`: Device scale factor (DPI). Set to `2.0` or `3.0` for high-resolution screenshots ## Browser Behavior * `keep_alive` (default: `None`): Keep browser running after agent completes * `allowed_domains`: Restrict navigation to specific domains. 
Domain pattern formats:

* `'example.com'` - Matches only `https://example.com/*`
* `'*.example.com'` - Matches `https://example.com/*` and any subdomain `https://*.example.com/*`
* `'http*://example.com'` - Matches both `http://` and `https://` protocols
* `'chrome-extension://*'` - Matches any Chrome extension URL
* **Security**: Wildcards in the TLD (e.g., `example.*`) are **not allowed** for security
* Use a list like `['*.google.com', 'https://example.com', 'chrome-extension://*']`

* `enable_default_extensions` (default: `True`): Load automation extensions (uBlock Origin, cookie handlers, ClearURLs)
* `cross_origin_iframes` (default: `False`): Enable cross-origin iframe support (may cause complexity)
* `is_local` (default: `True`): Whether this is a local browser instance. Set to `False` for remote browsers. If an `executable_path` is set, it will automatically be set to `True`. This can affect your download behavior.

## User Data & Profiles

* `user_data_dir` (default: auto-generated temp): Directory for browser profile data. Use `None` for incognito mode
* `profile_directory` (default: `'Default'`): Chrome profile subdirectory name (`'Profile 1'`, `'Work Profile'`, etc.)
* `storage_state`: Browser storage state (cookies, localStorage). Can be a file path string or dict object

## Network & Security

* `proxy`: Proxy configuration using `ProxySettings(server='http://host:8080', bypass='localhost,127.0.0.1', username='user', password='pass')`
* `permissions` (default: `['clipboardReadWrite', 'notifications']`): Browser permissions to grant. Use a list like `['camera', 'microphone', 'geolocation']`
* `headers`: Additional HTTP headers for connect requests (remote browsers only)

## Browser Launch

* `executable_path`: Path to browser executable for custom installations.
Platform examples: * macOS: `'/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'` * Windows: `'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'` * Linux: `'/usr/bin/google-chrome'` * `channel`: Browser channel (`'chromium'`, `'chrome'`, `'chrome-beta'`, `'msedge'`, etc.) * `args`: Additional command-line arguments for the browser. Use list format: `['--disable-gpu', '--custom-flag=value', '--another-flag']` * `env`: Environment variables for browser process. Use dict like `{'DISPLAY': ':0', 'LANG': 'en_US.UTF-8', 'CUSTOM_VAR': 'test'}` * `chromium_sandbox` (default: `True` except in Docker): Enable Chromium sandboxing for security * `devtools` (default: `False`): Open DevTools panel automatically (requires `headless=False`) * `ignore_default_args`: List of default args to disable, or `True` to disable all. Use list like `['--enable-automation', '--disable-extensions']` ## Timing & Performance * `minimum_wait_page_load_time` (default: `0.25`): Minimum time to wait before capturing page state in seconds * `wait_for_network_idle_page_load_time` (default: `0.5`): Time to wait for network activity to cease in seconds * `wait_between_actions` (default: `0.5`): Time to wait between agent actions in seconds ## AI Integration * `highlight_elements` (default: `True`): Highlight interactive elements for AI vision ## Downloads & Files * `accept_downloads` (default: `True`): Automatically accept all downloads * `downloads_path`: Directory for downloaded files. Use string like `'./downloads'` or `Path` object * `auto_download_pdfs` (default: `True`): Automatically download PDFs instead of viewing in browser ## Device Emulation * `user_agent`: Custom user agent string. 
Example: `'Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X)'` * `screen`: Screen size information, same format as `window_size` ## Recording & Debugging * `record_video_dir`: Directory to save video recordings as `.webm` files * `record_har_path`: Path to save network trace files as `.har` format * `traces_dir`: Directory to save complete trace files for debugging * `record_har_content` (default: `'embed'`): HAR content mode (`'omit'`, `'embed'`, `'attach'`) * `record_har_mode` (default: `'full'`): HAR recording mode (`'full'`, `'minimal'`) ## Advanced Options * `disable_security` (default: `False`): ⚠️ **NOT RECOMMENDED** - Disables all browser security features * `deterministic_rendering` (default: `False`): ⚠️ **NOT RECOMMENDED** - Forces consistent rendering but reduces performance *** ## Outdated BrowserProfile For backward compatibility, you can pass all the parameters from above to the `BrowserProfile` and then to the `Browser`. ```python from browser_use import BrowserProfile profile = BrowserProfile(headless=False) browser = Browser(browser_profile=profile) ``` ## Browser vs BrowserSession `Browser` is an alias for `BrowserSession` - they are exactly the same class: Use `Browser` for cleaner, more intuitive code. # Basics Source: https://docs.browser-use.com/customize/browser/basics *** ```python from browser_use import Agent, Browser, ChatOpenAI browser = Browser( headless=False, # Show browser window window_size={'width': 1000, 'height': 700}, # Set window size ) agent = Agent( task='Search for Browser Use', browser=browser, llm=ChatOpenAI(model='gpt-4.1-mini'), ) async def main(): await agent.run() ``` # Real Browser Source: https://docs.browser-use.com/customize/browser/real-browser Connect your existing Chrome browser to preserve authentication. 
## Basic Example

```python
from browser_use import Agent, Browser, ChatOpenAI

# Connect to your existing Chrome browser
browser = Browser(
    executable_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
    user_data_dir='~/Library/Application Support/Google/Chrome',
    profile_directory='Default',
)

agent = Agent(
    task='Visit https://duckduckgo.com and search for "browser-use founders"',
    browser=browser,
    llm=ChatOpenAI(model='gpt-4.1-mini'),
)

async def main():
    await agent.run()
```

> **Note:** You need to fully close Chrome before running this example.

> **Note:** Google currently blocks this approach, so we use DuckDuckGo instead.

## How it Works

1. **`executable_path`** - Path to your Chrome installation
2. **`user_data_dir`** - Your Chrome profile folder (keeps cookies, extensions, bookmarks)
3. **`profile_directory`** - Specific profile name (Default, Profile 1, etc.)

## Platform Paths

```python
# macOS
executable_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
user_data_dir='~/Library/Application Support/Google/Chrome'

# Windows
executable_path='C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'
user_data_dir='%LOCALAPPDATA%\\Google\\Chrome\\User Data'

# Linux
executable_path='/usr/bin/google-chrome'
user_data_dir='~/.config/google-chrome'
```

# Remote Browser
Source: https://docs.browser-use.com/customize/browser/remote

```python
from browser_use import Agent, Browser, ChatOpenAI

# Connect to remote browser
browser = Browser(
    cdp_url='http://remote-server:9222'
)

agent = Agent(
    task="Your task here",
    llm=ChatOpenAI(model='gpt-4.1-mini'),
    browser=browser,
)
```

## Get a CDP URL

### Cloud Browser

Get a CDP URL from your favorite browser provider, such as AnchorBrowser, HyperBrowser, BrowserBase, Steel.dev, etc.
### Proxy Connection

```python
from browser_use import Agent, Browser, ChatOpenAI
from browser_use.browser import ProxySettings

browser = Browser(
    headless=False,
    proxy=ProxySettings(
        server="http://proxy-server:8080",
        username="proxy-user",
        password="proxy-pass",
    ),
    cdp_url="http://remote-server:9222",
)

agent = Agent(
    task="Your task here",
    llm=ChatOpenAI(model='gpt-4.1-mini'),
    browser=browser,
)
```

# Chain Agents
Source: https://docs.browser-use.com/customize/examples/chain-agents

Chain multiple tasks together with the same agent and browser session.

## Chain Agent Tasks

Keep your browser session alive and chain multiple tasks together. Perfect for conversational workflows or multi-step processes.

```python
import asyncio

from dotenv import load_dotenv

load_dotenv()

from browser_use import Agent, BrowserProfile

profile = BrowserProfile(keep_alive=True)

async def main():
    agent = Agent(task="Go to reddit.com", browser_profile=profile)
    await agent.run(max_steps=1)

    while True:
        user_response = input('\n👤 New task or "q" to quit: ')
        if user_response.lower() == 'q':
            break
        agent.add_new_task(f'New task: {user_response}')
        await agent.run()

if __name__ == '__main__':
    asyncio.run(main())
```

## How It Works

1. **Persistent Browser**: `BrowserProfile(keep_alive=True)` prevents the browser from closing between tasks
2. **Task Chaining**: Use `agent.add_new_task()` to add follow-up tasks
3. **Context Preservation**: The agent maintains memory and browser state across tasks
4. **Interactive Flow**: Perfect for conversational interfaces or complex workflows

The browser session remains active throughout the entire chain, preserving all cookies, local storage, and page state.

# Fast Agent
Source: https://docs.browser-use.com/customize/examples/fast-agent

Optimize agent performance for maximum speed and efficiency.
```python import asyncio from dotenv import load_dotenv load_dotenv() from browser_use import Agent, BrowserProfile # Speed optimization instructions for the model SPEED_OPTIMIZATION_PROMPT = """ Speed optimization instructions: - Be extremely concise and direct in your responses - Get to the goal as quickly as possible - Use multi-action sequences whenever possible to reduce steps """ async def main(): # 1. Use fast LLM - Llama 4 on Groq for ultra-fast inference from browser_use import ChatGroq llm = ChatGroq( model='meta-llama/llama-4-maverick-17b-128e-instruct', temperature=0.0, ) # from browser_use import ChatGoogle # llm = ChatGoogle(model='gemini-2.5-flash') # 2. Create speed-optimized browser profile browser_profile = BrowserProfile( minimum_wait_page_load_time=0.1, wait_between_actions=0.1, headless=False, ) # 3. Define a speed-focused task task = """ 1. Go to reddit https://www.reddit.com/search/?q=browser+agent&type=communities 2. Click directly on the first 5 communities to open each in new tabs 3. Find out what the latest post is about, and switch directly to the next tab 4. Return the latest post summary for each page """ # 4. Create agent with all speed optimizations agent = Agent( task=task, llm=llm, flash_mode=True, # Disables thinking in the LLM output for maximum speed browser_profile=browser_profile, extend_system_message=SPEED_OPTIMIZATION_PROMPT, ) await agent.run() if __name__ == '__main__': asyncio.run(main()) ``` ## Speed Optimization Techniques ### 1. Fast LLM Models ```python # Groq - Ultra-fast inference from browser_use import ChatGroq llm = ChatGroq(model='meta-llama/llama-4-maverick-17b-128e-instruct') # Google Gemini Flash - Optimized for speed from browser_use import ChatGoogle llm = ChatGoogle(model='gemini-2.5-flash') ``` ### 2. 
Browser Optimizations

```python
browser_profile = BrowserProfile(
    minimum_wait_page_load_time=0.1,  # Reduce wait time
    wait_between_actions=0.1,         # Faster action execution
    headless=True,                    # No GUI overhead
)
```

### 3. Agent Optimizations

```python
agent = Agent(
    task=task,
    llm=llm,
    flash_mode=True,                     # Skip LLM thinking process
    extend_system_message=SPEED_PROMPT,  # Optimize LLM behavior
)
```

# More Examples
Source: https://docs.browser-use.com/customize/examples/more-examples

Explore additional examples and use cases on GitHub.

### 🔗 Browse All Examples

**[View Complete Examples Directory →](https://github.com/browser-use/browser-use/tree/main/examples)**

### 🤝 Contributing Examples

Have a great use case? **[Submit a pull request](https://github.com/browser-use/browser-use/pulls)** with your example!

# Parallel Agents
Source: https://docs.browser-use.com/customize/examples/parallel-browser

Run multiple agents in parallel with separate browser instances

```python
import asyncio

from browser_use import Agent, Browser, ChatOpenAI

async def main():
    # Create 3 separate browser instances
    browsers = [
        Browser(
            user_data_dir=f'./temp-profile-{i}',
            headless=False,
        )
        for i in range(3)
    ]

    # Create 3 agents with different tasks
    agents = [
        Agent(
            task='Search for "browser automation" on Google',
            browser=browsers[0],
            llm=ChatOpenAI(model='gpt-4.1-mini'),
        ),
        Agent(
            task='Search for "AI agents" on DuckDuckGo',
            browser=browsers[1],
            llm=ChatOpenAI(model='gpt-4.1-mini'),
        ),
        Agent(
            task='Visit Wikipedia and search for "web scraping"',
            browser=browsers[2],
            llm=ChatOpenAI(model='gpt-4.1-mini'),
        ),
    ]

    # Run all agents in parallel
    tasks = [agent.run() for agent in agents]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    print('🎉 All agents completed!')

if __name__ == '__main__':
    asyncio.run(main())
```

> **Note:** This is experimental, and agents might conflict with each other.

# Secure Setup
Source: https://docs.browser-use.com/customize/examples/secure

Azure OpenAI with data privacy and security configuration.
## Secure Setup with Azure OpenAI

Enterprise-grade security with Azure OpenAI, data privacy protection, and restricted browser access.

```python
import asyncio
import os

from dotenv import load_dotenv

load_dotenv()

os.environ['ANONYMIZED_TELEMETRY'] = 'false'

from browser_use import Agent, BrowserProfile, ChatAzureOpenAI

# Azure OpenAI configuration
api_key = os.getenv('AZURE_OPENAI_KEY')
azure_endpoint = os.getenv('AZURE_OPENAI_ENDPOINT')
llm = ChatAzureOpenAI(model='gpt-4.1-mini', api_key=api_key, azure_endpoint=azure_endpoint)

# Secure browser configuration
browser_profile = BrowserProfile(
    allowed_domains=['*.google.com', 'browser-use.com'],
    enable_default_extensions=False
)

# Sensitive data filtering
sensitive_data = {'company_name': 'browser-use'}

# Create secure agent
agent = Agent(
    task='Find the founders of the sensitive company_name',
    llm=llm,
    browser_profile=browser_profile,
    sensitive_data=sensitive_data
)

async def main():
    await agent.run(max_steps=10)

asyncio.run(main())
```

## Security Features

**Azure OpenAI:**

* NOT used to train OpenAI models
* NOT shared with other customers
* Hosted entirely within Azure
* 30-day retention (or zero with Limited Access Program)

**Browser Security:**

* `allowed_domains`: Restrict navigation to trusted sites
* `enable_default_extensions=False`: Disable potentially dangerous extensions
* `sensitive_data`: Filter sensitive information from LLM input

For enterprise deployments contact [support@browser-use.com](mailto:support@browser-use.com).

# Sensitive Data
Source: https://docs.browser-use.com/customize/examples/sensitive-data

Handle sensitive information securely and avoid sending PII & passwords to the LLM.
```python import os from browser_use import Agent, Browser, ChatOpenAI os.environ['ANONYMIZED_TELEMETRY'] = "false" agent = Agent( task='Log into example.com with username x_user and password x_pass', sensitive_data={ 'https://example.com': { 'x_user': 'your-real-username@email.com', 'x_pass': 'your-real-password123', }, }, use_vision=False, # Disable vision to prevent LLM seeing sensitive data in screenshots llm=ChatOpenAI(model='gpt-4.1-mini'), ) async def main(): await agent.run() ``` ## How it Works 1. **Text Filtering**: The LLM only sees placeholders (`x_user`, `x_pass`), we filter your sensitive data from the input text. 2. **DOM Actions**: Real values are injected directly into form fields after the LLM call ## Best Practices * Use `Browser(allowed_domains=[...])` to restrict navigation * Set `use_vision=False` to prevent screenshot leaks * Use `storage_state='./auth.json'` for login cookies instead of passwords when possible # Lifecycle Hooks Source: https://docs.browser-use.com/customize/hooks Customize agent behavior with lifecycle hooks Browser-Use provides lifecycle hooks that allow you to execute custom code at specific points during the agent's execution. Hook functions can be used to read and modify agent state while running, implement custom logic, change configuration, integrate the Agent with external applications. 
## Available Hooks

Currently, Browser-Use provides the following hooks:

| Hook            | Description                                  | When it's called                                                                                  |
| --------------- | -------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| `on_step_start` | Executed at the beginning of each agent step | Before the agent processes the current state and decides on the next action                       |
| `on_step_end`   | Executed at the end of each agent step       | After the agent has executed all the actions for the current step, before it starts the next step |

```python
await agent.run(on_step_start=..., on_step_end=...)
```

Each hook should be an `async` callable that accepts the `agent` instance as its only parameter.

### Basic Example

```python
from pathlib import Path

from browser_use import Agent, ChatOpenAI

async def my_step_hook(agent: Agent):
    # inside a hook you can access all the state and methods under the Agent object:
    #   agent.settings, agent.state, agent.task
    #   agent.tools, agent.llm, agent.browser_session
    #   agent.pause(), agent.resume(), agent.add_new_task(...), etc.

    # You also have direct access to the browser state
    state = await agent.browser_session.get_browser_state_summary()
    current_url = state.url
    visit_log = agent.history.urls()
    previous_url = visit_log[-2] if len(visit_log) >= 2 else None
    print(f"Agent was last on URL: {previous_url} and is now on {current_url}")

    # Example: listen for events on the page, interact with the DOM, run JS directly, etc.
    # (the Playwright-style calls below assume a session that exposes the current
    # page object, e.g. via `get_current_page()` in Playwright-based versions)
    page = await agent.browser_session.get_current_page()
    page.on('domcontentloaded', lambda _: print('page navigated to a new url...'))
    await page.locator("css=form > input[type=submit]").click()
    await page.evaluate('() => alert(1)')
    await agent.browser_session.session.context.add_init_script('/* some JS to run on every page */')

    # Example: monitor or intercept all network requests
    async def handle_request(route):
        # Print, modify, block, or otherwise handle requests here
        # https://playwright.dev/python/docs/network#handle-requests
        print(route.request, route.request.headers)
        await route.continue_(headers=route.request.headers)

    await page.route("**/*", handle_request)

    # Example: pause agent execution and resume it based on some custom code
    if '/completed' in current_url:
        agent.pause()
        Path('result.txt').write_text(await page.content())
        input('Saved "completed" page content to result.txt, press [Enter] to resume...')
        agent.resume()

agent = Agent(
    task="Search for the latest news about AI",
    llm=ChatOpenAI(model="gpt-4.1-mini"),
)

await agent.run(
    on_step_start=my_step_hook,
    # on_step_end=...
    max_steps=10
)
```

## Data Available in Hooks

When working with agent hooks, you have access to the entire `Agent` instance. Here are some useful data points you can access:

* `agent.task` lets you see what the main task is; `agent.add_new_task(...)` lets you queue up a new one
* `agent.tools` gives access to the `Tools()` object and `Registry()` containing the available actions
  * `agent.tools.registry.execute_action('click_element_by_index', {'index': 123}, browser_session=agent.browser_session)`
* `agent.context` lets you access any user-provided context object passed in to `Agent(context=...)`
* `agent.sensitive_data` contains the sensitive data dict, which can be updated in-place to add/remove/modify items
* `agent.settings` contains all the configuration options passed to the `Agent(...)` at init time
* `agent.llm` gives direct access to the main LLM object (e.g. `ChatOpenAI`)
* `agent.state` gives access to lots of internal state, including agent thoughts, outputs, actions, etc.
* `agent.history` gives access to historical data from the agent's execution:
  * `agent.history.model_thoughts()`: Reasoning from Browser Use's model
  * `agent.history.model_outputs()`: Raw outputs from Browser Use's model
  * `agent.history.model_actions()`: Actions taken by the agent
  * `agent.history.extracted_content()`: Content extracted from web pages
  * `agent.history.urls()`: URLs visited by the agent
* `agent.browser_session` gives direct access to the `Browser()` and CDP interface:
  * `agent.browser_session.agent_focus`: The current CDP session the agent is focused on
  * `agent.browser_session.get_or_create_cdp_session()`: Get the current CDP session for browser interaction
  * `agent.browser_session.get_tabs()`: Get all tabs currently open
  * `agent.browser_session.get_page_html()`: Current page HTML
  * `agent.browser_session.take_screenshot()`: Screenshot of the current page

## Tips for Using Hooks

* **Avoid blocking operations**: Hooks run in the same execution thread as the agent, so keep them efficient or use asynchronous patterns.
* **Handle exceptions**: Make sure your hook functions handle exceptions gracefully to prevent interrupting the agent's main flow.
* **Use custom actions instead**: Hooks are fairly advanced; most things can be implemented with [custom action functions](/customize/custom-functions) instead.

***

## Complex Example: Agent Activity Recording System

This comprehensive example demonstrates a complete implementation for recording and saving Browser-Use agent activity, consisting of both server and client components.

### Setup Instructions

To use this example, you'll need to:

1. Set up the required dependencies:

   ```bash
   pip install fastapi uvicorn prettyprinter pyobjtojson python-dotenv browser-use
   ```

2. Create two separate Python files:

   * `api.py` - The FastAPI server component
   * `client.py` - The Browser-Use agent with recording hook

3. Run both components:

   * Start the API server first: `python api.py`
   * Then run the client: `python client.py`

### Server Component (api.py)

The server component handles receiving and storing the agent's activity data:

```python
#!/usr/bin/env python3
#
# FastAPI API to record and save Browser-Use activity data.
# Save this code to api.py and run with `python api.py` # import json import base64 from pathlib import Path from fastapi import FastAPI, Request import prettyprinter import uvicorn prettyprinter.install_extras() # Utility function to save screenshots def b64_to_png(b64_string: str, output_file): """ Convert a Base64-encoded string to a PNG file. :param b64_string: A string containing Base64-encoded data :param output_file: The path to the output PNG file """ with open(output_file, "wb") as f: f.write(base64.b64decode(b64_string)) # Initialize FastAPI app app = FastAPI() @app.post("/post_agent_history_step") async def post_agent_history_step(request: Request): data = await request.json() prettyprinter.cpprint(data) # Ensure the "recordings" folder exists using pathlib recordings_folder = Path("recordings") recordings_folder.mkdir(exist_ok=True) # Determine the next file number by examining existing .json files existing_numbers = [] for item in recordings_folder.iterdir(): if item.is_file() and item.suffix == ".json": try: file_num = int(item.stem) existing_numbers.append(file_num) except ValueError: # In case the file name isn't just a number pass if existing_numbers: next_number = max(existing_numbers) + 1 else: next_number = 1 # Construct the file path file_path = recordings_folder / f"{next_number}.json" # Save the JSON data to the file with file_path.open("w") as f: json.dump(data, f, indent=2) # Optionally save screenshot if needed # if "website_screenshot" in data and data["website_screenshot"]: # screenshot_folder = Path("screenshots") # screenshot_folder.mkdir(exist_ok=True) # b64_to_png(data["website_screenshot"], screenshot_folder / f"{next_number}.png") return {"status": "ok", "message": f"Saved to {file_path}"} if __name__ == "__main__": print("Starting Browser-Use recording API on http://0.0.0.0:9000") uvicorn.run(app, host="0.0.0.0", port=9000) ``` ### Client Component (client.py) The client component runs the Browser-Use agent with a recording hook: 
```python
#!/usr/bin/env python3
#
# Client to record and save Browser-Use activity.
# Save this code to client.py and run with `python client.py`
#

import asyncio

import requests
from dotenv import load_dotenv
from pyobjtojson import obj_to_json

from browser_use import Agent
from browser_use.llm import ChatOpenAI

# Load environment variables (for API keys)
load_dotenv()


def send_agent_history_step(data):
    """Send the agent step data to the recording API"""
    url = "http://127.0.0.1:9000/post_agent_history_step"
    response = requests.post(url, json=data)
    return response.json()


async def record_activity(agent_obj):
    """Hook function that captures and records agent activity at each step"""
    urls_json_last_elem = None
    model_thoughts_last_elem = None
    model_outputs_json_last_elem = None
    model_actions_json_last_elem = None
    extracted_content_json_last_elem = None

    print('--- ON_STEP_START HOOK ---')

    # Capture current page state
    website_html = await agent_obj.browser_session.get_page_html()
    website_screenshot = await agent_obj.browser_session.take_screenshot()

    # Make sure we have state history
    if hasattr(agent_obj, "state"):
        history = agent_obj.state.history
    else:
        print("Warning: Agent has no state history")
        return

    # Process model thoughts
    model_thoughts_json = obj_to_json(obj=history.model_thoughts(), check_circular=False)
    if len(model_thoughts_json) > 0:
        model_thoughts_last_elem = model_thoughts_json[-1]

    # Process model outputs
    model_outputs_json = obj_to_json(obj=history.model_outputs(), check_circular=False)
    if len(model_outputs_json) > 0:
        model_outputs_json_last_elem = model_outputs_json[-1]

    # Process model actions
    model_actions_json = obj_to_json(obj=history.model_actions(), check_circular=False)
    if len(model_actions_json) > 0:
        model_actions_json_last_elem = model_actions_json[-1]

    # Process extracted content
    extracted_content_json = obj_to_json(obj=history.extracted_content(), check_circular=False)
    if len(extracted_content_json) > 0:
        extracted_content_json_last_elem = extracted_content_json[-1]

    # Process URLs
    urls_json = obj_to_json(obj=history.urls(), check_circular=False)
    if len(urls_json) > 0:
        urls_json_last_elem = urls_json[-1]

    # Create a summary of all data for this step
    model_step_summary = {
        "website_html": website_html,
        "website_screenshot": website_screenshot,
        "url": urls_json_last_elem,
        "model_thoughts": model_thoughts_last_elem,
        "model_outputs": model_outputs_json_last_elem,
        "model_actions": model_actions_json_last_elem,
        "extracted_content": extracted_content_json_last_elem,
    }

    print("--- MODEL STEP SUMMARY ---")
    print(f"URL: {urls_json_last_elem}")

    # Send data to the API
    result = send_agent_history_step(data=model_step_summary)
    print(f"Recording API response: {result}")


async def run_agent():
    """Run the Browser-Use agent with the recording hook"""
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=ChatOpenAI(model="gpt-4.1-mini"),
    )
    try:
        print("Starting Browser-Use agent with recording hook")
        await agent.run(on_step_start=record_activity, max_steps=30)
    except Exception as e:
        print(f"Error running agent: {e}")


if __name__ == "__main__":
    # Check if API is running
    try:
        requests.get("http://127.0.0.1:9000")
        print("Recording API is available")
    except requests.exceptions.RequestException:
        print("Warning: Recording API may not be running. Start api.py first.")

    # Run the agent
    asyncio.run(run_agent())
```

Contribution by Carlos A. Planchón.

### Working with the Recorded Data

After running the agent, you'll find the recorded data in the `recordings` directory. Here's how you can use this data:

1. **View recorded sessions**: Each JSON file contains a snapshot of agent activity for one step
2. **Extract screenshots**: You can modify the API to save screenshots separately
3. **Analyze agent behavior**: Use the recorded data to study how the agent navigates websites

### Extending the Example

You can extend this recording system in several ways:

1. **Save screenshots separately**: Uncomment the screenshot saving code in the API
2. **Add a web dashboard**: Create a simple web interface to view recorded sessions
3. **Add session IDs**: Modify the API to group steps by agent session
4. **Add filtering**: Implement filters to record only specific types of actions

# MCP Client

Source: https://docs.browser-use.com/customize/mcp-client

Connect external MCP servers to extend browser-use with additional tools and integrations

The MCP (Model Context Protocol) client allows browser-use agents to connect to external MCP servers, automatically exposing their tools as actions.

MCP is an open protocol for integrating LLMs with external data sources and tools. Learn more at [modelcontextprotocol.io](https://modelcontextprotocol.io).

Looking to expose browser-use as an MCP server instead? See [MCP Server](/customize/mcp-server).
## Installation

```bash
uv pip install "browser-use[cli]"
```

## Quick Start

```python
import os

from browser_use import Agent, Tools
from browser_use.mcp.client import MCPClient

# Create tools registry
tools = Tools()

# Connect to an MCP server
mcp_client = MCPClient(
    server_name="filesystem",
    command="npx",
    args=["@modelcontextprotocol/server-filesystem", "/path/to/files"]
)

# Connect and register the server's tools with the registry
await mcp_client.connect()
await mcp_client.register_to_tools(tools)

# Agent can now use filesystem tools
agent = Agent(
    task="Read the README.md file",
    tools=tools
)
await agent.run()

# Clean up
await mcp_client.disconnect()
```

## API Reference

### MCPClient

```python
class MCPClient:
    def __init__(
        self,
        server_name: str,
        command: str,
        args: list[str] | None = None,
        env: dict[str, str] | None = None,
    ) -> None
```

**Parameters:**

* `server_name`: Name of the MCP server (used for logging)
* `command`: Command to start the server (e.g., `"npx"`)
* `args`: Arguments for the command
* `env`: Environment variables for the server

**Key Methods:**

```python
# Connect to the server
await mcp_client.connect()

# Register the server's tools with the Tools registry
await mcp_client.register_to_tools(
    tools,
    tool_filter=['read_file', 'write_file'],  # Optional allowlist
    prefix='fs_'  # Optional prefix for tool names
)

# Disconnect
await mcp_client.disconnect()
```

### Context Manager Usage

```python
async with MCPClient(
    server_name="github",
    command="npx",
    args=["@modelcontextprotocol/server-github"],
    env={"GITHUB_TOKEN": os.getenv("GITHUB_TOKEN")}
) as client:
    await client.register_to_tools(tools)
    await agent.run()
# Automatically disconnected
```

## Common MCP Servers

### Filesystem

```python
MCPClient(
    server_name="filesystem",
    command="npx",
    args=["@modelcontextprotocol/server-filesystem", "/path"]
)
```

### PostgreSQL

```python
MCPClient(
    server_name="postgres",
    command="npx",
    args=["@modelcontextprotocol/server-postgres", "postgresql://localhost/db"]
)
```

### GitHub

```python
MCPClient(
    server_name="github",
    command="npx",
    args=["@modelcontextprotocol/server-github"],
    env={"GITHUB_TOKEN": os.getenv("GITHUB_TOKEN")}
)
```

## Multiple Servers

Connect multiple servers with prefixes to avoid name conflicts:

```python
# Filesystem server
fs_client = MCPClient(
    server_name="filesystem",
    command="npx",
    args=["@modelcontextprotocol/server-filesystem", "."]
)
await fs_client.connect()
await fs_client.register_to_tools(tools, prefix="fs_")

# GitHub server
gh_client = MCPClient(
    server_name="github",
    command="npx",
    args=["@modelcontextprotocol/server-github"],
    env={"GITHUB_TOKEN": os.getenv("GITHUB_TOKEN")}
)
await gh_client.connect()
await gh_client.register_to_tools(tools, prefix="gh_")

# Agent can use both
agent = Agent(
    task="Read README.md and create a GitHub issue",
    tools=tools
)
await agent.run()

# Clean up
await fs_client.disconnect()
await gh_client.disconnect()
```

## Tool Filtering

Register only specific tools:

```python
await mcp_client.register_to_tools(
    tools,
    tool_filter=['read_file', 'list_directory']
)
```

## Custom MCP Server

Create your own MCP server:

```python
# my_server.py
import mcp.server.stdio
import mcp.types as types
from mcp.server import Server

server = Server("custom-tools")

@server.list_tools()
async def handle_list_tools() -> list[types.Tool]:
    return [
        types.Tool(
            name="calculate",
            description="Perform calculation",
            inputSchema={
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                },
                "required": ["expression"]
            }
        )
    ]

@server.call_tool()
async def handle_call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name == "calculate":
        # NOTE: eval() executes arbitrary code; never use it on untrusted input
        result = eval(arguments["expression"])
        return [types.TextContent(type="text", text=str(result))]
    return []

# Run server
async def main():
    async with mcp.server.stdio.stdio_server() as (read, write):
        await server.run(read, write, ...)
if __name__ == "__main__": import asyncio asyncio.run(main()) ``` Connect custom server: ```python custom_client = MCPClient( server_name="custom", command="python", args=["my_server.py"] ) ``` ## Best Practices 1. **Always disconnect** when done 2. **Use prefixes** when connecting multiple servers 3. **Filter tools** to limit capabilities 4. **Use context managers** for automatic cleanup ## See Also * [MCP Server](/customize/mcp-server) - Expose browser-use as an MCP server * [Custom Functions](/customize/custom-functions) - Write custom actions directly * [Model Context Protocol](https://modelcontextprotocol.io) - MCP specification # MCP Server Source: https://docs.browser-use.com/customize/mcp-server Expose browser-use capabilities as an MCP server for AI assistants like Claude Desktop The MCP server exposes browser-use's browser automation capabilities as tools that can be used by AI assistants like Claude Desktop. This allows external MCP clients to control browsers, navigate websites, extract content, and perform automated tasks. This is the opposite of the [MCP Client](/customize/mcp-client). The MCP client lets browser-use connect to external MCP servers, while this MCP server lets external AI assistants connect to browser-use. ## Overview The MCP server acts as a bridge between MCP-compatible AI assistants and browser-use: ```mermaid graph LR A[Claude Desktop] -->|MCP Protocol| B[Browser-use MCP Server] B --> C[Browser] B --> D[Tools] B --> E[FileSystem] C --> F[Playwright Browser] style B fill:#f9f,stroke:#333,stroke-width:2px ``` ## Installation ```bash uv pip install "browser-use[cli]" ``` ## Quick Start ### 1. Configure Claude Desktop Add browser-use to your Claude Desktop configuration: Edit `~/Library/Application Support/Claude/claude_desktop_config.json`: ```json { "mcpServers": { "browser-use": { "command": "uvx", "args": ["browser-use[cli]", "--mcp"], "env": { "OPENAI_API_KEY": "sk-..." 
// Optional: for content extraction } } } } ``` Edit `%APPDATA%\Claude\claude_desktop_config.json`: ```json { "mcpServers": { "browser-use": { "command": "uvx", "args": ["browser-use[cli]", "--mcp"], "env": { "OPENAI_API_KEY": "sk-..." // Optional: for content extraction } } } } ``` ### 2. Restart Claude Desktop The browser-use tools will appear in Claude's tools menu (🔌 icon). ### 3. Use Browser Automation Ask Claude to perform browser tasks: * "Navigate to example.com and describe what you see" * "Search for 'browser automation' on Google" * "Fill out the contact form on this website" ## API Reference ### Available Tools The MCP server exposes the following tools to MCP clients: #### Navigation Tools ##### `browser_navigate` Navigate to a URL. ```typescript browser_navigate(url: string, new_tab?: boolean): string ``` **Parameters:** | Parameter | Type | Required | Description | | --------- | --------- | -------- | -------------------------------- | | `url` | `string` | Yes | URL to navigate to | | `new_tab` | `boolean` | No | Open in new tab (default: false) | **Returns:** Success message with URL ##### `browser_go_back` Navigate back in browser history. ```typescript browser_go_back(): string ``` **Returns:** "Navigated back" #### Interaction Tools ##### `browser_click` Click an element by index. ```typescript browser_click(index: number, new_tab?: boolean): string ``` **Parameters:** | Parameter | Type | Required | Description | | --------- | --------- | -------- | ------------------------------------- | | `index` | `number` | Yes | Element index from browser state | | `new_tab` | `boolean` | No | Open link in new tab (default: false) | **Returns:** Success message indicating click action **Note:** When `new_tab` is true: * For links: Extracts href and opens in new tab * For other elements: Uses Cmd/Ctrl+Click ##### `browser_type` Type text into an input field. 
```typescript browser_type(index: number, text: string): string ``` **Parameters:** | Parameter | Type | Required | Description | | --------- | -------- | -------- | -------------------------------- | | `index` | `number` | Yes | Element index from browser state | | `text` | `string` | Yes | Text to type | **Returns:** Success message with typed text ##### `browser_scroll` Scroll the page. ```typescript browser_scroll(direction?: "up" | "down"): string ``` **Parameters:** | Parameter | Type | Required | Description | | ----------- | ---------------- | -------- | ---------------------------------- | | `direction` | `"up" \| "down"` | No | Scroll direction (default: "down") | **Returns:** "Scrolled {direction}" #### State & Content Tools ##### `browser_get_state` Get current browser state with all interactive elements. ```typescript browser_get_state(include_screenshot?: boolean): string ``` **Parameters:** | Parameter | Type | Required | Description | | -------------------- | --------- | -------- | ------------------------------------------ | | `include_screenshot` | `boolean` | No | Include base64 screenshot (default: false) | **Returns:** JSON string containing: ```json { "url": "current page URL", "title": "page title", "tabs": [{"url": "...", "title": "..."}], "interactive_elements": [ { "index": 0, "tag": "button", "text": "element text (max 100 chars)", "placeholder": "if present", "href": "if link" } ], "screenshot": "base64 if requested" } ``` The interactive elements include all clickable and interactive elements on the page, with their: * `index`: Used to reference the element in other commands (click, type) * `tag`: HTML tag name (button, input, a, etc.) * `text`: Visible text content, truncated to 100 characters * `placeholder`: For input fields (if present) * `href`: For links (if present) ##### `browser_extract_content` Extract structured content from the current page using AI. 
```typescript
browser_extract_content(query: string, extract_links?: boolean): string
```

**Parameters:**

| Parameter       | Type      | Required | Description                                  |
| --------------- | --------- | -------- | -------------------------------------------- |
| `query`         | `string`  | Yes      | What to extract (e.g., "all product prices") |
| `extract_links` | `boolean` | No       | Include links in extraction (default: false) |

**Returns:** Extracted content based on query

**Note:** Requires the `OPENAI_API_KEY` environment variable for AI extraction.

#### Tab Management Tools

##### `browser_list_tabs`

List all open browser tabs.

```typescript
browser_list_tabs(): string
```

**Returns:** JSON array of tab information:

```json
[
  {
    "tab_id": "AE21",
    "url": "https://example.com",
    "title": "Page Title"
  }
]
```

##### `browser_switch_tab`

Switch to a specific tab.

```typescript
browser_switch_tab(tab_id: string): string
```

**Parameters:**

| Parameter | Type     | Required | Description                                            |
| --------- | -------- | -------- | ------------------------------------------------------ |
| `tab_id`  | `string` | Yes      | ID of tab to switch to (last 4 characters of TargetID) |

**Returns:** Success message with tab URL

##### `browser_close_tab`

Close a specific tab.

```typescript
browser_close_tab(tab_id: string): string
```

**Parameters:**

| Parameter | Type     | Required | Description                                            |
| --------- | -------- | -------- | ------------------------------------------------------ |
| `tab_id`  | `string` | Yes      | ID of the tab to close (last 4 characters of TargetID) |

**Returns:** Success message with closed tab URL

### Tool Response Format

All tools return text content. Errors are returned as strings starting with "Error:".

## Configuration

### Environment Variables

Configure the MCP server's behavior through environment variables in the Claude Desktop config:

```json
{
  "mcpServers": {
    "browser-use": {
      "command": "python",
      "args": ["-m", "browser_use.mcp.server"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
// For AI content extraction } } } } ``` ### Browser Profile Settings The MCP server creates a browser session with these default settings: * **Downloads Path**: `~/Downloads/browser-use-mcp/` * **Wait Between Actions**: 0.5 seconds * **Keep Alive**: True (browser stays open between commands) * **Allowed Domains**: None by default (all domains allowed) ## Advanced Usage ### Running Standalone Test the MCP server without Claude Desktop: ```bash # Run server (reads from stdin, writes to stdout) uvx 'browser-use[cli]' --mcp # The server communicates via JSON-RPC on stdio ``` ### Security Considerations The MCP server provides full browser control to connected AI assistants. Consider these security measures: 1. **Domain Restrictions**: Currently not configurable via environment variables, but the server creates sessions with no domain restrictions by default 2. **File System Access**: The server creates a FileSystem instance at `~/.browser-use-mcp` for extraction operations 3. **Downloads**: Files download to `~/Downloads/browser-use-mcp/` ## Implementation Details ### Browser Session Management * **Lazy Initialization**: Browser session is created on first browser tool use * **Persistent Session**: Session remains active across multiple tool calls * **Single Session**: Currently maintains one browser session per server instance ### Tool Categories 1. **Direct Browser Control**: Tools starting with `browser_` that directly interact with the browser 2. **Agent Tasks**: Currently commented out in implementation (`browser_use_run_task`) ### Error Handling * All exceptions are caught and returned as text: `"Error: {message}"` * Browser session initialization errors are returned to the client * Missing dependencies (e.g., OPENAI\_API\_KEY) return descriptive error messages ## Troubleshooting ### Server Not Appearing in Claude 1. 
**Check configuration path:**

   * macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
   * Windows: `%APPDATA%\Claude\claude_desktop_config.json`

2. **Verify Python installation:**

   ```bash
   uvx 'browser-use[cli]' --version
   uvx 'browser-use[cli]' --mcp --help
   ```

3. **Check Claude logs:**

   * macOS: `~/Library/Logs/Claude/mcp.log`
   * Windows: `%APPDATA%\Claude\logs\mcp.log`

### Browser Not Launching

```bash
# Install Playwright browsers
playwright install chromium

# Test browser launch
python -c "from browser_use import Browser; import asyncio; asyncio.run(Browser().start())"
```

### Connection Errors

If you see "MCP server connection failed":

1. Test the server directly:

   ```bash
   uvx 'browser-use[cli]' --mcp
   ```

2. Check all dependencies:

   ```bash
   uv pip install "browser-use[cli]"
   ```

### Content Extraction Not Working

If `browser_extract_content` returns errors:

1. Ensure `OPENAI_API_KEY` is set in the environment configuration
2. Verify the API key is valid
3. Check that you have credits/access to the OpenAI API

## Limitations

| Limitation                    | Description                                   | Workaround                       |
| ----------------------------- | --------------------------------------------- | -------------------------------- |
| Single Browser Session        | One browser instance per server               | Restart server for new session   |
| No Domain Restrictions Config | Cannot configure allowed domains via env vars | Modify server code if needed     |
| No Agent Mode                 | `browser_use_run_task` is commented out       | Use direct browser control tools |
| Text-Only Responses           | All responses are text strings                | Parse JSON responses client-side |

## Comparison with MCP Client

| Feature           | MCP Server (this)      | [MCP Client](/customize/mcp-client) |
| ----------------- | ---------------------- | ----------------------------------- |
| **Purpose**       | Expose browser to AI   | Connect agent to tools              |
| **User**          | Claude Desktop, etc.   | Browser-use agents                  |
| **Direction**     | External → Browser     | Agent → External                    |
| **Configuration** | JSON config file       | Python code                         |
| **Tools**         | Fixed browser tools    | Dynamic from server                 |
| **Use Case**      | Interactive assistance | Automated workflows                 |

## Code Examples

* [Simple MCP client example](https://github.com/browser-use/browser-use/tree/main/examples/mcp/simple_server.py) - Basic MCP client connecting to browser-use server
* [Advanced MCP client example](https://github.com/browser-use/browser-use/tree/main/examples/mcp/advanced_server.py) - Multi-server orchestration and complex workflows

## See Also

* [MCP Client](/customize/mcp-client) - Connect browser-use to external MCP servers
* [Model Context Protocol](https://modelcontextprotocol.io) - MCP specification
* [Claude Desktop](https://claude.ai/download) - Primary MCP client

# Add Tools

Source: https://docs.browser-use.com/customize/tools/add

Examples:

* deterministic clicks
* file handling
* calling APIs
* human-in-the-loop
* browser interactions
* calling LLMs
* get 2fa codes
* send emails
* ...

Simply add `@tools.action(...)` to your function.

```python
from browser_use import Tools, Agent

tools = Tools()

@tools.action(description='Ask human for help with a question')
def ask_human(question: str) -> str:
    answer = input(f'{question} > ')
    return f'The human responded with: {answer}'
```

```python
agent = Agent(task='...', llm=llm, tools=tools)
```

* **`description`** *(required)* - What the tool does; the LLM uses this to decide when to call it.
* **`allowed_domains`** - List of domains where the tool can run (e.g. `['*.example.com']`); defaults to all domains.

The Agent fills your function's parameters based on their names, type hints, and defaults.
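Conceptually, this parameter filling works like ordinary keyword-argument binding against the function's signature: required parameters come from the LLM's arguments, optional ones fall back to their declared defaults. A minimal self-contained sketch of the idea in plain Python (this is an illustration, not browser-use's actual implementation; the `timeout` parameter is hypothetical):

```python
import inspect

def ask_human(question: str, timeout: int = 30) -> str:
    return f'Would ask: {question} (waiting up to {timeout}s)'

# Arguments as an LLM might emit them: only the required parameter is supplied
llm_args = {'question': 'What is the 2FA code?'}

# Bind them against the signature and fill in the declared defaults
sig = inspect.signature(ask_human)
bound = sig.bind(**llm_args)
bound.apply_defaults()  # adds timeout=30 from the default value

result = ask_human(*bound.args, **bound.kwargs)
```

Descriptive names, precise type hints, and sensible defaults therefore directly improve how reliably the agent can call your tool.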
## Available Objects

Your function has access to these objects:

* **`browser_session: BrowserSession`** - Current browser session for CDP access
* **`cdp_client`** - Direct Chrome DevTools Protocol client
* **`page_extraction_llm: BaseChatModel`** - The LLM passed into the agent; useful for making a custom LLM call inside the tool
* **`file_system: FileSystem`** - File system access
* **`available_file_paths: list[str]`** - Available files for upload/processing
* **`has_sensitive_data: bool`** - Whether the action contains sensitive data

## Pydantic Input

You can use Pydantic models for the tool parameters:

```python
import json

from pydantic import BaseModel, Field

class Car(BaseModel):
    name: str = Field(description='The name of the car, e.g. "Toyota Camry"')
    price: int = Field(description='The price of the car as int in USD, e.g. 25000')

@tools.action(description='Save cars to file')
def save_cars(cars: list[Car]) -> str:
    with open('cars.json', 'w') as f:
        json.dump([car.model_dump() for car in cars], f)
    return f'Saved {len(cars)} cars to file'

task = "find cars and save them to file"
```

## Domain Restrictions

Limit tools to specific domains:

```python
@tools.action(
    description='Fill out banking forms',
    allowed_domains=['https://mybank.com']
)
def fill_bank_form(account_number: str) -> str:
    # Only works on mybank.com
    return f'Filled form for account {account_number}'
```

# Available Tools

Source: https://docs.browser-use.com/customize/tools/available

Here is the [source code](https://github.com/browser-use/browser-use/blob/main/browser_use/tools/service.py) for the default tools:

### Navigation & Browser Control

* **`search_google`** - Search queries in Google
* **`go_to_url`** - Navigate to URLs
* **`go_back`** - Go back in browser history
* **`wait`** - Wait for a specified number of seconds

### Page Interaction

* **`click_element_by_index`** - Click elements by their index
* **`input_text`** - Input text into form fields
* **`upload_file_to_element`** - Upload files to file inputs
* **`scroll`** - Scroll the page up/down
* **`scroll_to_text`** - Scroll to specific text on the page
* **`send_keys`** - Send special keys (Enter, Escape, etc.)

### Tab Management

* **`switch_tab`** - Switch between browser tabs
* **`close_tab`** - Close browser tabs

### Content Extraction

* **`extract_structured_data`** - Extract data from webpages using an LLM

### Form Controls

* **`get_dropdown_options`** - Get dropdown option values
* **`select_dropdown_option`** - Select dropdown options

### File Operations

* **`write_file`** - Write content to files
* **`read_file`** - Read file contents
* **`replace_file_str`** - Replace text in files

### Task Completion

* **`done`** - Complete the task (always available)

# Basics

Source: https://docs.browser-use.com/customize/tools/basics

Tools are the functions the agent uses to interact with the world.

## Quick Example

```python
from browser_use import Agent, Tools

tools = Tools()

@tools.action('Ask human for help with a question')
def ask_human(question: str) -> str:
    answer = input(f'{question} > ')
    return f'The human responded with: {answer}'

agent = Agent(
    task='Ask human for help',
    llm=llm,
    tools=tools,
)
```

# Remove Tools

Source: https://docs.browser-use.com/customize/tools/remove

You can exclude default tools:

```python
from browser_use import Tools

tools = Tools(exclude_actions=['search_google', 'wait'])
agent = Agent(task='...', llm=llm, tools=tools)
```

# Tool Response

Source: https://docs.browser-use.com/customize/tools/response

Tools return results using `ActionResult` or simple strings.
## Return Types

```python
@tools.action('My tool')
def my_tool() -> str:
    return "Task completed successfully"


@tools.action('Advanced tool')
def advanced_tool() -> ActionResult:
    return ActionResult(
        extracted_content="Main result",
        long_term_memory="Remember this info",
        error="Something went wrong",
        is_done=True,
        success=True,
        attachments=["file.pdf"],
    )
```

## ActionResult Properties

* `extracted_content` (default: `None`) - Main result passed to the LLM; equivalent to returning a string
* `include_extracted_content_only_once` (default: `False`) - Set to `True` for large content to include it only once in the LLM input
* `long_term_memory` (default: `None`) - Always included in the LLM input for all future steps
* `error` (default: `None`) - Error message; we catch exceptions and set this automatically. Always included in the LLM input
* `is_done` (default: `False`) - Tool completes the entire task
* `success` (default: `None`) - Task success (only valid with `is_done=True`)
* `attachments` (default: `None`) - Files to show the user
* `metadata` (default: `None`) - Debug/observability data

## Why `extracted_content` and `long_term_memory`?

These let you control the context the LLM sees.

### 1. Include short content always in context

```python
def simple_tool() -> str:
    return "Hello, world!"  # Kept in context for all future steps
```

### 2. Show long content once, remember subset in context

```python
return ActionResult(
    extracted_content="[500 lines of product data...]",  # Shown to the LLM once
    include_extracted_content_only_once=True,  # Never show the full output again
    long_term_memory="Found 50 products"  # Only this in future steps
)
```

We save the full `extracted_content` to files which the LLM can read in future steps.

### 3.
Don't show long content, remember subset in context

```python
return ActionResult(
    extracted_content="[500 lines of product data...]",  # The LLM never sees this: `long_term_memory` overrides it and `include_extracted_content_only_once` is not used
    long_term_memory="Saved user's favorite products",  # Shown to the LLM in future steps
)
```

## Terminating the Agent

Set `is_done=True` to stop the agent completely. Use this when your tool finishes the entire task:

```python
@tools.action(description='Complete the task')
def finish_task() -> ActionResult:
    return ActionResult(
        extracted_content="Task completed!",
        is_done=True,  # Stops the agent
        success=True  # Task succeeded
    )
```

# Contribution Guide

Source: https://docs.browser-use.com/development/contribution-guide

Learn how to contribute to Browser Use

# Join the Browser Use Community!

We're thrilled you're interested in contributing to Browser Use! This guide will help you get started with contributing to our project. Your contributions are what make the open-source community such an amazing place to learn, inspire, and create.

## Quick Setup

Get started with Browser Use development in minutes:

```bash
git clone https://github.com/browser-use/browser-use
cd browser-use
uv sync --all-extras --dev
# or
pip install -U git+https://github.com/browser-use/browser-use.git@main
echo "BROWSER_USE_LOGGING_LEVEL=debug" >> .env
```

For more detailed setup instructions, see our [Local Setup Guide](/development/local-setup).
## How to Contribute

### Find Something to Work On

* Browse our [GitHub Issues](https://github.com/browser-use/browser-use/issues) for beginner-friendly issues labeled `good-first-issue`
* Check out our most active issues or ask in [Discord](https://discord.gg/zXJJHtJf3k) for ideas of what to work on
* Get inspiration and share what you build in the [`#showcase-your-work`](https://discord.com/channels/1303749220842340412/1305549200678850642) channel
* Explore or contribute to [`awesome-browser-use-prompts`](https://github.com/browser-use/awesome-prompts)!

### Making a Great Pull Request

When submitting a pull request, please:

* Include a clear description of what the PR does and why it's needed
* Add tests that cover your changes
* Include a demo screenshot/gif or an example script demonstrating your changes
* Make sure the PR passes all CI checks and tests
* Keep your PR focused on a single issue or feature to make it easier to review

Note: We appreciate quality over quantity. Instead of submitting small typo/style-only PRs, consider including those fixes as part of larger bugfix or feature PRs.

### Contribution Process

1. Fork the repository
2. Create a new branch for your feature or bugfix
3. Make your changes
4. Run tests to ensure everything works
5. Submit a pull request
6. Respond to any feedback from maintainers
7. Celebrate your contribution!

Feel free to bump your issues/PRs with comments periodically if you need faster feedback.

## Code of Conduct

We're committed to providing a welcoming and inclusive environment for all contributors. Please be respectful and constructive in all interactions.

## Getting Help

If you need help at any point:

* Join our [Discord community](https://link.browser-use.com/discord)
* Ask questions in the appropriate GitHub issue
* Check our [documentation](/introduction)

We're here to help you succeed in contributing to Browser Use!
# Local Setup

Source: https://docs.browser-use.com/development/local-setup

Set up the Browser Use development environment locally

# Welcome to Browser Use Development!

We're excited to have you join our community of contributors. This guide will help you set up your local development environment quickly and easily.

## Quick Setup

If you're familiar with Python development, here's the quick way to get started:

```bash
git clone https://github.com/browser-use/browser-use
cd browser-use
uv sync --all-extras --dev
# or
pip install -U git+https://github.com/browser-use/browser-use.git@main
echo "BROWSER_USE_LOGGING_LEVEL=debug" >> .env
```

## Helper Scripts

We provide several convenient shell scripts in the `bin/` directory to help with common development tasks:

```bash
# Complete setup script - installs uv, creates a venv, and installs dependencies
./bin/setup.sh

# Run all pre-commit hooks (formatting, linting, type checking)
./bin/lint.sh

# Run the core test suite that's executed in CI
./bin/test.sh
```

## Prerequisites

Browser Use requires Python 3.11 or higher. We recommend [uv](https://docs.astral.sh/uv/) for Python environment management.

## Detailed Setup Instructions

### Clone the Repository

First, clone the Browser Use repository:

```bash
git clone https://github.com/browser-use/browser-use
cd browser-use
```

### Environment Setup

1. Create and activate a virtual environment:

```bash
uv venv --python 3.11
source .venv/bin/activate
```

2. Install dependencies:

```bash
# Install the package in editable mode with all development dependencies
uv sync --all-extras

# Install the default browser
playwright install chromium --with-deps --no-shell
```

## Configuration

Set up your environment variables:

```bash
# Copy the example environment file
cp .env.example .env
```

Or manually create a `.env` file with API keys set for the models you want to use:

```bash .env
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=
AZURE_ENDPOINT=
AZURE_OPENAI_API_KEY=
GOOGLE_API_KEY=
DEEPSEEK_API_KEY=
GROK_API_KEY=
NOVITA_API_KEY=
BROWSER_USE_LOGGING_LEVEL=debug  # Helpful for development
```

See [Supported Models](/customize/supported-models) for available LLM options and their specific API key requirements.

## Development

After setup, you can:

* Try demos in the example library with `uv run examples/simple.py`
* Run the linter/formatter with `uv run ruff format examples/some/file.py`
* Run tests with `uv run pytest`
* Build the package with `uv build`

### Linting

```bash
# Run the linter on the whole project (must pass for a PR to be allowed to merge)
uv run pre-commit run --all-files
# or use our convenience script
./bin/lint.sh

# Install the linter & formatter pre-commit hooks to run automatically
pre-commit install --install-hooks

# Experimental: run the type checker
uv run type
```

### Tests

```bash
# Run all tests that run in CI
./bin/test.sh

# Run specific tests
uv run pytest                      # run everything
uv run pytest tests/test_tools.py  # run a specific test file
uv run pytest tests/test_sensitive_data.py tests/test_tab_management.py  # run two test files
uv run pytest tests/test_tab_management.py::TestTabManagement::test_user_changes_tab  # run a single test
```

### Build

```bash
uv build
uv pip install dist/*.whl

# Push the build to PyPI (automatically run by GitHub Actions CI)
uv publish
```

## Getting Help

If you run into any issues:

1. Check our [GitHub Issues](https://github.com/browser-use/browser-use/issues)
2. Join our [Discord community](https://link.browser-use.com/discord) for support

We welcome contributions! See our [Contribution Guide](/development/contribution-guide) for guidelines on how to help improve Browser Use.
# Observability

Source: https://docs.browser-use.com/development/observability

Trace Browser Use's agent execution steps and browser sessions

## Overview

Browser Use has a native integration with [Laminar](https://lmnr.ai), an open-source platform for tracing, evals, and labeling of AI agents. Read more about Laminar in the [Laminar docs](https://docs.lmnr.ai).

## Setup

Register on [Laminar Cloud](https://lmnr.ai) and get the key from your project settings. Set the `LMNR_PROJECT_API_KEY` environment variable:

```bash
pip install 'lmnr[all]'
export LMNR_PROJECT_API_KEY=
```

## Usage

Initialize Laminar at the top of your project, and both Browser Use and session recordings will be traced automatically.

```python {5-8}
from browser_use import Agent, ChatOpenAI
import asyncio
from lmnr import Laminar

# this line auto-instruments Browser Use and any browser you use (local or remote)
Laminar.initialize(project_api_key="...")

async def main():
    agent = Agent(
        task="open google, search Laminar AI",
        llm=ChatOpenAI(model="gpt-4.1-mini"),
    )
    await agent.run()

asyncio.run(main())
```

## Viewing Traces

You can view traces in the Laminar UI by going to the traces tab in your project. When you select a trace, you can see both the browser session recording and the agent execution steps. The browser session timeline is synced with the agent execution steps; timeline highlights indicate the agent's current step. In the trace view, you can also see the tool the agent is using and the tool's input and output. Tools are highlighted in the timeline in yellow.

## Laminar

To learn more about tracing and evaluating your browser agents, check out the [Laminar docs](https://docs.lmnr.ai).
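If you want tracing to be opt-in per environment, you can guard the initialization on the presence of the API key. A small sketch under that assumption (`maybe_init_tracing` is an illustrative helper, not part of Browser Use or Laminar):

```python
import os


def maybe_init_tracing() -> bool:
    """Enable Laminar tracing only when a project key is configured."""
    api_key = os.environ.get('LMNR_PROJECT_API_KEY')
    if not api_key:
        # No key configured: run the agent untraced
        return False
    # Imported lazily so lmnr is only required when tracing is enabled
    from lmnr import Laminar
    Laminar.initialize(project_api_key=api_key)
    return True
```

Call `maybe_init_tracing()` once at startup, before creating any `Agent`, so instrumentation is in place for the whole run.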
# Telemetry

Source: https://docs.browser-use.com/development/telemetry

Understanding Browser Use's telemetry and privacy settings

## Overview

Browser Use collects anonymous usage data to help us understand how the library is being used and to improve the user experience. It also helps us fix bugs faster and prioritize feature development.

## Data Collection

We use [PostHog](https://posthog.com) for telemetry collection. The data is completely anonymized and contains no personally identifiable information. We never collect personal information, credentials, or specific content from your browser automation tasks.

## Opting Out

You can disable telemetry by setting an environment variable:

```bash .env
ANONYMIZED_TELEMETRY=false
```

Or in your Python code:

```python
import os
os.environ["ANONYMIZED_TELEMETRY"] = "false"
```

Even when enabled, telemetry has zero impact on the library's performance or functionality. The code is available in the [Telemetry Service](https://github.com/browser-use/browser-use/tree/main/browser_use/telemetry).

# Introduction

Source: https://docs.browser-use.com/introduction

Automate browser tasks in plain text. Open-source Python library. Scale up with our cloud.

# Human Quickstart

Source: https://docs.browser-use.com/quickstart

## 1. Easy setup

Use [uv](https://docs.astral.sh/uv/) to create and activate the environment:

```bash
uv venv --python 3.12
```

```bash
# For Mac/Linux:
source .venv/bin/activate

# For Windows:
.venv\Scripts\activate
```

Install browser-use:

```bash
uv pip install browser-use
```

Install Chromium:

```bash
uvx playwright install chromium --with-deps
```

## 2. Choose your favorite LLM

Create a `.env` file and add your API key:

```bash .env
OPENAI_API_KEY=
```

See [Supported Models](/customize/supported-models) for other models.

## 3.
Run your first agent

```python agent.py
from browser_use import Agent, ChatOpenAI
from dotenv import load_dotenv
import asyncio

load_dotenv()

async def main():
    llm = ChatOpenAI(model="gpt-4.1-mini")
    task = "Find the number 1 post on Show HN"
    agent = Agent(task=task, llm=llm)
    await agent.run()

if __name__ == "__main__":
    asyncio.run(main())
```

# LLM Quickstart

Source: https://docs.browser-use.com/quickstart_llm

1. Copy all content [🔗 from here](https://docs.browser-use.com/llms-full.txt) (\~40k tokens)
2. Paste it into your favorite coding agent (Cursor, Claude, ChatGPT, ...).