# Check Balance
Source: https://docs.browser-use.com/api-reference/api-v1/check-balance
https://api.browser-use.com/api/v1/openapi.json get /balance
Returns the user's current API credit balance, which includes both monthly subscription
credits and any additional purchased credits. Required for monitoring usage and ensuring sufficient
credits for task execution.
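As a minimal sketch, checking the balance with Python's `requests` (the shape of the returned JSON depends on the endpoint schema):

```python
import requests

BASE_URL = "https://api.browser-use.com/api/v1"

def auth_headers(api_key: str) -> dict:
    """Build the Authorization header used by every API call."""
    return {"Authorization": f"Bearer {api_key}"}

def get_balance(api_key: str) -> dict:
    """Fetch the current credit balance (subscription + purchased credits)."""
    response = requests.get(f"{BASE_URL}/balance", headers=auth_headers(api_key))
    response.raise_for_status()
    return response.json()
```

You might call `get_balance` before starting a task to confirm enough credits remain.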
# Create Browser Profile
Source: https://docs.browser-use.com/api-reference/api-v1/create-browser-profile
https://api.browser-use.com/api/v1/openapi.json post /browser-profiles
Create a new browser profile with custom settings for ad blocking, proxy usage, and viewport dimensions.
Pay-as-you-go users can have only one profile. Subscription users can create multiple profiles.
# Create Scheduled Task
Source: https://docs.browser-use.com/api-reference/api-v1/create-scheduled-task
https://api.browser-use.com/api/v1/openapi.json post /scheduled-task
Create a scheduled task to run at regular intervals or based on a cron expression.
Requires an active subscription. Returns the scheduled task ID.
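As an illustrative sketch, creating a daily cron-scheduled task; the payload field names (`schedule_type`, `cron_expression`) are assumptions here, so confirm them against the endpoint schema:

```python
import requests

BASE_URL = "https://api.browser-use.com/api/v1"

def build_schedule_payload(task: str, cron: str) -> dict:
    # Field names are illustrative assumptions; confirm them against the
    # POST /scheduled-task request schema.
    return {"task": task, "schedule_type": "cron", "cron_expression": cron}

def create_scheduled_task(api_key: str, task: str, cron: str) -> str:
    """Create a cron-scheduled task and return its ID."""
    response = requests.post(
        f"{BASE_URL}/scheduled-task",
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_schedule_payload(task, cron),
    )
    response.raise_for_status()
    return response.json()["id"]
```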
# Delete Browser Profile
Source: https://docs.browser-use.com/api-reference/api-v1/delete-browser-profile
https://api.browser-use.com/api/v1/openapi.json delete /browser-profiles/{profile_id}
Deletes a browser profile. This will remove the profile and all associated browser data.
# Delete Scheduled Task
Source: https://docs.browser-use.com/api-reference/api-v1/delete-scheduled-task
https://api.browser-use.com/api/v1/openapi.json delete /scheduled-task/{task_id}
Deletes a scheduled task. This will prevent any future runs of this task.
Any currently running instances of this task will be allowed to complete.
# Get Browser Profile
Source: https://docs.browser-use.com/api-reference/api-v1/get-browser-profile
https://api.browser-use.com/api/v1/openapi.json get /browser-profiles/{profile_id}
Returns information about a specific browser profile and its configuration settings.
# Get Browser Use Version
Source: https://docs.browser-use.com/api-reference/api-v1/get-browser-use-version
https://api.browser-use.com/api/v1/openapi.json get /browser-use-version
Returns the browser-use Python library version used by the backend.
# Get Scheduled Task
Source: https://docs.browser-use.com/api-reference/api-v1/get-scheduled-task
https://api.browser-use.com/api/v1/openapi.json get /scheduled-task/{task_id}
Returns detailed information about a specific scheduled task, including its schedule configuration
and current status.
# Get Task
Source: https://docs.browser-use.com/api-reference/api-v1/get-task
https://api.browser-use.com/api/v1/openapi.json get /task/{task_id}
Returns comprehensive information about a task, including its current status, steps completed, output (if finished), and other metadata.
# Get Task Gif
Source: https://docs.browser-use.com/api-reference/api-v1/get-task-gif
https://api.browser-use.com/api/v1/openapi.json get /task/{task_id}/gif
Returns a GIF URL generated from the screenshots of the task execution.
Only available for completed tasks that have screenshots.
# Get Task Media
Source: https://docs.browser-use.com/api-reference/api-v1/get-task-media
https://api.browser-use.com/api/v1/openapi.json get /task/{task_id}/media
Returns links to any recordings or media generated during task execution,
such as browser session recordings. Only available for completed tasks.
# Get Task Output File
Source: https://docs.browser-use.com/api-reference/api-v1/get-task-output-file
https://api.browser-use.com/api/v1/openapi.json get /task/{task_id}/output-file/{file_name}
Returns a presigned URL for downloading a file from the task's output files.
# Get Task Screenshots
Source: https://docs.browser-use.com/api-reference/api-v1/get-task-screenshots
https://api.browser-use.com/api/v1/openapi.json get /task/{task_id}/screenshots
Returns any screenshot URLs generated during task execution.
# Get Task Status
Source: https://docs.browser-use.com/api-reference/api-v1/get-task-status
https://api.browser-use.com/api/v1/openapi.json get /task/{task_id}/status
Returns just the current status of a task (created, running, finished, stopped, or paused).
More lightweight than the full task details endpoint.
# List Browser Profiles
Source: https://docs.browser-use.com/api-reference/api-v1/list-browser-profiles
https://api.browser-use.com/api/v1/openapi.json get /browser-profiles
Returns a paginated list of all browser profiles belonging to the user, ordered by creation date.
Each profile includes configuration like ad blocker settings, proxy settings, and viewport dimensions.
# List Scheduled Tasks
Source: https://docs.browser-use.com/api-reference/api-v1/list-scheduled-tasks
https://api.browser-use.com/api/v1/openapi.json get /scheduled-tasks
Returns a paginated list of all scheduled tasks belonging to the user, ordered by creation date.
Each task includes basic information like schedule type, next run time, and status.
# List Tasks
Source: https://docs.browser-use.com/api-reference/api-v1/list-tasks
https://api.browser-use.com/api/v1/openapi.json get /tasks
Returns a paginated list of all tasks belonging to the user, ordered by creation date.
Each task includes basic information like status and creation time. For detailed task info, use the
get task endpoint.
# Me
Source: https://docs.browser-use.com/api-reference/api-v1/me
https://api.browser-use.com/api/v1/openapi.json get /me
Returns a boolean indicating whether the API key is valid and the user is authenticated.
# Pause Task
Source: https://docs.browser-use.com/api-reference/api-v1/pause-task
https://api.browser-use.com/api/v1/openapi.json put /pause-task
Pauses execution of a running task. The task can be resumed later using the `/resume-task` endpoint. Useful for manual intervention or inspection.
# Ping
Source: https://docs.browser-use.com/api-reference/api-v1/ping
https://api.browser-use.com/api/v1/openapi.json get /ping
Use this endpoint to check if the server is running and responding.
# Resume Task
Source: https://docs.browser-use.com/api-reference/api-v1/resume-task
https://api.browser-use.com/api/v1/openapi.json put /resume-task
Resumes execution of a previously paused task. The task will continue from where it was paused. You can't resume a stopped task.
# Run Task
Source: https://docs.browser-use.com/api-reference/api-v1/run-task
https://api.browser-use.com/api/v1/openapi.json post /run-task
Requires an active subscription. Returns the task ID that can be used to track progress.
# Search Url
Source: https://docs.browser-use.com/api-reference/api-v1/search-url
https://api.browser-use.com/api/v1/openapi.json post /search-url
Search a single URL using Browser Use.
# Simple Search
Source: https://docs.browser-use.com/api-reference/api-v1/simple-search
https://api.browser-use.com/api/v1/openapi.json post /simple-search
Search the internet using Browser Use.
# Stop Task
Source: https://docs.browser-use.com/api-reference/api-v1/stop-task
https://api.browser-use.com/api/v1/openapi.json put /stop-task
Stops a running browser automation task immediately. A stopped task cannot be resumed.
Use the `/pause-task` endpoint instead if you only want to temporarily halt execution.
# Update Browser Profile
Source: https://docs.browser-use.com/api-reference/api-v1/update-browser-profile
https://api.browser-use.com/api/v1/openapi.json put /browser-profiles/{profile_id}
Update a browser profile with partial updates. Only the fields you want to change need to be included.
# Update Scheduled Task
Source: https://docs.browser-use.com/api-reference/api-v1/update-scheduled-task
https://api.browser-use.com/api/v1/openapi.json put /scheduled-task/{task_id}
Update a scheduled task with partial updates.
# Upload File Presigned Url
Source: https://docs.browser-use.com/api-reference/api-v1/upload-file-presigned-url
https://api.browser-use.com/api/v1/openapi.json post /uploads/presigned-url
Returns a presigned URL for uploading a file to the user's files bucket.
After uploading a file, the user can use the `included_file_names` field
in the `RunTaskRequest` to include the files in the task.
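Putting that flow together, here is a hedged sketch of the upload-then-run pattern. The presigned request/response field names (`file_name`, `url`) are assumptions; `included_file_names` comes from the `RunTaskRequest` description above:

```python
import requests

BASE_URL = "https://api.browser-use.com/api/v1"

def build_run_payload(task: str, file_names: list) -> dict:
    # `included_file_names` references files previously uploaded to the bucket.
    return {"task": task, "included_file_names": file_names}

def upload_and_run(api_key: str, local_path: str, file_name: str, task: str) -> str:
    headers = {"Authorization": f"Bearer {api_key}"}
    # 1. Request a presigned upload URL (field names here are illustrative
    #    assumptions; check the endpoint schema for the exact names).
    presign = requests.post(
        f"{BASE_URL}/uploads/presigned-url",
        headers=headers,
        json={"file_name": file_name},
    )
    presign.raise_for_status()
    upload_url = presign.json()["url"]
    # 2. Upload the file body directly to the presigned URL.
    with open(local_path, "rb") as f:
        requests.put(upload_url, data=f).raise_for_status()
    # 3. Reference the uploaded file when creating the task.
    run = requests.post(
        f"{BASE_URL}/run-task",
        headers=headers,
        json=build_run_payload(task, [file_name]),
    )
    run.raise_for_status()
    return run.json()["id"]
```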
# Authentication
Source: https://docs.browser-use.com/cloud/v1/authentication
Learn how to authenticate with the Browser Use Cloud API
The Browser Use Cloud API uses API keys to authenticate requests. You can obtain an API key from your [Browser Use Cloud dashboard](https://cloud.browser-use.com/settings/api-keys).
## API Keys
All API requests must include your API key in the `Authorization` header:
```bash
Authorization: Bearer YOUR_API_KEY
```
Keep your API keys secure and do not share them in publicly accessible areas such as GitHub, client-side code, or in your browser's developer tools. API keys should be stored securely in environment variables or a secure key management system.
## Example Request
Here's an example of how to include your API key in a request using Python:
```python
import requests
API_KEY = 'your_api_key_here'
BASE_URL = 'https://api.browser-use.com/api/v1'
HEADERS = {'Authorization': f'Bearer {API_KEY}'}
response = requests.get(f'{BASE_URL}/me', headers=HEADERS)
print(response.json())
```
## Verifying Authentication
You can verify that your API key is valid by making a request to the `/api/v1/me` endpoint. See the [Me endpoint documentation](/api-reference/api-v1/me) for more details.
## API Key Security
To ensure the security of your API keys:
1. **Never share your API key** in publicly accessible areas
2. **Rotate your API keys** periodically
3. **Use environment variables** to store API keys in your applications
4. **Implement proper access controls** for your API keys
5. **Monitor API key usage** for suspicious activity
If you believe your API key has been compromised, you should immediately revoke it and generate a new one from your Browser Use Cloud dashboard.
# Cloud SDK
Source: https://docs.browser-use.com/cloud/v1/custom-sdk
Learn how to set up your own Browser Use Cloud SDK
This guide walks you through setting up your own Browser Use Cloud SDK.
## Building your own client (OpenAPI)
This approach is recommended **only** if you need to run simple tasks and
**don’t require fine-grained control**.
The best way to build your own client is to use our [OpenAPI specification](https://api.browser-use.com/openapi.json) to generate a type-safe client library.
### Python
Use [openapi-python-client](https://github.com/openapi-generators/openapi-python-client) to generate a modern Python client:
```bash
# Install the generator
pipx install openapi-python-client --include-deps
# Generate the client
openapi-python-client generate --url https://api.browser-use.com/openapi.json
```
This will create a Python package with full type hints, modern dataclasses, and async support.
### TypeScript/JavaScript
Use the [OpenAPI TS](https://openapi-ts.dev/) library to generate a type-safe TypeScript client for the Browser Use API.
The following guide shows how to create a simple type-safe `fetch` client, but you can also use other generators.
* React Query - [https://openapi-ts.dev/openapi-react-query/](https://openapi-ts.dev/openapi-react-query/)
* SWR - [https://openapi-ts.dev/swr-openapi/](https://openapi-ts.dev/swr-openapi/)
```bash npm
npm install openapi-fetch
npm install -D openapi-typescript typescript
```
```bash yarn
yarn add openapi-fetch
yarn add -D openapi-typescript typescript
```
```bash pnpm
pnpm add openapi-fetch
pnpm add -D openapi-typescript typescript
```
```json title="package.json"
{
  "scripts": {
    "openapi:gen": "openapi-typescript https://api.browser-use.com/openapi.json -o ./src/lib/api/v1.d.ts"
  }
}
```
```bash
pnpm openapi:gen
```
```ts
// client.ts
'use client'

import createClient from 'openapi-fetch'
import type { paths } from '@/lib/api/v1'

// NOTE: You can get your API key from https://cloud.browser-use.com/billing!
const apiKey = process.env.BROWSER_USE_API_KEY

export const client = createClient<paths>({
  baseUrl: 'https://api.browser-use.com/',
  headers: { Authorization: `Bearer ${apiKey}` },
})

export type Client = typeof client
```
Need help? Contact our support team at [support@browser-use.com](mailto:support@browser-use.com) or join our
[Discord community](https://link.browser-use.com/discord)
# V1 Implementation
Source: https://docs.browser-use.com/cloud/v1/implementation
Learn how to implement the Browser Use API in Python
This guide shows how to implement common API patterns using Python. We'll create a complete example that creates and monitors a browser automation task.
## Basic Implementation
For all settings see [Run Task](/api-reference/api-v1/run-task).
Here's a simple implementation using Python's `requests` library to stream the task steps:
```python
import json
import time

import requests

API_KEY = 'your_api_key_here'
BASE_URL = 'https://api.browser-use.com/api/v1'
HEADERS = {'Authorization': f'Bearer {API_KEY}'}

def create_task(instructions: str):
    """Create a new browser automation task"""
    response = requests.post(f'{BASE_URL}/run-task', headers=HEADERS, json={'task': instructions})
    return response.json()['id']

def get_task_status(task_id: str):
    """Get current task status"""
    response = requests.get(f'{BASE_URL}/task/{task_id}/status', headers=HEADERS)
    return response.json()

def get_task_details(task_id: str):
    """Get full task details including output"""
    response = requests.get(f'{BASE_URL}/task/{task_id}', headers=HEADERS)
    return response.json()

def wait_for_completion(task_id: str, poll_interval: int = 2):
    """Poll task details until completion, printing each new step as it appears"""
    unique_steps = []
    while True:
        details = get_task_details(task_id)
        new_steps = details['steps']
        # Print only the steps we haven't seen yet.
        for step in new_steps:
            if step not in unique_steps:
                print(json.dumps(step, indent=4))
        unique_steps = new_steps
        status = details['status']
        if status in ['finished', 'failed', 'stopped']:
            return details
        time.sleep(poll_interval)

def main():
    task_id = create_task('Open https://www.google.com and search for openai')
    print(f'Task created with ID: {task_id}')
    task_details = wait_for_completion(task_id)
    print(f"Final output: {task_details['output']}")

if __name__ == '__main__':
    main()
```
## Task Control Example
Here's how to implement task control with pause/resume functionality:
```python
def control_task():
    # Create a new task
    task_id = create_task("Go to google.com and search for Browser Use")

    # Wait for 5 seconds
    time.sleep(5)

    # Pause the task
    requests.put(f"{BASE_URL}/pause-task?task_id={task_id}", headers=HEADERS)
    print("Task paused! Check the live preview.")

    # Wait for user input
    input("Press Enter to resume...")

    # Resume the task
    requests.put(f"{BASE_URL}/resume-task?task_id={task_id}", headers=HEADERS)

    # Wait for completion
    result = wait_for_completion(task_id)
    print(f"Task completed with output: {result['output']}")
```
## Structured Output Example
Here's how to implement a task with structured JSON output:
```python
import json
import os
import time
from typing import List

import requests
from pydantic import BaseModel

API_KEY = os.getenv("API_KEY")
BASE_URL = 'https://api.browser-use.com/api/v1'
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Define output schema using Pydantic
class SocialMediaCompany(BaseModel):
    name: str
    market_cap: float
    headquarters: str
    founded_year: int

class SocialMediaCompanies(BaseModel):
    companies: List[SocialMediaCompany]

def create_structured_task(instructions: str, schema: dict):
    """Create a task that expects structured output"""
    payload = {
        "task": instructions,
        "structured_output_json": json.dumps(schema)
    }
    response = requests.post(f"{BASE_URL}/run-task", headers=HEADERS, json=payload)
    response.raise_for_status()
    return response.json()["id"]

def wait_for_task_completion(task_id: str, poll_interval: int = 5):
    """Poll task status until it completes"""
    while True:
        response = requests.get(f"{BASE_URL}/task/{task_id}/status", headers=HEADERS)
        response.raise_for_status()
        status = response.json()
        if status == "finished":
            break
        elif status in ["failed", "stopped"]:
            raise RuntimeError(f"Task {task_id} ended with status: {status}")
        print("Waiting for task to finish...")
        time.sleep(poll_interval)

def fetch_task_output(task_id: str):
    """Retrieve the final task result"""
    response = requests.get(f"{BASE_URL}/task/{task_id}", headers=HEADERS)
    response.raise_for_status()
    return response.json()["output"]

def main():
    schema = SocialMediaCompanies.model_json_schema()
    task_id = create_structured_task(
        "Get me the top social media companies by market cap",
        schema
    )
    print(f"Task created with ID: {task_id}")

    wait_for_task_completion(task_id)
    print("Task completed!")

    output = fetch_task_output(task_id)
    print("Raw output:", output)

    try:
        parsed = SocialMediaCompanies.model_validate_json(output)
        print("Parsed output:")
        print(parsed)
    except Exception as e:
        print(f"Failed to parse structured output: {e}")

if __name__ == "__main__":
    main()
```
Remember to handle your API key securely and implement proper error handling
in production code.
# N8N + Browser Use Cloud
Source: https://docs.browser-use.com/cloud/v1/n8n-browser-use-integration
Learn how to integrate Browser Use Cloud API with n8n using a practical workflow example (competitor research).
> **TL;DR** – In **3 minutes** you can have an n8n workflow that:
>
> 1. Shows a form asking for a competitor’s name
> 2. Starts a Browser Use task that crawls the web and extracts **pricing, jobs, new features & announcements**
> 3. Waits for the task to finish via a **webhook**
> 4. Formats the output and drops a rich message into Slack
You can grab the workflow JSON below – copy it and import it into n8n, plug in your API keys and hit *Execute* 🚀.
***
## Why use Browser Use in n8n?
• **Autonomous browsing** – Browser Use opens pages like a real user, follows links, clicks buttons and reads DOM content.
• **Structured output** – You tell the agent *exactly* which fields you need. No brittle regex or XPaths.
• **Scales effortlessly** – Kick off hundreds of tasks and monitor them through the Cloud API.
n8n glues everything together so your team gets the data instantly, with no Python scripts or cron jobs needed.
***
## Prerequisites
1. **Browser Use Cloud API key** – grab one from your [Billing page](https://cloud.browser-use.com/billing).
2. **n8n instance** – self-hosted or n8n.cloud. (The screenshots below use n8n 1.45+.)
3. **Slack Incoming Webhook URL** – create one in your Slack workspace.
Add both secrets to n8n’s credential manager:
```env title=".env example"
BROWSER_USE_API_KEY="sk-…"
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/…"
```
***
## Import the template
1. Copy the [workflow JSON](#workflow-json) below to your clipboard.
2. In n8n create a new workflow and paste the JSON.
3. Replace the *Browser-Use API Key* credential and *Slack Incoming Webhook URL* with yours.
***
## How the workflow works
### 1. `Form Trigger` – collect the competitor’s name
A public n8n form with a single required field. When a user submits, the workflow fires instantly.
### 2. `HTTP Request – Browser Use Run Task`
We POST to `/api/v1/run-task` with the following body:
```json title="run-task payload"
{
  "task": "Do exhaustive research on {{ $json[\"Competitor Name\"] }} and extract all pricing information, job postings, new features and announcements",
  "save_browser_data": true,
  "structured_output_json": {
    "pricing": {
      "plans": ["string"],
      "prices": ["string"],
      "features": ["string"]
    },
    "jobs": {
      "titles": ["string"],
      "departments": ["string"],
      "locations": ["string"]
    },
    "new_features": { "titles": ["string"], "description": ["string"] },
    "announcements": { "titles": ["string"], "description": ["string"] }
  },
  "metadata": { "source": "n8n-competitor-demo" }
}
```
Important bits:
• `structured_output_json` tells the agent which keys to return – no post-processing required.
• We tag the task with `metadata.source` so the webhook can filter only *our* jobs.
### 3. `Webhook` + `IF` – wait for task completion
Browser Use sends a webhook when anything happens to a task (see our [Webhooks guide](/cloud/v1/webhooks) for setup details). We expose an n8n Webhook node at `/get-research-data` and let the agent call it.
We only proceed when **both** conditions are true:
* `payload.status == "finished"`
* `payload.metadata.source == "n8n-competitor-demo"`
### 4. `Get Task Details`
The webhook body includes the `session_id`. We fetch the full task record so we get the `output` field containing the structured JSON from step 2.
### 5. `Code – Generate Slack message`
A short JS snippet turns the JSON into a nicely-formatted Slack block with emojis and bullet points. Feel free to tweak the formatting.
### 6. `HTTP Request – Send to Slack`
Finally we POST the message to your incoming webhook and celebrate 🎉.
***
## Customize as you want
This workflow is just the starting point – Browser Use + n8n gives you endless possibilities. Here are some ideas:
| Want to... | How to do it |
| -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| **Extract different data** | Edit `structured_output_json` to specify exactly what fields you need (pricing, reviews, contact info, etc.) and adjust the JS formatter. |
| **Send to Teams/Email/Notion** | Swap the last Slack node for Teams, Gmail, or any of n8n's 400+ connectors. |
| **Run automatically** | Replace the Form trigger with a Cron trigger for daily/weekly competitor monitoring. |
| **Monitor multiple competitors** | Use a Google Sheets trigger with a list of companies and loop through them. |
| **Add AI analysis** | Pipe the extracted data through OpenAI/Claude to generate insights and summaries. |
| **Create alerts** | Set up conditional logic to only notify when competitors announce new features or price changes. |
| **Build a dashboard** | Send data to Airtable, Notion, or Google Sheets to build a real-time competitor intelligence dashboard. |
The beauty of Browser Use is that it handles the complex web browsing while you focus on building the perfect workflow for your needs.
***
## Workflow JSON
```json id="workflow-json"
{
"name": "Competitor Intelligence Workflow with webhooks",
"nodes": [
{
"parameters": {
"httpMethod": "POST",
"path": "get-research-data",
"options": {}
},
"type": "n8n-nodes-base.webhook",
"typeVersion": 2,
"position": [
-480,
176
],
"id": "81166dab-eb91-4627-b773-1aa7f7bd86ee",
"name": "Webhook",
"webhookId": "025bc4bf-00c0-47d4-bd5f-79046674d017"
},
{
"parameters": {
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"id": "8d9701b6-1dc2-4e55-9fe4-ef1735ff1ebc",
"leftValue": "={{ $json.body.payload.status }}",
"rightValue": "finished",
"operator": {
"type": "string",
"operation": "equals",
"name": "filter.operator.equals"
}
},
{
"id": "7cf18a23-f3d8-4a70-a77c-c286a231fc7f",
"leftValue": "={{ $json.body.payload.metadata.source }}",
"rightValue": "n8n-competitor-demo",
"operator": {
"type": "string",
"operation": "equals",
"name": "filter.operator.equals"
}
}
],
"combinator": "and"
},
"options": {}
},
"type": "n8n-nodes-base.if",
"typeVersion": 2.2,
"position": [
-256,
176
],
"id": "b38737cc-0b8a-4a76-930f-362eb5de9ef9",
"name": "If"
},
{
"parameters": {
"formTitle": "Run Competitor Analysis",
"formFields": {
"values": [
{
"fieldLabel": "Competitor Name",
"placeholder": "(e.g. OpenAI)",
"requiredField": true
}
]
},
"options": {}
},
"type": "n8n-nodes-base.formTrigger",
"typeVersion": 2.2,
"position": [
-336,
-64
],
"id": "fcfc33dd-7d8a-460b-838d-955c65416aea",
"name": "On form submission",
"webhookId": "b2712d5b-14ae-424b-8733-fe6e77cebd43"
},
{
"parameters": {
"method": "POST",
"url": "https://api.browser-use.com/api/v1/run-task",
"authentication": "genericCredentialType",
"genericAuthType": "httpBearerAuth",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{}
]
},
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={\n \"task\": \"Do exhaustive research on {{ $json['Competitor Name'] }} and extract all pricing information, job postings, new features and announcements\",\n \"save_browser_data\": true,\n \"structured_output_json\": \"{\\n \\\"pricing\\\": {\\n \\\"plans\\\": [\\\"string\\\"],\\n \\\"prices\\\": [\\\"string\\\"],\\n \\\"features\\\": [\\\"string\\\"]\\n },\\n \\\"jobs\\\": {\\n \\\"titles\\\": [\\\"string\\\"],\\n \\\"departments\\\": [\\\"string\\\"],\\n \\\"locations\\\": [\\\"string\\\"]\\n },\\n \\\"new_features\\\": {\\n \\\"titles\\\": [\\\"string\\\"],\\n \\\"description\\\": [\\\"string\\\"]\\n },\\n \\\"announcements\\\": {\\n \\\"titles\\\": [\\\"string\\\"],\\n \\\"description\\\": [\\\"string\\\"]\\n }\\n}\",\n\"metadata\": {\"source\": \"n8n-competitor-demo\"}\n} ",
"options": {}
},
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [
-112,
-64
],
"id": "d10bef40-e2a3-41ff-a507-4f365c13dc52",
"name": "BrowserUse Run Task",
"credentials": {
"httpBearerAuth": {
"id": "peg6MzgmJNRMCMnT",
"name": "Browser-Use API Key"
}
}
},
{
"parameters": {
"url": "=https://api.browser-use.com/api/v1/task/{{ $('Webhook').item.json.body.payload.session_id }}",
"authentication": "genericCredentialType",
"genericAuthType": "httpBearerAuth",
"options": {}
},
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [
0,
144
],
"id": "e49c28ff-11a2-4195-94ab-ca5796572c34",
"name": "Get Task details",
"credentials": {
"httpBearerAuth": {
"id": "peg6MzgmJNRMCMnT",
"name": "Browser-Use API Key"
}
}
},
{
"parameters": {
"jsCode": "const output_data = $input.first().json.output;\nconst data = JSON.parse(output_data);\n\nconst pricing = data?.pricing;\nconst jobs = data?.jobs;\nconst newFeatures = data?.new_features;\nconst announcements = data?.announcements;\n\n// Helper function to format arrays as bullet points\nconst formatAsBullets = (arr, prefix = \"• \") => {\n if (!arr || arr.length === 0) return \"• N/A\";\n return arr.map(item => `${prefix}${item}`).join(\"\\n\");\n};\n\nreturn {\n text: `🏷️ *Pricing*\\nPlans:\\n${formatAsBullets(pricing?.plans)}\\n\\nPrices:\\n${formatAsBullets(pricing?.prices)}\\n\\nFeatures:\\n${formatAsBullets(pricing?.features)}\\n\\n💼 *Jobs*\\nTitles:\\n${formatAsBullets(jobs?.titles)}\\n\\nDepartments:\\n${formatAsBullets(jobs?.departments)}\\n\\nLocations:\\n${formatAsBullets(jobs?.locations)}\\n\\n✨ *New Features*\\nTitles:\\n${formatAsBullets(newFeatures?.titles)}\\n\\nDescription:\\n${formatAsBullets(newFeatures?.description)}\\n\\n📢 *Announcements*\\n${formatAsBullets(announcements?.description)}`\n};"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
208,
144
],
"id": "54bc087d-237d-438a-b688-bcbec25d9c45",
"name": "Generate Slack message"
},
{
"parameters": {
"method": "POST",
"url": "",
"sendBody": true,
"bodyParameters": {
"parameters": [
{
"name": "text",
"value": "={{ $json.text }}"
}
]
},
"options": {}
},
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [
432,
144
],
"id": "969a16f0-677b-4e46-a8bb-57a80b5daf07",
"name": "Send to Slack"
}
],
"pinData": {},
"connections": {
"Webhook": {
"main": [
[
{
"node": "If",
"type": "main",
"index": 0
}
]
]
},
"If": {
"main": [
[
{
"node": "Get Task details",
"type": "main",
"index": 0
}
]
]
},
"On form submission": {
"main": [
[
{
"node": "BrowserUse Run Task",
"type": "main",
"index": 0
}
]
]
},
"Get Task details": {
"main": [
[
{
"node": "Generate Slack message",
"type": "main",
"index": 0
}
]
]
},
"Generate Slack message": {
"main": [
[
{
"node": "Send to Slack",
"type": "main",
"index": 0
}
]
]
}
},
"active": true,
"settings": {
"executionOrder": "v1"
},
"versionId": "f3b38678-4821-41ad-952c-df9bbba40fc8",
"meta": {
"templateCredsSetupCompleted": true,
"instanceId": "7a1d1fd830bae2a00010153cf810fd67e0c87b8ae64ceb62273c87183efda365"
},
"id": "qmhqkZH8DhISWMmc",
"tags": []
}
```
Copy the JSON above, import it into n8n, and you're good to go.
Having trouble? Ping us in the #integrations channel on
[Discord](https://link.browser-use.com/discord) – we’re happy to help.
# Pricing
Source: https://docs.browser-use.com/cloud/v1/pricing
Browser Use Cloud API pricing structure and cost breakdown
The Browser Use Cloud API pricing consists of two components:
1. **Task Initialization Cost**: \$0.01 per started task
2. **Task Step Cost**: Additional cost based on the specific model used for each step
## LLM Model Step Pricing
The following table shows the total cost per step for each available LLM model:
| Model | Cost per Step |
| -------------------------------- | ------------- |
| GPT-4o | \$0.03 |
| GPT-4o mini | \$0.01 |
| GPT-4.1 | \$0.03 |
| GPT-4.1 mini | \$0.01 |
| O4 mini | \$0.02 |
| O3 | \$0.03 |
| Gemini 2.0 Flash | \$0.01 |
| Gemini 2.0 Flash Lite | \$0.01 |
| Gemini 2.5 Flash Preview (04/17) | \$0.01 |
| Gemini 2.5 Flash | \$0.01 |
| Gemini 2.5 Pro | \$0.03 |
| Claude 3.7 Sonnet (2025-02-19) | \$0.03 |
| Claude Sonnet 4 (2025-05-14) | \$0.03 |
| Llama 4 Maverick 17B Instruct | \$0.01 |
## Example Cost Calculation
For example, using GPT-4.1 for a 10 step task:
* Task initialization: \$0.01
* 10 steps x \$0.03 per step = \$0.30
* **Total cost: \$0.31**
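The same arithmetic as a small helper:

```python
def task_cost(steps: int, step_price: float, init_price: float = 0.01) -> float:
    """Total cost = task initialization + steps x per-step model price."""
    return round(init_price + steps * step_price, 2)

# GPT-4.1 for a 10-step task: $0.01 + 10 x $0.03 = $0.31
print(task_cost(10, 0.03))
```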
# Quickstart
Source: https://docs.browser-use.com/cloud/v1/quickstart
Learn how to get started with the Browser Use Cloud API
You need an active subscription and an API key from
[cloud.browser-use.com/billing](https://cloud.browser-use.com/billing). For
detailed pricing information, see our [pricing page](/cloud/v1/pricing).
## Creating Your First Agent
To understand how the API works visit the [Run Task](/api-reference/api-v1/run-task?playground=open) page.
```bash
curl -X POST https://api.browser-use.com/api/v1/run-task \
-H "Authorization: Bearer your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"task": "Go to google.com and search for Browser Use"
}'
```
The `run-task` endpoint returns a task ID, which you can use to query the task status, live preview URL, and result output.
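For example, a minimal Python sketch that takes the returned ID and polls the lightweight status endpoint:

```python
import time

import requests

BASE_URL = "https://api.browser-use.com/api/v1"

def status_url(task_id: str) -> str:
    """Build the URL of the lightweight status endpoint."""
    return f"{BASE_URL}/task/{task_id}/status"

def poll_until_done(api_key: str, task_id: str, interval: float = 2.0) -> str:
    """Poll the task status until it reaches a terminal state."""
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        response = requests.get(status_url(task_id), headers=headers)
        response.raise_for_status()
        status = response.json()
        if status in ("finished", "stopped", "failed"):
            return status
        time.sleep(interval)
```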
To play around with the API, you can use the [Browser Use Cloud
Playground](https://cloud.browser-use.com/playground).
For the full implementation guide see the [Implementation](/cloud/v1/implementation) page.
# Search API
Source: https://docs.browser-use.com/cloud/v1/search
Get started with Browser Use's search endpoints to extract content from websites
**🧪 BETA - This API is in beta - it may change and might not be available at
all times.**
## Why Browser Use Over Traditional Search?
**Browser Use actually browses websites like a human** while other tools return cached data from landing pages. Browser Use navigates deep into sites in real-time:
* 🔍 **Deep navigation**: Clicks through menus, forms, and multiple pages to find buried content
* 🚀 **Always current**: Live prices, breaking news, real-time analytics - not cached results
* 🎯 **No stale data**: See exactly what's on the page right now
* 🌐 **Dynamic content**: Handles JavaScript, forms, and interactive elements
* 🏠 **No surface limitations**: Gets data from pages that require navigation or interaction
**Other tools see yesterday's front door. Browser Use explores today's whole house.**
## Quick Start
The Search API allows you to quickly extract relevant content from websites using AI. There are two main endpoints:
💡 **Complete working examples** are available in the [examples/search](https://github.com/browser-use/browser-use/tree/main/examples/search) folder.
### Simple Search
Search Google and extract content from multiple top results:
```python
import asyncio

import aiohttp

async def simple_search():
    payload = {
        "query": "latest AI news",
        "max_websites": 5,
        "depth": 2
    }
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://api.browser-use.com/api/v1/simple-search",
            json=payload,
            headers=headers
        ) as response:
            result = await response.json()
            return result

asyncio.run(simple_search())
```
### Search URL
Extract content from a specific URL:
```python
import aiohttp
import asyncio

async def search_url():
    payload = {
        "url": "https://browser-use.com/#pricing",
        "query": "Find pricing information for Browser Use",
        "depth": 2
    }
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://api.browser-use.com/api/v1/search-url",
            json=payload,
            headers=headers
        ) as response:
            result = await response.json()
            return result

asyncio.run(search_url())
```
## Parameters
* **query**: Search query or content to extract
* **depth**: How deep to navigate within each website (2-5, default: 2)
* `depth=2`: Checks main page + 1 click deeper
* `depth=3`: Checks main page + 2 clicks deeper
* `depth=5`: Thoroughly explores multiple navigation levels
* **max\_websites**: Number of websites to process (simple-search only, default: 5)
* **url**: Target URL to extract from (search-url only)
## Pricing
### Simple Search
**Cost per request**: `1 cent × depth × max_websites`
Example: depth=2, max\_websites=3 = 6 cents per request
### Search URL
**Cost per request**: `1 cent × depth`
Example: depth=2 = 2 cents per request
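The two formulas above can be captured in a small helper. The endpoint names and the 1-cent base rate come from this page; the function itself is only an illustrative sketch:

```python
def search_cost_cents(depth: int, max_websites: int = 1) -> int:
    """Estimate Search API cost in cents: 1 cent x depth x max_websites.

    For the search-url endpoint, max_websites is effectively 1.
    """
    if not 2 <= depth <= 5:
        raise ValueError("depth must be between 2 and 5")
    return 1 * depth * max_websites

# Examples from the pricing section
print(search_cost_cents(depth=2, max_websites=3))  # simple-search: 6 cents
print(search_cost_cents(depth=2))                  # search-url: 2 cents
```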
# Webhooks
Source: https://docs.browser-use.com/cloud/v1/webhooks
Learn how to integrate webhooks with Browser Use Cloud API
Webhooks allow you to receive real-time notifications about events in your Browser Use tasks. This guide will show you how to set up and verify webhook endpoints.
## Prerequisites
You need an active subscription to create webhooks. See your billing page at
[cloud.browser-use.com/billing](https://cloud.browser-use.com/billing).
## Setting Up Webhooks
To receive webhook notifications, you need to:
1. Create an endpoint that can receive HTTPS POST requests
2. Configure your webhook URL in the Browser Use dashboard
3. Implement signature verification to ensure webhook authenticity
When adding a webhook URL in the dashboard, it must be a valid HTTPS URL that can receive POST requests.
On creation, we will send a test payload `{"type": "test", "timestamp": "2024-03-21T12:00:00Z", "payload": {"test": "ok"}}` to verify the endpoint is working correctly before creating the actual webhook!
## Webhook Events
Browser Use sends various types of events. Each event has a specific type and payload structure.
### Event Types
Currently supported events:
| Event Type | Description |
| -------------------------- | -------------------------------- |
| `agent.task.status_update` | Status updates for running tasks |
### Task Status Updates
The `agent.task.status_update` event includes the following statuses:
| Status | Description |
| -------------- | -------------------------------------- |
| `initializing` | A task is initializing |
| `started`      | A task has started (browser available) |
| `paused`       | A task has been paused mid-execution   |
| `stopped`      | A task has been stopped mid-execution  |
| `finished` | A task has finished |
## Webhook Payload Structure
Each webhook call includes:
* A JSON payload with event details
* `X-Browser-Use-Timestamp` header with the current timestamp
* `X-Browser-Use-Signature` header for verification
The payload follows this structure:
```json
{
"type": "agent.task.status_update",
"timestamp": "2025-05-25T09:22:22.269116+00:00",
"payload": {
"session_id": "cd9cc7bf-e3af-4181-80a2-73f083bc94b4",
"task_id": "5b73fb3f-a3cb-4912-be40-17ce9e9e1a45",
"status": "finished",
"metadata": {
"campaign": "q4-automation",
"team": "marketing"
}
}
}
```
The webhook payload now includes a `metadata` field containing any custom key-value pairs that were provided when the task was created. This allows you to correlate webhook events with your internal tracking systems.
## Implementing Webhook Verification
To ensure webhook authenticity, you must verify the signature. Here's an example implementation in Python using FastAPI:
```python
import hmac
import hashlib
import json
import os

import uvicorn
from fastapi import FastAPI, Request, HTTPException

app = FastAPI()

SECRET_KEY = os.environ['SECRET_KEY']

def verify_signature(payload: dict, timestamp: str, received_signature: str) -> bool:
    message = f'{timestamp}.{json.dumps(payload, separators=(",", ":"), sort_keys=True)}'
    expected_signature = hmac.new(SECRET_KEY.encode(), message.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected_signature, received_signature)

@app.post('/webhook')
async def webhook(request: Request):
    body = await request.json()
    timestamp = request.headers.get('X-Browser-Use-Timestamp')
    signature = request.headers.get('X-Browser-Use-Signature')
    if not timestamp or not signature:
        raise HTTPException(status_code=400, detail='Missing timestamp or signature')
    if not verify_signature(body, timestamp, signature):
        raise HTTPException(status_code=403, detail='Invalid signature')

    # Handle different event types
    event_type = body.get('type')
    if event_type == 'agent.task.status_update':
        # Handle task status update
        print('Task status update received:', body['payload'])
    elif event_type == 'test':
        # Handle test webhook
        print('Test webhook received:', body['payload'])
    else:
        print('Unknown event type:', event_type)

    return {'status': 'success', 'message': 'Webhook received'}

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8080)
```
## Best Practices
1. **Always verify signatures**: Never process webhook payloads without verifying the signature
2. **Handle retries**: Browser Use will retry failed webhook deliveries up to 5 times
3. **Respond quickly**: Return a 200 response as soon as you've verified the signature
4. **Process asynchronously**: Handle the webhook payload processing in a background task
5. **Monitor failures**: Set up monitoring for webhook delivery failures
6. **Handle unknown events**: Implement graceful handling of new event types that may be added in the future
Need help? Contact our support team at [support@browser-use.com](mailto:support@browser-use.com) or join our
[Discord community](https://link.browser-use.com/discord)
# All Parameters
Source: https://docs.browser-use.com/customize/agent/all-parameters
Complete reference for all agent configuration options
## Available Parameters
### Core Settings
* `tools`: Registry of [our tools](https://github.com/browser-use/browser-use/blob/main/browser_use/tools/service.py) the agent can call. [Example for custom tools](https://github.com/browser-use/browser-use/tree/main/examples/custom-functions)
* `browser`: Browser object where you can specify the browser settings.
* `output_model_schema`: Pydantic model class for structured output validation. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/custom_output.py)
### Vision & Processing
* `use_vision` (default: `True`): Enable/disable vision capabilities for processing screenshots
* `vision_detail_level` (default: `'auto'`): Screenshot detail level - `'low'`, `'high'`, or `'auto'`
* `page_extraction_llm`: Separate LLM model for page content extraction. You can choose a small & fast model because it only needs to extract text from the page (default: same as `llm`)
### Actions & Behavior
* `initial_actions`: List of actions to run before the main task without LLM. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/initial_actions.py)
* `max_actions_per_step` (default: `10`): Maximum actions per step, e.g. for form filling the agent can output 10 fields at once. We execute the actions until the page changes.
* `max_failures` (default: `3`): Maximum retries for steps with errors
* `use_thinking` (default: `True`): Controls whether the agent uses its internal "thinking" field for explicit reasoning steps.
* `flash_mode` (default: `False`): Fast mode that skips evaluation, next goal and thinking and only uses memory. If `flash_mode` is enabled, it overrides `use_thinking` and disables the thinking process entirely. [Example](https://github.com/browser-use/browser-use/blob/main/examples/getting_started/05_fast_agent.py)
### System Messages
* `override_system_message`: Completely replace the default system prompt.
* `extend_system_message`: Add additional instructions to the default system prompt. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/custom_system_prompt.py)
### File & Data Management
* `save_conversation_path`: Path to save complete conversation history
* `save_conversation_path_encoding` (default: `'utf-8'`): Encoding for saved conversations
* `available_file_paths`: List of file paths the agent can access
* `sensitive_data`: Dictionary of sensitive data to handle carefully. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/sensitive_data.py)
### Visual Output
* `generate_gif` (default: `False`): Generate GIF of agent actions. Set to `True` or string path
* `include_attributes`: List of HTML attributes to include in page analysis
### Performance & Limits
* `max_history_items`: Maximum number of last steps to keep in the LLM memory. If `None`, we keep all steps.
* `llm_timeout` (default: `90`): Timeout in seconds for LLM calls
* `step_timeout` (default: `120`): Timeout in seconds for each step
* `directly_open_url` (default: `True`): If we detect a url in the task, we directly open it.
### Advanced Options
* `calculate_cost` (default: `False`): Calculate and track API costs
* `display_files_in_done_text` (default: `True`): Show file information in completion messages
### Backwards Compatibility
* `controller`: Alias for `tools` for backwards compatibility.
* `browser_session`: Alias for `browser` for backwards compatibility.
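Taken together, a configuration touching several of the groups above might look like the sketch below. The task, model choice, and system-message text are placeholders; only the parameter names come from this reference:

```python
from browser_use import Agent, ChatOpenAI

agent = Agent(
    task='Compare the pricing pages of two vendors',  # placeholder task
    llm=ChatOpenAI(model='gpt-4.1-mini'),
    # Vision & Processing
    use_vision=True,
    vision_detail_level='auto',
    # Actions & Behavior
    max_actions_per_step=10,
    max_failures=3,
    flash_mode=False,
    # System Messages
    extend_system_message='Prefer official vendor pages over third-party reviews.',
    # Performance & Limits
    llm_timeout=90,
    step_timeout=120,
)
```

This is a configuration fragment: running it still requires an `OPENAI_API_KEY` and a call to `agent.run()`.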
# Basics
Source: https://docs.browser-use.com/customize/agent/basics
```python
import asyncio

from browser_use import Agent, ChatOpenAI

agent = Agent(
    task="Search for latest news about AI",
    llm=ChatOpenAI(model="gpt-4.1-mini"),
)

async def main():
    history = await agent.run(max_steps=100)

asyncio.run(main())
```
* `task`: The task you want to automate.
* `llm`: Your favorite LLM. See Supported Models.
The agent is executed using the async `run()` method:
* `max_steps` (default: `100`): Maximum number of steps the agent can take
# Output Format
Source: https://docs.browser-use.com/customize/agent/output-format
## Agent History
The `run()` method returns an `AgentHistoryList` object with the complete execution history:
```python
history = await agent.run()
# Access useful information
history.urls() # List of visited URLs
history.screenshot_paths() # List of screenshot paths
history.screenshots() # List of screenshots as base64 strings
history.action_names() # Names of executed actions
history.extracted_content() # List of extracted content from all actions
history.errors() # List of errors (with None for steps without errors)
history.model_actions() # All actions with their parameters
history.model_outputs() # All model outputs from history
history.last_action() # Last action in history
# Analysis methods
history.final_result() # Get the final extracted content (last step)
history.is_done() # Check if agent completed successfully
history.is_successful() # Check if agent completed successfully (returns None if not done)
history.has_errors() # Check if any errors occurred
history.model_thoughts() # Get the agent's reasoning process (AgentBrain objects)
history.action_results() # Get all ActionResult objects from history
history.action_history() # Get truncated action history with essential fields
history.number_of_steps() # Get the number of steps in the history
history.total_duration_seconds() # Get total duration of all steps in seconds
# Structured output (when using output_model_schema)
history.structured_output # Property that returns parsed structured output
```
See all helper methods in the [AgentHistoryList source code](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/views.py#L301).
## Structured Output
For structured output, use the `output_model_schema` parameter with a Pydantic model. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/custom_output.py).
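A minimal sketch of the Pydantic side, with hypothetical model names (`Post`/`Posts`): the agent is given the schema via `output_model_schema`, and `history.structured_output` returns a validated instance. Only the parsing part runs without an LLM key:

```python
from pydantic import BaseModel

class Post(BaseModel):
    title: str
    url: str

class Posts(BaseModel):
    posts: list[Post]

# Hypothetical wiring (requires browser_use and an LLM key):
# agent = Agent(task='List top posts', llm=llm, output_model_schema=Posts)
# history = await agent.run()
# result = history.structured_output  # -> Posts instance

# What validation of the agent's final JSON looks like:
raw = '{"posts": [{"title": "Hello", "url": "https://example.com"}]}'
parsed = Posts.model_validate_json(raw)
print(parsed.posts[0].title)  # Hello
```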
# Supported Models
Source: https://docs.browser-use.com/customize/agent/supported-models
Choose your favorite LLM
### Recommendations
* Best accuracy: `o3`
* Fastest: `llama4` on Groq
* Balanced (fast, cheap, and clever): `gemini-2.5-flash` or `gpt-4.1-mini`
### OpenAI [example](https://github.com/browser-use/browser-use/blob/main/examples/models/gpt-4.1.py)
The `o3` model is recommended for best performance.
```python
from browser_use import Agent, ChatOpenAI

# Initialize the model
llm = ChatOpenAI(
    model="o3",
)

# Create agent with the model
agent = Agent(
    task="...",  # Your task here
    llm=llm,
)
```
Required environment variables:
```bash .env
OPENAI_API_KEY=
```
You can use any OpenAI-compatible model by passing the model name to the
`ChatOpenAI` class together with a custom URL (or any other parameter that would
go into the normal OpenAI API call).
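For instance, a DeepSeek-style provider could be wired up as below. The endpoint URL, model name, and environment variable are placeholders for your provider's values, and it is assumed that `ChatOpenAI` accepts `base_url`/`api_key` the same way the OpenAI SDK does:

```python
import os

from browser_use import Agent, ChatOpenAI

# Any OpenAI-compatible endpoint; the values below are placeholders
llm = ChatOpenAI(
    model='deepseek-chat',
    base_url='https://api.deepseek.com/v1',
    api_key=os.environ['DEEPSEEK_API_KEY'],
)

agent = Agent(
    task="Your task here",
    llm=llm,
)
```

This is a configuration fragment; nothing is sent to the provider until `agent.run()` is called.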
### Anthropic [example](https://github.com/browser-use/browser-use/blob/main/examples/models/claude-4-sonnet.py)
```python
from browser_use import Agent, ChatAnthropic

# Initialize the model
llm = ChatAnthropic(
    model="claude-sonnet-4-0",
)

# Create agent with the model
agent = Agent(
    task="...",  # Your task here
    llm=llm,
)
```
And add the variable:
```bash .env
ANTHROPIC_API_KEY=
```
### Azure OpenAI [example](https://github.com/browser-use/browser-use/blob/main/examples/models/azure_openai.py)
```python
from browser_use import Agent, ChatAzureOpenAI

# Initialize the model
llm = ChatAzureOpenAI(
    model="o4-mini",
)

# Create agent with the model
agent = Agent(
    task="...",  # Your task here
    llm=llm,
)
```
Required environment variables:
```bash .env
AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
AZURE_OPENAI_API_KEY=
```
### Gemini [example](https://github.com/browser-use/browser-use/blob/main/examples/models/gemini.py)
> \[!IMPORTANT] `GEMINI_API_KEY` was the old environment variable name; as of 2025-05 it is `GOOGLE_API_KEY`.
```python
from browser_use import Agent, ChatGoogle
from dotenv import load_dotenv

# Read GOOGLE_API_KEY into env
load_dotenv()

# Initialize the model
llm = ChatGoogle(model='gemini-2.5-flash')

# Create agent with the model
agent = Agent(
    task="Your task here",
    llm=llm,
)
```
Required environment variables:
```bash .env
GOOGLE_API_KEY=
```
### AWS Bedrock [example](https://github.com/browser-use/browser-use/blob/main/examples/models/aws.py)
AWS Bedrock provides access to multiple model providers through a single API. We support both a general AWS Bedrock client and provider-specific convenience classes.
#### General AWS Bedrock (supports all providers)
```python
from browser_use import Agent, ChatAWSBedrock

# Works with any Bedrock model (Anthropic, Meta, AI21, etc.)
llm = ChatAWSBedrock(
    model="anthropic.claude-3-5-sonnet-20240620-v1:0",  # or any Bedrock model
    aws_region="us-east-1",
)

# Create agent with the model
agent = Agent(
    task="Your task here",
    llm=llm,
)
```
#### Anthropic Claude via AWS Bedrock (convenience class)
```python
from browser_use import Agent, ChatAnthropicBedrock

# Anthropic-specific class with Claude defaults
llm = ChatAnthropicBedrock(
    model="anthropic.claude-3-5-sonnet-20240620-v1:0",
    aws_region="us-east-1",
)

# Create agent with the model
agent = Agent(
    task="Your task here",
    llm=llm,
)
```
#### AWS Authentication
Required environment variables:
```bash .env
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DEFAULT_REGION=us-east-1
```
You can also use AWS profiles or IAM roles instead of environment variables. The implementation supports:
* Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`)
* AWS profiles and credential files
* IAM roles (when running on EC2)
* Session tokens for temporary credentials
* AWS SSO authentication (`aws_sso_auth=True`)
## Groq [example](https://github.com/browser-use/browser-use/blob/main/examples/models/llama4-groq.py)
```python
from browser_use import Agent, ChatGroq

llm = ChatGroq(model="meta-llama/llama-4-maverick-17b-128e-instruct")

agent = Agent(
    task="Your task here",
    llm=llm,
)
```
Required environment variables:
```bash .env
GROQ_API_KEY=
```
## Ollama
```python
from browser_use import Agent, ChatOllama
llm = ChatOllama(model="llama3.1:8b")
```
## Langchain
[Example](https://github.com/browser-use/browser-use/blob/main/examples/models/langchain) on how to use Langchain with Browser Use.
## Other models (DeepSeek, Novita, X, Qwen...)
We support all other models that can be called via an OpenAI-compatible API. We are open to PRs for more providers.
**Examples available:**
* [DeepSeek](https://github.com/browser-use/browser-use/blob/main/examples/models/deepseek-chat.py)
* [Novita](https://github.com/browser-use/browser-use/blob/main/examples/models/novita.py)
* [OpenRouter](https://github.com/browser-use/browser-use/blob/main/examples/models/openrouter.py)
# All Parameters
Source: https://docs.browser-use.com/customize/browser/all-parameters
Complete reference for all browser configuration options
## Core Settings
* `cdp_url`: CDP URL for connecting to existing browser instance (e.g., `"http://localhost:9222"`)
## Display & Appearance
* `headless` (default: `None`): Run browser without UI. Auto-detects based on display availability (`True`/`False`/`None`)
* `window_size`: Browser window size for headful mode. Use dict `{'width': 1920, 'height': 1080}` or `ViewportSize` object
* `window_position` (default: `{'width': 0, 'height': 0}`): Window position from top-left corner in pixels
* `viewport`: Content area size, same format as `window_size`. Use `{'width': 1280, 'height': 720}` or `ViewportSize` object
* `no_viewport` (default: `None`): Disable viewport emulation, content fits to window size
* `device_scale_factor`: Device scale factor (DPI). Set to `2.0` or `3.0` for high-resolution screenshots
## Browser Behavior
* `keep_alive` (default: `None`): Keep browser running after agent completes
* `allowed_domains`: Restrict navigation to specific domains. Domain pattern formats:
* `'example.com'` - Matches only `https://example.com/*`
* `'*.example.com'` - Matches `https://example.com/*` and any subdomain `https://*.example.com/*`
* `'http*://example.com'` - Matches both `http://` and `https://` protocols
* `'chrome-extension://*'` - Matches any Chrome extension URL
* **Security**: Wildcards in TLD (e.g., `example.*`) are **not allowed** for security
* Use list like `['*.google.com', 'https://example.com', 'chrome-extension://*']`
* `enable_default_extensions` (default: `True`): Load automation extensions (uBlock Origin, cookie handlers, ClearURLs)
* `cross_origin_iframes` (default: `False`): Enable cross-origin iframe support (may cause complexity)
* `is_local` (default: `True`): Whether this is a local browser instance. Set to `False` for remote browsers. If an `executable_path` is set, `is_local` is automatically set to `True`. This can affect your download behavior.
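The `allowed_domains` pattern rules above can be illustrated with a toy matcher. This is not Browser Use's actual implementation (that lives inside the library); the sketch below only mirrors the documented behaviors, and deliberately skips the `chrome-extension://*` case:

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def matches(pattern: str, url: str) -> bool:
    """Illustrative re-implementation of the documented pattern rules."""
    parsed = urlparse(url)
    if '://' in pattern:
        scheme_pat, host_pat = pattern.split('://', 1)
    else:
        # A bare domain like 'example.com' implies https only
        scheme_pat, host_pat = 'https', pattern
    if not fnmatch(parsed.scheme, scheme_pat):
        return False
    if host_pat.startswith('*.'):
        # '*.example.com' matches the base domain and any subdomain
        base = host_pat[2:]
        return parsed.hostname == base or parsed.hostname.endswith('.' + base)
    return parsed.hostname == host_pat

print(matches('example.com', 'https://example.com/page'))     # True
print(matches('example.com', 'http://example.com/'))          # False
print(matches('*.example.com', 'https://docs.example.com/'))  # True
print(matches('http*://example.com', 'http://example.com/'))  # True
```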
## User Data & Profiles
* `user_data_dir` (default: auto-generated temp): Directory for browser profile data. Use `None` for incognito mode
* `profile_directory` (default: `'Default'`): Chrome profile subdirectory name (`'Profile 1'`, `'Work Profile'`, etc.)
* `storage_state`: Browser storage state (cookies, localStorage). Can be file path string or dict object
## Network & Security
* `proxy`: Proxy configuration using `ProxySettings(server='http://host:8080', bypass='localhost,127.0.0.1', username='user', password='pass')`
* `permissions` (default: `['clipboardReadWrite', 'notifications']`): Browser permissions to grant. Use list like `['camera', 'microphone', 'geolocation']`
* `headers`: Additional HTTP headers for connect requests (remote browsers only)
## Browser Launch
* `executable_path`: Path to browser executable for custom installations. Platform examples:
* macOS: `'/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'`
* Windows: `'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'`
* Linux: `'/usr/bin/google-chrome'`
* `channel`: Browser channel (`'chromium'`, `'chrome'`, `'chrome-beta'`, `'msedge'`, etc.)
* `args`: Additional command-line arguments for the browser. Use list format: `['--disable-gpu', '--custom-flag=value', '--another-flag']`
* `env`: Environment variables for browser process. Use dict like `{'DISPLAY': ':0', 'LANG': 'en_US.UTF-8', 'CUSTOM_VAR': 'test'}`
* `chromium_sandbox` (default: `True` except in Docker): Enable Chromium sandboxing for security
* `devtools` (default: `False`): Open DevTools panel automatically (requires `headless=False`)
* `ignore_default_args`: List of default args to disable, or `True` to disable all. Use list like `['--enable-automation', '--disable-extensions']`
## Timing & Performance
* `minimum_wait_page_load_time` (default: `0.25`): Minimum time to wait before capturing page state in seconds
* `wait_for_network_idle_page_load_time` (default: `0.5`): Time to wait for network activity to cease in seconds
* `wait_between_actions` (default: `0.5`): Time to wait between agent actions in seconds
## AI Integration
* `highlight_elements` (default: `True`): Highlight interactive elements for AI vision
## Downloads & Files
* `accept_downloads` (default: `True`): Automatically accept all downloads
* `downloads_path`: Directory for downloaded files. Use string like `'./downloads'` or `Path` object
* `auto_download_pdfs` (default: `True`): Automatically download PDFs instead of viewing in browser
## Device Emulation
* `user_agent`: Custom user agent string. Example: `'Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X)'`
* `screen`: Screen size information, same format as `window_size`
## Recording & Debugging
* `record_video_dir`: Directory to save video recordings as `.webm` files
* `record_har_path`: Path to save network trace files as `.har` format
* `traces_dir`: Directory to save complete trace files for debugging
* `record_har_content` (default: `'embed'`): HAR content mode (`'omit'`, `'embed'`, `'attach'`)
* `record_har_mode` (default: `'full'`): HAR recording mode (`'full'`, `'minimal'`)
## Advanced Options
* `disable_security` (default: `False`): ⚠️ **NOT RECOMMENDED** - Disables all browser security features
* `deterministic_rendering` (default: `False`): ⚠️ **NOT RECOMMENDED** - Forces consistent rendering but reduces performance
***
## Outdated BrowserProfile
For backward compatibility, you can pass all the parameters from above to the `BrowserProfile` and then to the `Browser`.
```python
from browser_use import Browser, BrowserProfile

profile = BrowserProfile(headless=False)
browser = Browser(browser_profile=profile)
```
## Browser vs BrowserSession
`Browser` is an alias for `BrowserSession`; they are exactly the same class.
Use `Browser` for cleaner, more intuitive code.
# Basics
Source: https://docs.browser-use.com/customize/browser/basics
***
```python
import asyncio

from browser_use import Agent, Browser, ChatOpenAI

browser = Browser(
    headless=False,  # Show browser window
    window_size={'width': 1000, 'height': 700},  # Set window size
)

agent = Agent(
    task='Search for Browser Use',
    browser=browser,
    llm=ChatOpenAI(model='gpt-4.1-mini'),
)

async def main():
    await agent.run()

asyncio.run(main())
```
# Real Browser
Source: https://docs.browser-use.com/customize/browser/real-browser
Connect your existing Chrome browser to preserve authentication.
## Basic Example
```python
import asyncio

from browser_use import Agent, Browser, ChatOpenAI

# Connect to your existing Chrome browser
browser = Browser(
    executable_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
    user_data_dir='~/Library/Application Support/Google/Chrome',
    profile_directory='Default',
)

agent = Agent(
    task='Visit https://duckduckgo.com and search for "browser-use founders"',
    browser=browser,
    llm=ChatOpenAI(model='gpt-4.1-mini'),
)

async def main():
    await agent.run()

asyncio.run(main())
```
> **Note:** You need to fully close Chrome before running this example.
> **Note:** Google blocks this approach currently so we use DuckDuckGo instead.
## How it Works
1. **`executable_path`** - Path to your Chrome installation
2. **`user_data_dir`** - Your Chrome profile folder (keeps cookies, extensions, bookmarks)
3. **`profile_directory`** - Specific profile name (Default, Profile 1, etc.)
## Platform Paths
```python
# macOS
executable_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
user_data_dir='~/Library/Application Support/Google/Chrome'
# Windows
executable_path='C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'
user_data_dir='%LOCALAPPDATA%\\Google\\Chrome\\User Data'
# Linux
executable_path='/usr/bin/google-chrome'
user_data_dir='~/.config/google-chrome'
```
# Remote Browser
Source: https://docs.browser-use.com/customize/browser/remote
```python
from browser_use import Agent, Browser, ChatOpenAI

# Connect to remote browser
browser = Browser(
    cdp_url='http://remote-server:9222'
)

agent = Agent(
    task="Your task here",
    llm=ChatOpenAI(model='gpt-4.1-mini'),
    browser=browser,
)
```
## Get a CDP URL
### Cloud Browser
Get a CDP URL from your favorite browser provider, such as AnchorBrowser, HyperBrowser, BrowserBase, or Steel.dev.
### Proxy Connection
```python
from browser_use import Agent, Browser, ChatOpenAI
from browser_use.browser import ProxySettings

browser = Browser(
    headless=False,
    proxy=ProxySettings(
        server="http://proxy-server:8080",
        username="proxy-user",
        password="proxy-pass",
    ),
    cdp_url="http://remote-server:9222",
)

agent = Agent(
    task="Your task here",
    llm=ChatOpenAI(model='gpt-4.1-mini'),
    browser=browser,
)
```
# Chain Agents
Source: https://docs.browser-use.com/customize/examples/chain-agents
Chain multiple tasks together with the same agent and browser session.
## Chain Agent Tasks
Keep your browser session alive and chain multiple tasks together. Perfect for conversational workflows or multi-step processes.
```python
import asyncio

from dotenv import load_dotenv

load_dotenv()

from browser_use import Agent, BrowserProfile

profile = BrowserProfile(keep_alive=True)

async def main():
    agent = Agent(task="Go to reddit.com", browser_profile=profile)
    await agent.run(max_steps=1)

    while True:
        user_response = input('\n👤 New task or "q" to quit: ')
        if user_response.lower() == 'q':
            break
        agent.add_new_task(f'New task: {user_response}')
        await agent.run()

if __name__ == '__main__':
    asyncio.run(main())
```
## How It Works
1. **Persistent Browser**: `BrowserProfile(keep_alive=True)` prevents browser from closing between tasks
2. **Task Chaining**: Use `agent.add_new_task()` to add follow-up tasks
3. **Context Preservation**: Agent maintains memory and browser state across tasks
4. **Interactive Flow**: Perfect for conversational interfaces or complex workflows
The browser session remains active throughout the entire chain, preserving all cookies, local storage, and page state.
# Fast Agent
Source: https://docs.browser-use.com/customize/examples/fast-agent
Optimize agent performance for maximum speed and efficiency.
```python
import asyncio

from dotenv import load_dotenv

load_dotenv()

from browser_use import Agent, BrowserProfile

# Speed optimization instructions for the model
SPEED_OPTIMIZATION_PROMPT = """
Speed optimization instructions:
- Be extremely concise and direct in your responses
- Get to the goal as quickly as possible
- Use multi-action sequences whenever possible to reduce steps
"""

async def main():
    # 1. Use fast LLM - Llama 4 on Groq for ultra-fast inference
    from browser_use import ChatGroq

    llm = ChatGroq(
        model='meta-llama/llama-4-maverick-17b-128e-instruct',
        temperature=0.0,
    )

    # from browser_use import ChatGoogle
    # llm = ChatGoogle(model='gemini-2.5-flash')

    # 2. Create speed-optimized browser profile
    browser_profile = BrowserProfile(
        minimum_wait_page_load_time=0.1,
        wait_between_actions=0.1,
        headless=False,
    )

    # 3. Define a speed-focused task
    task = """
    1. Go to reddit https://www.reddit.com/search/?q=browser+agent&type=communities
    2. Click directly on the first 5 communities to open each in new tabs
    3. Find out what the latest post is about, and switch directly to the next tab
    4. Return the latest post summary for each page
    """

    # 4. Create agent with all speed optimizations
    agent = Agent(
        task=task,
        llm=llm,
        flash_mode=True,  # Disables thinking in the LLM output for maximum speed
        browser_profile=browser_profile,
        extend_system_message=SPEED_OPTIMIZATION_PROMPT,
    )

    await agent.run()

if __name__ == '__main__':
    asyncio.run(main())
```
## Speed Optimization Techniques
### 1. Fast LLM Models
```python
# Groq - Ultra-fast inference
from browser_use import ChatGroq
llm = ChatGroq(model='meta-llama/llama-4-maverick-17b-128e-instruct')
# Google Gemini Flash - Optimized for speed
from browser_use import ChatGoogle
llm = ChatGoogle(model='gemini-2.5-flash')
```
### 2. Browser Optimizations
```python
browser_profile = BrowserProfile(
    minimum_wait_page_load_time=0.1,  # Reduce wait time
    wait_between_actions=0.1,  # Faster action execution
    headless=True,  # No GUI overhead
)
```
### 3. Agent Optimizations
```python
agent = Agent(
    task=task,
    llm=llm,
    flash_mode=True,  # Skip LLM thinking process
    extend_system_message=SPEED_PROMPT,  # Optimize LLM behavior
)
```
# More Examples
Source: https://docs.browser-use.com/customize/examples/more-examples
Explore additional examples and use cases on GitHub.
### 🔗 Browse All Examples
**[View Complete Examples Directory →](https://github.com/browser-use/browser-use/tree/main/examples)**
### 🤝 Contributing Examples
Have a great use case? **[Submit a pull request](https://github.com/browser-use/browser-use/pulls)** with your example!
# Parallel Agents
Source: https://docs.browser-use.com/customize/examples/parallel-browser
Run multiple agents in parallel with separate browser instances
```python
import asyncio

from browser_use import Agent, Browser, ChatOpenAI

async def main():
    # Create 3 separate browser instances
    browsers = [
        Browser(
            user_data_dir=f'./temp-profile-{i}',
            headless=False,
        )
        for i in range(3)
    ]

    # Create 3 agents with different tasks
    agents = [
        Agent(
            task='Search for "browser automation" on Google',
            browser=browsers[0],
            llm=ChatOpenAI(model='gpt-4.1-mini'),
        ),
        Agent(
            task='Search for "AI agents" on DuckDuckGo',
            browser=browsers[1],
            llm=ChatOpenAI(model='gpt-4.1-mini'),
        ),
        Agent(
            task='Visit Wikipedia and search for "web scraping"',
            browser=browsers[2],
            llm=ChatOpenAI(model='gpt-4.1-mini'),
        ),
    ]

    # Run all agents in parallel
    tasks = [agent.run() for agent in agents]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    print('🎉 All agents completed!')

asyncio.run(main())
```
> **Note:** This is experimental, and agents might conflict with each other.
# Secure Setup
Source: https://docs.browser-use.com/customize/examples/secure
Azure OpenAI with data privacy and security configuration.
## Secure Setup with Azure OpenAI
Enterprise-grade security with Azure OpenAI, data privacy protection, and restricted browser access.
```python
import asyncio
import os

from dotenv import load_dotenv

load_dotenv()
os.environ['ANONYMIZED_TELEMETRY'] = 'false'

from browser_use import Agent, BrowserProfile, ChatAzureOpenAI

# Azure OpenAI configuration
api_key = os.getenv('AZURE_OPENAI_KEY')
azure_endpoint = os.getenv('AZURE_OPENAI_ENDPOINT')
llm = ChatAzureOpenAI(model='gpt-4.1-mini', api_key=api_key, azure_endpoint=azure_endpoint)

# Secure browser configuration
browser_profile = BrowserProfile(
    allowed_domains=['*.google.com', 'browser-use.com'],
    enable_default_extensions=False,
)

# Sensitive data filtering
sensitive_data = {'company_name': 'browser-use'}

# Create secure agent
agent = Agent(
    task='Find the founders of the sensitive company_name',
    llm=llm,
    browser_profile=browser_profile,
    sensitive_data=sensitive_data,
)

async def main():
    await agent.run(max_steps=10)

asyncio.run(main())
```
## Security Features
**Azure OpenAI:**
* NOT used to train OpenAI models
* NOT shared with other customers
* Hosted entirely within Azure
* 30-day retention (or zero with Limited Access Program)
**Browser Security:**
* `allowed_domains`: Restrict navigation to trusted sites
* `enable_default_extensions=False`: Disable potentially dangerous extensions
* `sensitive_data`: Filter sensitive information from LLM input
For enterprise deployments contact [support@browser-use.com](mailto:support@browser-use.com).
# Sensitive Data
Source: https://docs.browser-use.com/customize/examples/sensitive-data
Handle sensitive information securely and avoid sending PII & passwords to the LLM.
```python
import asyncio
import os

from browser_use import Agent, ChatOpenAI

os.environ['ANONYMIZED_TELEMETRY'] = 'false'

agent = Agent(
    task='Log into example.com with username x_user and password x_pass',
    sensitive_data={
        'https://example.com': {
            'x_user': 'your-real-username@email.com',
            'x_pass': 'your-real-password123',
        },
    },
    use_vision=False,  # Disable vision to prevent LLM seeing sensitive data in screenshots
    llm=ChatOpenAI(model='gpt-4.1-mini'),
)

async def main():
    await agent.run()

asyncio.run(main())
```
## How it Works
1. **Text Filtering**: The LLM only sees placeholders (`x_user`, `x_pass`), we filter your sensitive data from the input text.
2. **DOM Actions**: Real values are injected directly into form fields after the LLM call.
## Best Practices
* Use `Browser(allowed_domains=[...])` to restrict navigation
* Set `use_vision=False` to prevent screenshot leaks
* Use `storage_state='./auth.json'` for login cookies instead of passwords when possible
# Lifecycle Hooks
Source: https://docs.browser-use.com/customize/hooks
Customize agent behavior with lifecycle hooks
Browser-Use provides lifecycle hooks that allow you to execute custom code at specific points during the agent's execution.
Hook functions can be used to read and modify agent state while running, implement custom logic, change configuration, and integrate the Agent with external applications.
## Available Hooks
Currently, Browser-Use provides the following hooks:
| Hook | Description | When it's called |
| --------------- | -------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| `on_step_start` | Executed at the beginning of each agent step | Before the agent processes the current state and decides on the next action |
| `on_step_end` | Executed at the end of each agent step | After the agent has executed all the actions for the current step, before it starts the next step |
```python
await agent.run(on_step_start=..., on_step_end=...)
```
Each hook should be an `async` callable function that accepts the `agent` instance as its only parameter.
### Basic Example
```python
from pathlib import Path

from browser_use import Agent, ChatOpenAI

async def my_step_hook(agent: Agent):
    # inside a hook you can access all the state and methods under the Agent object:
    #   agent.settings, agent.state, agent.task
    #   agent.tools, agent.llm, agent.browser_session
    #   agent.pause(), agent.resume(), agent.add_new_task(...), etc.

    # You also have direct access to the browser state
    state = await agent.browser_session.get_browser_state_summary()
    current_url = state.url

    visit_log = agent.history.urls()
    previous_url = visit_log[-2] if len(visit_log) >= 2 else None
    print(f"Agent was last on URL: {previous_url} and is now on {current_url}")

    # Example: listen for events on the page, interact with the DOM, run JS directly, etc.
    # (assumes `page` is a Playwright-style page handle for the current tab)
    page.on('domcontentloaded', lambda: print('page navigated to a new url...'))
    await page.locator("css=form > input[type=submit]").click()
    await page.evaluate('() => alert(1)')
    await page.context.new_page()
    await page.context.add_init_script('/* some JS to run on every page */')

    # Example: monitor or intercept all network requests
    async def handle_request(route):
        # Print, modify, block, etc. do anything to the requests here
        # https://playwright.dev/python/docs/network#handle-requests
        print(route.request, route.request.headers)
        await route.continue_(headers=route.request.headers)
    await page.route("**/*", handle_request)

    # Example: pause agent execution and resume it based on some custom code
    if '/completed' in current_url:
        agent.pause()
        Path('result.txt').write_text(await page.content())
        input('Saved "completed" page content to result.txt, press [Enter] to resume...')
        agent.resume()

agent = Agent(
    task="Search for the latest news about AI",
    llm=ChatOpenAI(model="gpt-4.1-mini"),
)

await agent.run(
    on_step_start=my_step_hook,
    # on_step_end=...
    max_steps=10,
)
```
## Data Available in Hooks
When working with agent hooks, you have access to the entire `Agent` instance. Here are some useful data points you can access:
* `agent.task` lets you see what the main task is, `agent.add_new_task(...)` lets you queue up a new one
* `agent.tools` gives access to the `Tools()` object and `Registry()` containing the available actions
* `agent.tools.registry.execute_action('click_element_by_index', {'index': 123}, browser_session=agent.browser_session)`
* `agent.context` lets you access any user-provided context object passed in to `Agent(context=...)`
* `agent.sensitive_data` contains the sensitive data dict, which can be updated in-place to add/remove/modify items
* `agent.settings` contains all the configuration options passed to the `Agent(...)` at init time
* `agent.llm` gives direct access to the main LLM object (e.g. `ChatOpenAI`)
* `agent.state` gives access to lots of internal state, including agent thoughts, outputs, actions, etc.
* `agent.history` gives access to historical data from the agent's execution:
* `agent.history.model_thoughts()`: Reasoning from Browser Use's model.
* `agent.history.model_outputs()`: Raw outputs from Browser Use's model.
* `agent.history.model_actions()`: Actions taken by the agent
* `agent.history.extracted_content()`: Content extracted from web pages
* `agent.history.urls()`: URLs visited by the agent
* `agent.browser_session` gives direct access to the `Browser()` and CDP interface
* `agent.browser_session.agent_focus`: Get the current CDP session the agent is focused on
* `agent.browser_session.get_or_create_cdp_session()`: Get the current CDP session for browser interaction
* `agent.browser_session.get_tabs()`: Get all tabs currently open
* `agent.browser_session.get_page_html()`: Current page HTML
* `agent.browser_session.take_screenshot()`: Screenshot of the current page
## Tips for Using Hooks
* **Avoid blocking operations**: Since hooks run in the same execution thread as the agent, try to keep them efficient or use asynchronous patterns.
* **Handle exceptions**: Make sure your hook functions handle exceptions gracefully to prevent interrupting the agent's main flow.
* **Use custom actions instead**: hooks are fairly advanced, most things can be implemented with [custom action functions](/customize/custom-functions) instead
***
## Complex Example: Agent Activity Recording System
This comprehensive example demonstrates a complete implementation for recording and saving Browser-Use agent activity, consisting of both server and client components.
### Setup Instructions
To use this example, you'll need to:
1. Set up the required dependencies:
```bash
pip install fastapi uvicorn prettyprinter pyobjtojson python-dotenv browser-use
```
2. Create two separate Python files:
* `api.py` - The FastAPI server component
* `client.py` - The Browser-Use agent with recording hook
3. Run both components:
* Start the API server first: `python api.py`
* Then run the client: `python client.py`
### Server Component (api.py)
The server component handles receiving and storing the agent's activity data:
```python
#!/usr/bin/env python3
#
# FastAPI API to record and save Browser-Use activity data.
# Save this code to api.py and run with `python api.py`
#
import json
import base64
from pathlib import Path
from fastapi import FastAPI, Request
import prettyprinter
import uvicorn
prettyprinter.install_extras()
# Utility function to save screenshots
def b64_to_png(b64_string: str, output_file):
"""
Convert a Base64-encoded string to a PNG file.
:param b64_string: A string containing Base64-encoded data
:param output_file: The path to the output PNG file
"""
with open(output_file, "wb") as f:
f.write(base64.b64decode(b64_string))
# Initialize FastAPI app
app = FastAPI()
@app.post("/post_agent_history_step")
async def post_agent_history_step(request: Request):
data = await request.json()
prettyprinter.cpprint(data)
# Ensure the "recordings" folder exists using pathlib
recordings_folder = Path("recordings")
recordings_folder.mkdir(exist_ok=True)
# Determine the next file number by examining existing .json files
existing_numbers = []
for item in recordings_folder.iterdir():
if item.is_file() and item.suffix == ".json":
try:
file_num = int(item.stem)
existing_numbers.append(file_num)
except ValueError:
# In case the file name isn't just a number
pass
if existing_numbers:
next_number = max(existing_numbers) + 1
else:
next_number = 1
# Construct the file path
file_path = recordings_folder / f"{next_number}.json"
# Save the JSON data to the file
with file_path.open("w") as f:
json.dump(data, f, indent=2)
# Optionally save screenshot if needed
# if "website_screenshot" in data and data["website_screenshot"]:
# screenshot_folder = Path("screenshots")
# screenshot_folder.mkdir(exist_ok=True)
# b64_to_png(data["website_screenshot"], screenshot_folder / f"{next_number}.png")
return {"status": "ok", "message": f"Saved to {file_path}"}
if __name__ == "__main__":
print("Starting Browser-Use recording API on http://0.0.0.0:9000")
uvicorn.run(app, host="0.0.0.0", port=9000)
```
### Client Component (client.py)
The client component runs the Browser-Use agent with a recording hook:
```python
#!/usr/bin/env python3
#
# Client to record and save Browser-Use activity.
# Save this code to client.py and run with `python client.py`
#
import asyncio
import requests
from dotenv import load_dotenv
from pyobjtojson import obj_to_json
from browser_use.llm import ChatOpenAI
from browser_use import Agent
# Load environment variables (for API keys)
load_dotenv()
def send_agent_history_step(data):
"""Send the agent step data to the recording API"""
url = "http://127.0.0.1:9000/post_agent_history_step"
response = requests.post(url, json=data)
return response.json()
async def record_activity(agent_obj):
"""Hook function that captures and records agent activity at each step"""
website_html = None
website_screenshot = None
urls_json_last_elem = None
model_thoughts_last_elem = None
model_outputs_json_last_elem = None
model_actions_json_last_elem = None
extracted_content_json_last_elem = None
print('--- ON_STEP_START HOOK ---')
# Capture current page state
website_html = await agent_obj.browser_session.get_page_html()
website_screenshot = await agent_obj.browser_session.take_screenshot()
# Make sure we have state history
if hasattr(agent_obj, "state"):
history = agent_obj.state.history
else:
history = None
print("Warning: Agent has no state history")
return
# Process model thoughts
model_thoughts = obj_to_json(
obj=history.model_thoughts(),
check_circular=False
)
if len(model_thoughts) > 0:
model_thoughts_last_elem = model_thoughts[-1]
# Process model outputs
model_outputs = agent_obj.state.history.model_outputs()
model_outputs_json = obj_to_json(
obj=model_outputs,
check_circular=False
)
if len(model_outputs_json) > 0:
model_outputs_json_last_elem = model_outputs_json[-1]
# Process model actions
model_actions = agent_obj.state.history.model_actions()
model_actions_json = obj_to_json(
obj=model_actions,
check_circular=False
)
if len(model_actions_json) > 0:
model_actions_json_last_elem = model_actions_json[-1]
# Process extracted content
extracted_content = agent_obj.state.history.extracted_content()
extracted_content_json = obj_to_json(
obj=extracted_content,
check_circular=False
)
if len(extracted_content_json) > 0:
extracted_content_json_last_elem = extracted_content_json[-1]
# Process URLs
urls = agent_obj.state.history.urls()
urls_json = obj_to_json(
obj=urls,
check_circular=False
)
if len(urls_json) > 0:
urls_json_last_elem = urls_json[-1]
# Create a summary of all data for this step
model_step_summary = {
"website_html": website_html,
"website_screenshot": website_screenshot,
"url": urls_json_last_elem,
"model_thoughts": model_thoughts_last_elem,
"model_outputs": model_outputs_json_last_elem,
"model_actions": model_actions_json_last_elem,
"extracted_content": extracted_content_json_last_elem
}
print("--- MODEL STEP SUMMARY ---")
print(f"URL: {urls_json_last_elem}")
# Send data to the API
result = send_agent_history_step(data=model_step_summary)
print(f"Recording API response: {result}")
async def run_agent():
"""Run the Browser-Use agent with the recording hook"""
agent = Agent(
task="Compare the price of gpt-4o and DeepSeek-V3",
llm=ChatOpenAI(model="gpt-4.1-mini"),
)
try:
print("Starting Browser-Use agent with recording hook")
await agent.run(
on_step_start=record_activity,
max_steps=30
)
except Exception as e:
print(f"Error running agent: {e}")
if __name__ == "__main__":
# Check if API is running
try:
requests.get("http://127.0.0.1:9000")
print("Recording API is available")
except:
print("Warning: Recording API may not be running. Start api.py first.")
# Run the agent
asyncio.run(run_agent())
```
Contribution by Carlos A. Planchón.
### Working with the Recorded Data
After running the agent, you'll find the recorded data in the `recordings` directory. Here's how you can use this data:
1. **View recorded sessions**: Each JSON file contains a snapshot of agent activity for one step
2. **Extract screenshots**: You can modify the API to save screenshots separately
3. **Analyze agent behavior**: Use the recorded data to study how the agent navigates websites
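For instance, the numbered step files the API writes (`1.json`, `2.json`, ...) can be loaded back for analysis. This is a small sketch; `load_recordings` is a hypothetical helper, not part of the example above:

```python
import json
from pathlib import Path

# Sketch: load every recorded step file in numeric order and print the URL
# visited at each step. `load_recordings` is an illustrative helper name.
def load_recordings(folder: str = 'recordings') -> list[dict]:
    files = sorted(Path(folder).glob('*.json'), key=lambda p: int(p.stem))
    return [json.loads(p.read_text()) for p in files]

for i, step in enumerate(load_recordings(), start=1):
    print(f"step {i}: {step.get('url')}")
```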
### Extending the Example
You can extend this recording system in several ways:
1. **Save screenshots separately**: Uncomment the screenshot saving code in the API
2. **Add a web dashboard**: Create a simple web interface to view recorded sessions
3. **Add session IDs**: Modify the API to group steps by agent session
4. **Add filtering**: Implement filters to record only specific types of actions
# MCP Client
Source: https://docs.browser-use.com/customize/mcp-client
Connect external MCP servers to extend browser-use with additional tools and integrations
The MCP (Model Context Protocol) client allows browser-use agents to connect to external MCP servers, automatically exposing their tools as actions.
MCP is an open protocol for integrating LLMs with external data sources and tools. Learn more at [modelcontextprotocol.io](https://modelcontextprotocol.io).
Looking to expose browser-use as an MCP server instead? See [MCP Server](/customize/mcp-server).
## Installation
```bash
uv pip install "browser-use[cli]"
```
## Quick Start
```python
import os
from browser_use import Agent, Tools
from browser_use.mcp.client import MCPClient
# Create tools
tools = Tools()
# Connect to MCP server
mcp_client = MCPClient(
server_name="filesystem",
command="npx",
args=["@modelcontextprotocol/server-filesystem", "/path/to/files"]
)
# Connect and register
await mcp_client.connect()
await mcp_client.register_to_tools(tools)
# Agent can now use filesystem tools
agent = Agent(
task="Read the README.md file",
tools=tools
)
await agent.run()
# Clean up
await mcp_client.disconnect()
```
## API Reference
### MCPClient
```python
class MCPClient:
def __init__(
self,
server_name: str,
command: str,
args: list[str] | None = None,
env: dict[str, str] | None = None,
) -> None
```
**Parameters:**
* `server_name`: Name of the MCP server (for logging)
* `command`: Command to start the server (e.g., `"npx"`)
* `args`: Arguments for the command
* `env`: Environment variables for the server
**Key Methods:**
```python
# Connect to server
await mcp_client.connect()
# Register MCP tools into the Tools registry
await mcp_client.register_to_tools(
tools,
tool_filter=['read_file', 'write_file'], # Optional
prefix='fs_' # Optional prefix
)
# Disconnect
await mcp_client.disconnect()
```
### Context Manager Usage
```python
async with MCPClient(
server_name="github",
command="npx",
args=["@modelcontextprotocol/server-github"],
env={"GITHUB_TOKEN": os.getenv("GITHUB_TOKEN")}
) as client:
await client.register_to_tools(tools)
await agent.run()
# Automatically disconnected
```
## Common MCP Servers
### Filesystem
```python
MCPClient(
server_name="filesystem",
command="npx",
args=["@modelcontextprotocol/server-filesystem", "/path"]
)
```
### PostgreSQL
```python
MCPClient(
server_name="postgres",
command="npx",
args=["@modelcontextprotocol/server-postgres", "postgresql://localhost/db"]
)
```
### GitHub
```python
MCPClient(
server_name="github",
command="npx",
args=["@modelcontextprotocol/server-github"],
env={"GITHUB_TOKEN": os.getenv("GITHUB_TOKEN")}
)
```
## Multiple Servers
Connect multiple servers with prefixes to avoid conflicts:
```python
# Filesystem server
fs_client = MCPClient(
server_name="filesystem",
command="npx",
args=["@modelcontextprotocol/server-filesystem", "."]
)
await fs_client.connect()
await fs_client.register_to_tools(tools, prefix="fs_")
# GitHub server
gh_client = MCPClient(
server_name="github",
command="npx",
args=["@modelcontextprotocol/server-github"],
env={"GITHUB_TOKEN": os.getenv("GITHUB_TOKEN")}
)
await gh_client.connect()
await gh_client.register_to_tools(tools, prefix="gh_")
# Agent can use both
agent = Agent(
task="Read README.md and create a GitHub issue",
tools=tools
)
await agent.run()
# Clean up
await fs_client.disconnect()
await gh_client.disconnect()
```
## Tool Filtering
Register only specific tools:
```python
await mcp_client.register_to_tools(
tools,
tool_filter=['read_file', 'list_directory']
)
```
## Custom MCP Server
Create your own MCP server:
```python
# my_server.py
import mcp.server.stdio
import mcp.types as types
from mcp.server import Server
server = Server("custom-tools")
@server.list_tools()
async def handle_list_tools() -> list[types.Tool]:
return [
types.Tool(
name="calculate",
description="Perform calculation",
inputSchema={
"type": "object",
"properties": {
"expression": {"type": "string"}
},
"required": ["expression"]
}
)
]
@server.call_tool()
async def handle_call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name == "calculate":
        # WARNING: eval() on untrusted input is unsafe; use a proper expression
        # parser in real code
        result = eval(arguments["expression"])
        return [types.TextContent(type="text", text=str(result))]
    return []
# Run server
async def main():
async with mcp.server.stdio.stdio_server() as (read, write):
await server.run(read, write, ...)
if __name__ == "__main__":
import asyncio
asyncio.run(main())
```
Connect custom server:
```python
custom_client = MCPClient(
server_name="custom",
command="python",
args=["my_server.py"]
)
```
## Best Practices
1. **Always disconnect** when done
2. **Use prefixes** when connecting multiple servers
3. **Filter tools** to limit capabilities
4. **Use context managers** for automatic cleanup
## See Also
* [MCP Server](/customize/mcp-server) - Expose browser-use as an MCP server
* [Custom Functions](/customize/custom-functions) - Write custom actions directly
* [Model Context Protocol](https://modelcontextprotocol.io) - MCP specification
# MCP Server
Source: https://docs.browser-use.com/customize/mcp-server
Expose browser-use capabilities as an MCP server for AI assistants like Claude Desktop
The MCP server exposes browser-use's browser automation capabilities as tools that can be used by AI assistants like Claude Desktop. This allows external MCP clients to control browsers, navigate websites, extract content, and perform automated tasks.
This is the opposite of the [MCP Client](/customize/mcp-client). The MCP client lets browser-use connect to external MCP servers, while this MCP server lets external AI assistants connect to browser-use.
## Overview
The MCP server acts as a bridge between MCP-compatible AI assistants and browser-use:
```mermaid
graph LR
A[Claude Desktop] -->|MCP Protocol| B[Browser-use MCP Server]
B --> C[Browser]
B --> D[Tools]
B --> E[FileSystem]
C --> F[Playwright Browser]
style B fill:#f9f,stroke:#333,stroke-width:2px
```
## Installation
```bash
uv pip install "browser-use[cli]"
```
## Quick Start
### 1. Configure Claude Desktop
Add browser-use to your Claude Desktop configuration:
Edit `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
"mcpServers": {
"browser-use": {
"command": "uvx",
"args": ["browser-use[cli]", "--mcp"],
"env": {
"OPENAI_API_KEY": "sk-..." // Optional: for content extraction
}
}
}
}
```
Edit `%APPDATA%\Claude\claude_desktop_config.json`:
```json
{
"mcpServers": {
"browser-use": {
"command": "uvx",
"args": ["browser-use[cli]", "--mcp"],
"env": {
"OPENAI_API_KEY": "sk-..." // Optional: for content extraction
}
}
}
}
```
### 2. Restart Claude Desktop
The browser-use tools will appear in Claude's tools menu (🔌 icon).
### 3. Use Browser Automation
Ask Claude to perform browser tasks:
* "Navigate to example.com and describe what you see"
* "Search for 'browser automation' on Google"
* "Fill out the contact form on this website"
## API Reference
### Available Tools
The MCP server exposes the following tools to MCP clients:
#### Navigation Tools
##### `browser_navigate`
Navigate to a URL.
```typescript
browser_navigate(url: string, new_tab?: boolean): string
```
**Parameters:**
| Parameter | Type | Required | Description |
| --------- | --------- | -------- | -------------------------------- |
| `url` | `string` | Yes | URL to navigate to |
| `new_tab` | `boolean` | No | Open in new tab (default: false) |
**Returns:** Success message with URL
##### `browser_go_back`
Navigate back in browser history.
```typescript
browser_go_back(): string
```
**Returns:** "Navigated back"
#### Interaction Tools
##### `browser_click`
Click an element by index.
```typescript
browser_click(index: number, new_tab?: boolean): string
```
**Parameters:**
| Parameter | Type | Required | Description |
| --------- | --------- | -------- | ------------------------------------- |
| `index` | `number` | Yes | Element index from browser state |
| `new_tab` | `boolean` | No | Open link in new tab (default: false) |
**Returns:** Success message indicating click action
**Note:** When `new_tab` is true:
* For links: Extracts href and opens in new tab
* For other elements: Uses Cmd/Ctrl+Click
##### `browser_type`
Type text into an input field.
```typescript
browser_type(index: number, text: string): string
```
**Parameters:**
| Parameter | Type | Required | Description |
| --------- | -------- | -------- | -------------------------------- |
| `index` | `number` | Yes | Element index from browser state |
| `text` | `string` | Yes | Text to type |
**Returns:** Success message with typed text
##### `browser_scroll`
Scroll the page.
```typescript
browser_scroll(direction?: "up" | "down"): string
```
**Parameters:**
| Parameter | Type | Required | Description |
| ----------- | ---------------- | -------- | ---------------------------------- |
| `direction` | `"up" \| "down"` | No | Scroll direction (default: "down") |
**Returns:** "Scrolled {direction}"
#### State & Content Tools
##### `browser_get_state`
Get current browser state with all interactive elements.
```typescript
browser_get_state(include_screenshot?: boolean): string
```
**Parameters:**
| Parameter | Type | Required | Description |
| -------------------- | --------- | -------- | ------------------------------------------ |
| `include_screenshot` | `boolean` | No | Include base64 screenshot (default: false) |
**Returns:** JSON string containing:
```json
{
"url": "current page URL",
"title": "page title",
"tabs": [{"url": "...", "title": "..."}],
"interactive_elements": [
{
"index": 0,
"tag": "button",
"text": "element text (max 100 chars)",
"placeholder": "if present",
"href": "if link"
}
],
"screenshot": "base64 if requested"
}
```
The interactive elements include all clickable and interactive elements on the page, with their:
* `index`: Used to reference the element in other commands (click, type)
* `tag`: HTML tag name (button, input, a, etc.)
* `text`: Visible text content, truncated to 100 characters
* `placeholder`: For input fields (if present)
* `href`: For links (if present)
##### `browser_extract_content`
Extract structured content from the current page using AI.
```typescript
browser_extract_content(query: string, extract_links?: boolean): string
```
**Parameters:**
| Parameter | Type | Required | Description |
| --------------- | --------- | -------- | -------------------------------------------- |
| `query` | `string` | Yes | What to extract (e.g., "all product prices") |
| `extract_links` | `boolean` | No | Include links in extraction (default: false) |
**Returns:** Extracted content based on query
**Note:** Requires `OPENAI_API_KEY` environment variable for AI extraction.
#### Tab Management Tools
##### `browser_list_tabs`
List all open browser tabs.
```typescript
browser_list_tabs(): string
```
**Returns:** JSON array of tab information:
```json
[
{
"tab_id": 'AE21',
"url": "https://example.com",
"title": "Page Title"
}
]
```
##### `browser_switch_tab`
Switch to a specific tab.
```typescript
browser_switch_tab(tab_id: string): string
```
**Parameters:**
| Parameter | Type | Required | Description |
| --------- | -------- | -------- | ------------------------------------------------------ |
| `tab_id` | `string` | Yes | ID of tab to switch to (last 4 characters of TargetID) |
**Returns:** Success message with tab URL
##### `browser_close_tab`
Close a specific tab.
```typescript
browser_close_tab(tab_id: string): string
```
**Parameters:**
| Parameter | Type | Required | Description |
| --------- | -------- | -------- | ------------------------------------------------------ |
| `tab_id` | `string` | Yes | ID of the tab to close (last 4 characters of TargetID) |
**Returns:** Success message with closed tab URL
### Tool Response Format
All tools return text content. Errors are returned as strings starting with "Error:".
## Configuration
### Environment Variables
Configure the MCP server behavior through environment variables in Claude Desktop config:
```json
{
"mcpServers": {
"browser-use": {
"command": "python",
"args": ["-m", "browser_use.mcp.server"],
"env": {
"OPENAI_API_KEY": "sk-..." // For AI content extraction
}
}
}
}
```
### Browser Profile Settings
The MCP server creates a browser session with these default settings:
* **Downloads Path**: `~/Downloads/browser-use-mcp/`
* **Wait Between Actions**: 0.5 seconds
* **Keep Alive**: True (browser stays open between commands)
* **Allowed Domains**: None by default (all domains allowed)
## Advanced Usage
### Running Standalone
Test the MCP server without Claude Desktop:
```bash
# Run server (reads from stdin, writes to stdout)
uvx 'browser-use[cli]' --mcp
# The server communicates via JSON-RPC on stdio
```
### Security Considerations
The MCP server provides full browser control to connected AI assistants. Consider these security measures:
1. **Domain Restrictions**: Currently not configurable via environment variables, but the server creates sessions with no domain restrictions by default
2. **File System Access**: The server creates a FileSystem instance at `~/.browser-use-mcp` for extraction operations
3. **Downloads**: Files download to `~/Downloads/browser-use-mcp/`
## Implementation Details
### Browser Session Management
* **Lazy Initialization**: Browser session is created on first browser tool use
* **Persistent Session**: Session remains active across multiple tool calls
* **Single Session**: Currently maintains one browser session per server instance
### Tool Categories
1. **Direct Browser Control**: Tools starting with `browser_` that directly interact with the browser
2. **Agent Tasks**: Currently commented out in implementation (`browser_use_run_task`)
### Error Handling
* All exceptions are caught and returned as text: `"Error: {message}"`
* Browser session initialization errors are returned to the client
* Missing dependencies (e.g., OPENAI\_API\_KEY) return descriptive error messages
## Troubleshooting
### Server Not Appearing in Claude
1. **Check configuration path:**
* macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
* Windows: `%APPDATA%\Claude\claude_desktop_config.json`
2. **Verify Python installation:**
```bash
uvx 'browser-use[cli]' --version
uvx 'browser-use[cli]' --mcp --help
```
3. **Check Claude logs:**
* macOS: `~/Library/Logs/Claude/mcp.log`
* Windows: `%APPDATA%\Claude\logs\mcp.log`
### Browser Not Launching
```bash
# Install Playwright browsers
playwright install chromium
# Test browser launch
python -c "from browser_use import Browser; import asyncio; asyncio.run(Browser().start())"
```
### Connection Errors
If you see "MCP server connection failed":
1. Test the server directly:
```bash
uvx 'browser-use[cli]' --mcp
```
2. Check all dependencies:
```bash
uv pip install "browser-use[cli]"
```
### Content Extraction Not Working
If `browser_extract_content` returns errors:
1. Ensure `OPENAI_API_KEY` is set in the environment configuration
2. Verify the API key is valid
3. Check that you have credits/access to the OpenAI API
## Limitations
| Limitation | Description | Workaround |
| ----------------------------- | --------------------------------------------- | -------------------------------- |
| Single Browser Session | One browser instance per server | Restart server for new session |
| No Domain Restrictions Config | Cannot configure allowed domains via env vars | Modify server code if needed |
| No Agent Mode | `browser_use_run_task` is commented out | Use direct browser control tools |
| Text-Only Responses | All responses are text strings | Parse JSON responses client-side |
## Comparison with MCP Client
| Feature | MCP Server (this) | [MCP Client](/customize/mcp-client) |
| ----------------- | ---------------------- | ----------------------------------- |
| **Purpose** | Expose browser to AI | Connect agent to tools |
| **User** | Claude Desktop, etc. | Browser-use agents |
| **Direction** | External → Browser | Agent → External |
| **Configuration** | JSON config file | Python code |
| **Tools** | Fixed browser tools | Dynamic from server |
| **Use Case** | Interactive assistance | Automated workflows |
## Code Examples
* [Simple MCP client example](https://github.com/browser-use/browser-use/tree/main/examples/mcp/simple_server.py) - Basic MCP client connecting to browser-use server
* [Advanced MCP client example](https://github.com/browser-use/browser-use/tree/main/examples/mcp/advanced_server.py) - Multi-server orchestration and complex workflows
## See Also
* [MCP Client](/customize/mcp-client) - Connect browser-use to external MCP servers
* [Model Context Protocol](https://modelcontextprotocol.io) - MCP specification
* [Claude Desktop](https://claude.ai/download) - Primary MCP client
# Add Tools
Source: https://docs.browser-use.com/customize/tools/add
Examples:
* deterministic clicks
* file handling
* calling APIs
* human-in-the-loop
* browser interactions
* calling LLMs
* get 2FA codes
* send emails
* ...
Simply add `@tools.action(...)` to your function.
```python
from browser_use import Tools, Agent

tools = Tools()

@tools.action(description='Ask human for help with a question')
def ask_human(question: str) -> str:
    answer = input(f'{question} > ')
    return f'The human responded with: {answer}'
```
```python
agent = Agent(task='...', llm=llm, tools=tools)
```
* **`description`** *(required)* - What the tool does, the LLM uses this to decide when to call it.
* **`allowed_domains`** - List of domains where the tool can run (e.g. `['*.example.com']`); defaults to all domains
The Agent fills your function parameters based on their names, type hints, & defaults.
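What the agent works from can be sketched with standard-library introspection. This is illustrative only, not browser-use's internal mechanism, and `save_note` is a made-up tool:

```python
import inspect

# Illustrative: the parameter names, type hints, and defaults that describe a
# tool to the agent are all visible via standard introspection.
def save_note(text: str, filename: str = 'note.txt') -> str:
    return f'Saved {len(text)} chars to {filename}'

sig = inspect.signature(save_note)
for name, param in sig.parameters.items():
    print(name, param.annotation, param.default)
```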
## Available Objects
Your function has access to these objects:
* **`browser_session: BrowserSession`** - Current browser session for CDP access
* **`cdp_client`** - Direct Chrome DevTools Protocol client
* **`page_extraction_llm: BaseChatModel`** - The LLM you pass into agent. This can be used to do a custom llm call here.
* **`file_system: FileSystem`** - File system access
* **`available_file_paths: list[str]`** - Available files for upload/processing
* **`has_sensitive_data: bool`** - Whether action contains sensitive data
## Pydantic Input
You can use Pydantic for the tool parameters:
```python
from pydantic import BaseModel
class Cars(BaseModel):
name: str = Field(description='The name of the car, e.g. "Toyota Camry"')
price: int = Field(description='The price of the car as int in USD, e.g. 25000')
@tools.action(description='Save cars to file')
def save_cars(cars: list[Cars]) -> str:
with open('cars.json', 'w') as f:
json.dump(cars, f)
return f'Saved {len(cars)} cars to file'
task = "find cars and save them to file"
```
## Domain Restrictions
Limit tools to specific domains:
```python
@tools.action(
    description='Fill out banking forms',
    allowed_domains=['https://mybank.com']
)
def fill_bank_form(account_number: str) -> str:
    # Only works on mybank.com
    return f'Filled form for account {account_number}'
```
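Patterns like `*.example.com` match subdomains via wildcards. Conceptually, the check behaves like glob matching on the page's hostname; a simplified sketch of the idea (not the library's exact matching logic):

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def is_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Return True if the URL's hostname matches any allowed pattern."""
    host = urlparse(url).hostname or ''
    return any(
        fnmatch(host, pattern.removeprefix('https://').removeprefix('http://'))
        for pattern in allowed_domains
    )

print(is_allowed('https://mybank.com/login', ['https://mybank.com']))  # True
print(is_allowed('https://evil.com/mybank.com', ['https://mybank.com']))  # False
print(is_allowed('https://app.example.com', ['*.example.com']))  # True
```

Matching on the parsed hostname (rather than the raw URL string) avoids tricks like placing an allowed domain in the path.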
# Available Tools
Source: https://docs.browser-use.com/customize/tools/available
Here is the [source code](https://github.com/browser-use/browser-use/blob/main/browser_use/tools/service.py) for the default tools:
### Navigation & Browser Control
* **`search_google`** - Search queries in Google
* **`go_to_url`** - Navigate to URLs
* **`go_back`** - Go back in browser history
* **`wait`** - Wait for specified seconds
### Page Interaction
* **`click_element_by_index`** - Click elements by their index
* **`input_text`** - Input text into form fields
* **`upload_file_to_element`** - Upload files to file inputs
* **`scroll`** - Scroll the page up/down
* **`scroll_to_text`** - Scroll to specific text on page
* **`send_keys`** - Send special keys (Enter, Escape, etc.)
### Tab Management
* **`switch_tab`** - Switch between browser tabs
* **`close_tab`** - Close browser tabs
### Content Extraction
* **`extract_structured_data`** - Extract data from webpages using LLM
### Form Controls
* **`get_dropdown_options`** - Get dropdown option values
* **`select_dropdown_option`** - Select dropdown options
### File Operations
* **`write_file`** - Write content to files
* **`read_file`** - Read file contents
* **`replace_file_str`** - Replace text in files
### Task Completion
* **`done`** - Complete the task (always available)
# Basics
Source: https://docs.browser-use.com/customize/tools/basics
Tools are the functions the agent uses to interact with the world.
## Quick Example
```python
from browser_use import Agent, Tools, ActionResult

tools = Tools()

@tools.action('Ask human for help with a question')
def ask_human(question: str) -> ActionResult:
    answer = input(f'{question} > ')
    return ActionResult(extracted_content=f'The human responded with: {answer}')

agent = Agent(
    task='Ask human for help',
    llm=llm,
    tools=tools,
)
```
# Remove Tools
Source: https://docs.browser-use.com/customize/tools/remove
You can exclude default tools:
```python
from browser_use import Tools
tools = Tools(exclude_actions=['search_google', 'wait'])
agent = Agent(task='...', llm=llm, tools=tools)
```
# Tool Response
Source: https://docs.browser-use.com/customize/tools/response
Tools return results using `ActionResult` or simple strings.
## Return Types
```python
@tools.action('My tool')
def my_tool() -> str:
    return "Task completed successfully"

@tools.action('Advanced tool')
def advanced_tool() -> ActionResult:
    # All fields shown together for illustration; in practice you would
    # not set `error` on a successful result.
    return ActionResult(
        extracted_content="Main result",
        long_term_memory="Remember this info",
        error="Something went wrong",
        is_done=True,
        success=True,
        attachments=["file.pdf"],
    )
```
## ActionResult Properties
* `extracted_content` (default: `None`) - Main result passed to the LLM; equivalent to returning a string.
* `include_extracted_content_only_once` (default: `False`) - Set to `True` for large content to include it only once in the LLM input.
* `long_term_memory` (default: `None`) - This is always included in the LLM input for all future steps.
* `error` (default: `None`) - Error message, we catch exceptions and set this automatically. This is always included in the LLM input.
* `is_done` (default: `False`) - Tool completes entire task
* `success` (default: `None`) - Task success (only valid with `is_done=True`)
* `attachments` (default: `None`) - Files to show user
* `metadata` (default: `None`) - Debug/observability data
## Why `extracted_content` and `long_term_memory`?
With this you control the context for the LLM.
### 1. Include short content always in context
```python
def simple_tool() -> str:
    return "Hello, world!"  # Keep in context for all future steps
```
### 2. Show long content once, remember subset in context
```python
return ActionResult(
    extracted_content="[500 lines of product data...]",  # Shown to the LLM once
    include_extracted_content_only_once=True,  # Never show full output again
    long_term_memory="Found 50 products"  # Only this in future steps
)
```
We save the full `extracted_content` to files which the LLM can read in future steps.
### 3. Don't show long content, remember subset in context
```python
return ActionResult(
    extracted_content="[500 lines of product data...]",  # The LLM never sees this: `long_term_memory` overrides it and `include_extracted_content_only_once` is not set
    long_term_memory="Saved user's favorite products",  # Shown to the LLM in future steps
)
```
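The three patterns above can be pictured as a simple visibility rule applied to each past tool result when the LLM input is assembled. A rough sketch of that model (simplified, not the actual browser-use internals):

```python
def visible_parts(result: dict, is_current_step: bool) -> list[str]:
    """What the LLM sees from one tool result (simplified model)."""
    content = result.get('extracted_content')
    memory = result.get('long_term_memory')
    once = result.get('include_extracted_content_only_once', False)
    parts = []
    if content and ((memory is None and not once) or (once and is_current_step)):
        parts.append(content)  # case 1: always shown; case 2: shown once
    if memory:
        parts.append(memory)   # cases 2 and 3: the persistent summary
    return parts

# Case 2: long content shown once, only the summary afterwards
result = {
    'extracted_content': '[500 lines of product data...]',
    'include_extracted_content_only_once': True,
    'long_term_memory': 'Found 50 products',
}
print(visible_parts(result, is_current_step=True))
# → ['[500 lines of product data...]', 'Found 50 products']
print(visible_parts(result, is_current_step=False))
# → ['Found 50 products']
```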
## Terminating the Agent
Set `is_done=True` to stop the agent completely. Use when your tool finishes the entire task:
```python
@tools.action(description='Complete the task')
def finish_task() -> ActionResult:
    return ActionResult(
        extracted_content="Task completed!",
        is_done=True,  # Stops the agent
        success=True  # Task succeeded
    )
```
# Contribution Guide
Source: https://docs.browser-use.com/development/contribution-guide
Learn how to contribute to Browser Use
# Join the Browser Use Community!
We're thrilled you're interested in contributing to Browser Use! This guide will help you get started with contributing to our project. Your contributions are what make the open-source community such an amazing place to learn, inspire, and create.
## Quick Setup
Get started with Browser Use development in minutes:
```bash
git clone https://github.com/browser-use/browser-use
cd browser-use
uv sync --all-extras --dev
# or pip install -U git+https://github.com/browser-use/browser-use.git@main
echo "BROWSER_USE_LOGGING_LEVEL=debug" >> .env
```
For more detailed setup instructions, see our [Local Setup Guide](/development/local-setup).
## How to Contribute
### Find Something to Work On
* Browse our [GitHub Issues](https://github.com/browser-use/browser-use/issues) for beginner-friendly issues labeled `good-first-issue`
* Check out our most active issues or ask in [Discord](https://discord.gg/zXJJHtJf3k) for ideas of what to work on
* Get inspiration and share what you build in the [`#showcase-your-work`](https://discord.com/channels/1303749220842340412/1305549200678850642) channel
* Explore or contribute to [`awesome-browser-use-prompts`](https://github.com/browser-use/awesome-prompts)!
### Making a Great Pull Request
When submitting a pull request, please:
* Include a clear description of what the PR does and why it's needed
* Add tests that cover your changes
* Include a demo screenshot/gif or an example script demonstrating your changes
* Make sure the PR passes all CI checks and tests
* Keep your PR focused on a single issue or feature to make it easier to review
Note: We appreciate quality over quantity. Instead of submitting small typo/style-only PRs, consider including those fixes as part of larger bugfix or feature PRs.
### Contribution Process
1. Fork the repository
2. Create a new branch for your feature or bugfix
3. Make your changes
4. Run tests to ensure everything works
5. Submit a pull request
6. Respond to any feedback from maintainers
7. Celebrate your contribution!
Feel free to bump your issues/PRs with comments periodically if you need faster feedback.
## Code of Conduct
We're committed to providing a welcoming and inclusive environment for all contributors. Please be respectful and constructive in all interactions.
## Getting Help
If you need help at any point:
* Join our [Discord community](https://link.browser-use.com/discord)
* Ask questions in the appropriate GitHub issue
* Check our [documentation](/introduction)
We're here to help you succeed in contributing to Browser Use!
# Local Setup
Source: https://docs.browser-use.com/development/local-setup
Set up Browser Use development environment locally
# Welcome to Browser Use Development!
We're excited to have you join our community of contributors. This guide will help you set up your local development environment quickly and easily.
## Quick Setup
If you're familiar with Python development, here's the quick way to get started:
```bash
git clone https://github.com/browser-use/browser-use
cd browser-use
uv sync --all-extras --dev
# or pip install -U git+https://github.com/browser-use/browser-use.git@main
echo "BROWSER_USE_LOGGING_LEVEL=debug" >> .env
```
## Helper Scripts
We provide several convenient shell scripts in the `bin/` directory to help with common development tasks:
```bash
# Complete setup script - installs uv, creates a venv, and installs dependencies
./bin/setup.sh
# Run all pre-commit hooks (formatting, linting, type checking)
./bin/lint.sh
# Run the core test suite that's executed in CI
./bin/test.sh
```
## Prerequisites
Browser Use requires Python 3.11 or higher. We recommend using [uv](https://docs.astral.sh/uv/) for Python environment management.
## Detailed Setup Instructions
### Clone the Repository
First, clone the Browser Use repository:
```bash
git clone https://github.com/browser-use/browser-use
cd browser-use
```
### Environment Setup
1. Create and activate a virtual environment:
```bash
uv venv --python 3.11
source .venv/bin/activate
```
2. Install dependencies:
```bash
# Install the package in editable mode with all development dependencies
uv sync --all-extras
# Install the default browser
playwright install chromium --with-deps --no-shell
```
## Configuration
Set up your environment variables:
```bash
# Copy the example environment file
cp .env.example .env
```
Or manually create a `.env` file and set the API keys for the models you want to use:
```bash .env
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=
AZURE_ENDPOINT=
AZURE_OPENAI_API_KEY=
GOOGLE_API_KEY=
DEEPSEEK_API_KEY=
GROK_API_KEY=
NOVITA_API_KEY=
BROWSER_USE_LOGGING_LEVEL=debug # Helpful for development
```
See [Supported Models](/customize/supported-models) for available LLM options
and their specific API key requirements.
## Development
After setup, you can:
* Try demos in the example library with `uv run examples/simple.py`
* Run the linter/formatter with `uv run ruff format examples/some/file.py`
* Run tests with `uv run pytest`
* Build the package with `uv build`
### Linting
```bash
# Run the linter on the whole project (must pass for PR to be allowed to merge)
uv run pre-commit run --all-files
# or use our convenience script
./bin/lint.sh
# Install the linter & formatter pre-commit hooks to run automatically
pre-commit install --install-hooks
# Experimental: run the type checker
uv run type
```
### Tests
```bash
# Run all tests that run in CI
./bin/test.sh
# Run specific tests
uv run pytest # run everything
uv run pytest tests/test_tools.py # run a specific test file
uv run pytest tests/test_sensitive_data.py tests/test_tab_management.py # run two test files
uv run pytest tests/test_tab_management.py::TestTabManagement::test_user_changes_tab # run a single test
```
### Build
```bash
uv build
uv pip install dist/*.whl
# push build to PyPI (automatically run by Github Actions CI)
uv publish
```
## Getting Help
If you run into any issues:
1. Check our [GitHub Issues](https://github.com/browser-use/browser-use/issues)
2. Join our [Discord community](https://link.browser-use.com/discord) for support
We welcome contributions! See our [Contribution
Guide](/development/contribution-guide) for guidelines on how to help improve
Browser Use.
# Observability
Source: https://docs.browser-use.com/development/observability
Trace Browser Use's agent execution steps and browser sessions
## Overview
Browser Use has a native integration with [Laminar](https://lmnr.ai) - an open-source platform for tracing, evals, and labeling of AI agents.
Read more about Laminar in the [Laminar docs](https://docs.lmnr.ai).
## Setup
Register on [Laminar Cloud](https://lmnr.ai) and get the key from your project settings.
Set the `LMNR_PROJECT_API_KEY` environment variable.
```bash
pip install 'lmnr[all]'
export LMNR_PROJECT_API_KEY=
```
## Usage
Then, simply initialize Laminar at the top of your project, and both Browser Use agent steps and browser session recordings will be traced automatically.
```python {5-6}
from browser_use import Agent, ChatOpenAI
import asyncio
from lmnr import Laminar

# this line auto-instruments Browser Use and any browser you use (local or remote)
Laminar.initialize(project_api_key="...")

async def main():
    agent = Agent(
        task="open google, search Laminar AI",
        llm=ChatOpenAI(model="gpt-4.1-mini"),
    )
    await agent.run()

asyncio.run(main())
```
## Viewing Traces
You can view traces in the Laminar UI by going to the traces tab in your project.
When you select a trace, you can see both the browser session recording and the agent execution steps.
The browser session timeline is synced with the agent execution steps: timeline highlights indicate the agent's current step at the corresponding point in the recording.
In the trace view, you can also see the agent's current step, the tool it's using, and the tool's input and output. Tool calls are highlighted in yellow on the timeline.
## Laminar
To learn more about tracing and evaluating your browser agents, check out the [Laminar docs](https://docs.lmnr.ai).
# Telemetry
Source: https://docs.browser-use.com/development/telemetry
Understanding Browser Use's telemetry and privacy settings
## Overview
Browser Use collects anonymous usage data to help us understand how the library is being used and to improve the user experience. It also helps us fix bugs faster and prioritize feature development.
## Data Collection
We use [PostHog](https://posthog.com) for telemetry collection. The data is completely anonymized and contains no personally identifiable information.
We never collect personal information, credentials, or specific content from
your browser automation tasks.
## Opting Out
You can disable telemetry by setting an environment variable:
```bash .env
ANONYMIZED_TELEMETRY=false
```
Or in your Python code:
```python
import os
os.environ["ANONYMIZED_TELEMETRY"] = "false"
```
Even when enabled, telemetry has zero impact on the library's performance or
functionality. Code is available in [Telemetry
Service](https://github.com/browser-use/browser-use/tree/main/browser_use/telemetry).
# Introduction
Source: https://docs.browser-use.com/introduction
Automate browser tasks in plain text.
Open-source Python library.
Scale up with our cloud.
# Human Quickstart
Source: https://docs.browser-use.com/quickstart
## 1. Easy setup
Use [uv](https://docs.astral.sh/uv/) to create and activate the environment:
```bash
uv venv --python 3.12
```
```bash
# For Mac/Linux:
source .venv/bin/activate
# For Windows:
.venv\Scripts\activate
```
Install browser-use:
```bash
uv pip install browser-use
```
Install Chromium:
```bash
uvx playwright install chromium --with-deps
```
## 2. Choose your favorite LLM
Create a `.env` file and add your API key:
```bash .env
OPENAI_API_KEY=
```
See [Supported Models](/customize/supported-models) for other models.
## 3. Run your first agent
```python agent.py
from browser_use import Agent, ChatOpenAI
from dotenv import load_dotenv
import asyncio

load_dotenv()

async def main():
    llm = ChatOpenAI(model="gpt-4.1-mini")
    task = "Find the number 1 post on Show HN"
    agent = Agent(task=task, llm=llm)
    await agent.run()

if __name__ == "__main__":
    asyncio.run(main())
```
# LLM Quickstart
Source: https://docs.browser-use.com/quickstart_llm
1. Copy all content [🔗 from here](https://docs.browser-use.com/llms-full.txt) (\~40k tokens)
2. Paste it into your favorite coding agent (Cursor, Claude, ChatGPT ...).