# All Parameters

Source: https://docs.browser-use.com/customize/actor/all-parameters

Complete API reference for Browser Actor classes, methods, and parameters including BrowserSession, Page, Element, and Mouse

## Browser (BrowserSession)

Main browser session manager.

### Key Methods

```python
from browser_use import Browser

browser = Browser()
await browser.start()

# Page management
page = await browser.new_page("https://example.com")
pages = await browser.get_pages()
current = await browser.get_current_page()
await browser.close_page(page)

# To stop the browser session
await browser.stop()
```

### Constructor Parameters

See [Browser Parameters](../browser/all-parameters) for complete configuration options.

## Page

Browser tab/iframe for page-level operations.

### Navigation

* `goto(url: str)` - Navigate to URL
* `go_back()`, `go_forward()`, `reload()` - History navigation

### Element Finding

* `get_elements_by_css_selector(selector: str) -> list[Element]` - CSS selector
* `get_element(backend_node_id: int) -> Element` - By CDP node ID
* `get_element_by_prompt(prompt: str, llm) -> Element | None` - AI-powered
* `must_get_element_by_prompt(prompt: str, llm) -> Element` - AI (raises if not found)

### JavaScript & Controls

* `evaluate(page_function: str, *args) -> str` - Execute JS (arrow function format)
* `press(key: str)` - Send keyboard input ("Enter", "Control+A")
* `set_viewport_size(width: int, height: int)` - Set viewport
* `screenshot(format='jpeg', quality=None) -> str` - Take screenshot

### Information

* `get_url() -> str`, `get_title() -> str` - Page info
* `mouse -> Mouse` - Get mouse interface

### AI Features

* `extract_content(prompt: str, structured_output: type[T], llm) -> T` - Extract data

## Element

Individual DOM element interactions.

### Interactions

* `click(button='left', click_count=1, modifiers=None)` - Click element
* `fill(text: str, clear=True)` - Fill input
* `hover()`, `focus()` - Mouse/focus actions
* `check()` - Toggle checkbox/radio
* `select_option(values: str | list[str])` - Select dropdown options
* `drag_to(target: Element | Position)` - Drag and drop

### Properties

* `get_attribute(name: str) -> str | None` - Get attribute
* `get_bounding_box() -> BoundingBox | None` - Position/size
* `get_basic_info() -> ElementInfo` - Complete element info
* `screenshot(format='jpeg') -> str` - Element screenshot

## Mouse

Coordinate-based mouse operations.

### Operations

* `click(x: int, y: int, button='left', click_count=1)` - Click at coordinates
* `move(x: int, y: int, steps=1)` - Move mouse
* `down(button='left')`, `up(button='left')` - Press/release buttons
* `scroll(x=0, y=0, delta_x=None, delta_y=None)` - Scroll at coordinates

# Basics

Source: https://docs.browser-use.com/customize/actor/basics

Low-level Playwright-like browser automation with direct and full CDP control and precise element interactions

## Core Architecture

```mermaid
graph TD
    A[Browser] --> B[Page]
    B --> C[Element]
    B --> D[Mouse]
    B --> E[AI Features]
    C --> F[DOM Interactions]
    D --> G[Coordinate Operations]
    E --> H[LLM Integration]
```

### Core Classes

* **Browser** (alias: **BrowserSession**): Main session manager
* **Page**: Represents a browser tab/iframe
* **Element**: Individual DOM element operations
* **Mouse**: Coordinate-based mouse operations

## Basic Usage

```python
import asyncio

from browser_use import Browser, Agent
from browser_use.llm.openai.chat import ChatOpenAI

async def main():
    llm = ChatOpenAI(api_key="your-api-key")
    browser = Browser()
    await browser.start()

    # 1. Actor: Precise navigation and element interactions
    page = await browser.new_page("https://github.com/login")
    email_input = await page.must_get_element_by_prompt("username field", llm=llm)
    await email_input.fill("your-username")

    # 2. Agent: AI-driven complex tasks
    agent = Agent(
        task="Complete login and navigate to my repositories",
        browser=browser,
        llm=llm,
    )
    await agent.run()

    await browser.stop()

asyncio.run(main())
```

## Important Notes

* **Not Playwright**: Actor is built on CDP, not Playwright. The API resembles Playwright as closely as possible for easy migration, but it covers only a subset of Playwright's functionality.
* **Immediate Returns**: `get_elements_by_css_selector()` doesn't wait for visibility
* **Manual Timing**: You handle navigation timing and waiting
* **JavaScript Format**: `evaluate()` requires arrow function format: `() => {}`

# Examples

Source: https://docs.browser-use.com/customize/actor/examples

Comprehensive examples for Browser Actor automation tasks including forms, JavaScript, mouse operations, and AI features

## Page Management

```python
from browser_use import Browser

browser = Browser()
await browser.start()

# Create pages
page = await browser.new_page()  # Blank tab
page = await browser.new_page("https://example.com")  # With URL

# Get all pages
pages = await browser.get_pages()
current = await browser.get_current_page()

# Close page
await browser.close_page(page)
await browser.stop()
```

## Element Finding & Interactions

```python
page = await browser.new_page('https://github.com')

# CSS selectors (immediate return)
elements = await page.get_elements_by_css_selector("input[type='text']")
buttons = await page.get_elements_by_css_selector("button.submit")

# Element actions
await elements[0].click()
await elements[0].fill("Hello World")
await elements[0].hover()

# Page actions
await page.press("Enter")
screenshot = await page.screenshot()
```

## LLM-Powered Features

```python
from browser_use.llm.openai.chat import ChatOpenAI
from pydantic import BaseModel

llm = ChatOpenAI(api_key="your-api-key")

# Find elements using natural language
button = await page.get_element_by_prompt("login button", llm=llm)
if button is not None:  # get_element_by_prompt returns None when nothing matches
    await button.click()

# Extract structured data
class ProductInfo(BaseModel):
    name: str
    price: float

product = await page.extract_content(
    "Extract product name and price",
    ProductInfo,
    llm=llm
)
```

## JavaScript Execution

```python
# Simple JavaScript evaluation
title = await page.evaluate('() => document.title')

# JavaScript with arguments
result = await page.evaluate('(x, y) => x + y', 10, 20)

# Complex operations
stats = await page.evaluate('''() => ({
    url: location.href,
    links: document.querySelectorAll('a').length
})''')
```

## Mouse Operations

```python
mouse = await page.mouse

# Click at coordinates
await mouse.click(x=100, y=200)

# Drag and drop
await mouse.down()
await mouse.move(x=500, y=600)
await mouse.up()

# Scroll
await mouse.scroll(x=0, y=100, delta_y=-500)
```

## Best Practices

* Use `asyncio.sleep()` after actions that trigger navigation
* Check URL/title changes to verify state transitions
* Always check if elements exist before interaction
* Implement retry logic for flaky elements
* Call `browser.stop()` to clean up resources

# All Parameters

Source: https://docs.browser-use.com/customize/agent/all-parameters

Complete reference for all agent configuration options

## Available Parameters

### Core Settings

* `tools`: Registry of tools the agent can call.
* `skills` (or `skill_ids`): List of skill IDs to load (e.g., `['skill-uuid']` or `['*']` for all). Requires `BROWSER_USE_API_KEY`.
* `browser`: Browser object where you can specify the browser settings.
* `output_model_schema`: Pydantic model class for structured output validation.
  [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/custom_output.py)

### Vision & Processing

* `use_vision` (default: `"auto"`): Vision mode. `"auto"` includes the screenshot tool but only uses vision when requested; `True` always includes screenshots; `False` never includes screenshots and excludes the screenshot tool
* `vision_detail_level` (default: `'auto'`): Screenshot detail level - `'low'`, `'high'`, or `'auto'`
* `page_extraction_llm`: Separate LLM model for page content extraction. You can choose a small & fast model because it only needs to extract text from the page (default: same as `llm`)

### Fallback & Resilience

* `fallback_llm`: Backup LLM to use when the primary LLM fails. The primary LLM will first exhaust its own retry logic (typically 5 attempts with exponential backoff), and only then switch to the fallback. Triggers on rate limits (429), authentication errors (401), payment/credit errors (402), or server errors (500, 502, 503, 504). Once switched, the fallback is used for the rest of the run. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/fallback_model.py)

### Actions & Behavior

* `initial_actions`: List of actions to run before the main task, without calling the LLM. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/initial_actions.py)
* `max_actions_per_step` (default: `4`): Maximum actions per step, e.g. for form filling the agent can output 4 fields at once. Actions are executed until the page changes.
* `max_failures` (default: `3`): Maximum retries for steps with errors
* `final_response_after_failure` (default: `True`): If `True`, attempt to force one final model call with intermediate output after `max_failures` is reached
* `use_thinking` (default: `True`): Controls whether the agent uses its internal "thinking" field for explicit reasoning steps.
* `flash_mode` (default: `False`): Fast mode that skips evaluation, next goal and thinking and only uses memory.
  If `flash_mode` is enabled, it overrides `use_thinking` and disables the thinking process entirely. [Example](https://github.com/browser-use/browser-use/blob/main/examples/getting_started/05_fast_agent.py)

### System Messages

* `override_system_message`: Completely replace the default system prompt.
* `extend_system_message`: Add additional instructions to the default system prompt. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/custom_system_prompt.py)

### File & Data Management

* `save_conversation_path`: Path to save complete conversation history
* `save_conversation_path_encoding` (default: `'utf-8'`): Encoding for saved conversations
* `available_file_paths`: List of file paths the agent can access
* `sensitive_data`: Dictionary of sensitive data to handle carefully. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/sensitive_data.py)

### Visual Output

* `generate_gif` (default: `False`): Generate GIF of agent actions. Set to `True` or string path
* `include_attributes`: List of HTML attributes to include in page analysis

### Performance & Limits

* `max_history_items`: Maximum number of last steps to keep in the LLM memory. If `None`, we keep all steps.
* `llm_timeout` (default: `90`): Timeout in seconds for LLM calls
* `step_timeout` (default: `120`): Timeout in seconds for each step
* `directly_open_url` (default: `True`): If we detect a url in the task, we directly open it.

### Advanced Options

* `calculate_cost` (default: `False`): Calculate and track API costs
* `display_files_in_done_text` (default: `True`): Show file information in completion messages

### Backwards Compatibility

* `controller`: Alias for `tools` for backwards compatibility.
* `browser_session`: Alias for `browser` for backwards compatibility.
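
The `fallback_llm` behaviour described under Fallback & Resilience can be pictured with a small, library-independent sketch. Everything in it is an assumption made for illustration: `LLMError` and `call_with_fallback` are hypothetical names, not part of browser-use, and the real implementation may differ. Only the status codes and the "retry first, then switch" order are taken from the description above, and backoff delays are left out to keep it short.

```python
import asyncio

# Status codes that trigger the switch, as listed for `fallback_llm` above.
FALLBACK_STATUS_CODES = {401, 402, 429, 500, 502, 503, 504}


class LLMError(Exception):
    """Hypothetical error type carrying an HTTP-like status code."""

    def __init__(self, status_code: int):
        super().__init__(f"LLM call failed with status {status_code}")
        self.status_code = status_code


async def call_with_fallback(primary, fallback, prompt, max_retries=5):
    """Retry `primary` up to `max_retries` times, then call `fallback` once.

    Returns a `(result, used_fallback)` tuple. Errors whose status code is
    not in FALLBACK_STATUS_CODES are re-raised immediately.
    """
    last_error = None
    for _ in range(max(1, max_retries)):
        try:
            return await primary(prompt), False
        except LLMError as err:
            if err.status_code not in FALLBACK_STATUS_CODES:
                raise
            last_error = err  # remember the failure and retry
    if fallback is None:
        raise last_error
    return await fallback(prompt), True
```

A caller that sees `used_fallback` come back as `True` would keep calling the fallback for later steps, which mirrors the "used for the rest of the run" behaviour.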
# Basics

Source: https://docs.browser-use.com/customize/agent/basics

```python
import asyncio

from browser_use import Agent, ChatBrowserUse

agent = Agent(
    task="Search for latest news about AI",
    llm=ChatBrowserUse(),
)

async def main():
    history = await agent.run(max_steps=100)

asyncio.run(main())
```

* `task`: The task you want to automate.
* `llm`: Your favorite LLM. See [Supported Models](/supported-models).

The agent is executed using the async `run()` method:

* `max_steps` (default: `100`): Maximum number of steps an agent can take.

Check out all customizable parameters [here](/customize/agent/all-parameters).

# Output Format

Source: https://docs.browser-use.com/customize/agent/output-format

## Agent History

The `run()` method returns an `AgentHistoryList` object with the complete execution history:

```python
history = await agent.run()

# Access useful information
history.urls()                    # List of visited URLs
history.screenshot_paths()        # List of screenshot paths
history.screenshots()             # List of screenshots as base64 strings
history.action_names()            # Names of executed actions
history.extracted_content()       # List of extracted content from all actions
history.errors()                  # List of errors (with None for steps without errors)
history.model_actions()           # All actions with their parameters
history.model_outputs()           # All model outputs from history
history.last_action()             # Last action in history

# Analysis methods
history.final_result()            # Get the final extracted content (last step)
history.is_done()                 # Check if the agent finished (marked the task as done)
history.is_successful()           # Check if agent completed successfully (returns None if not done)
history.has_errors()              # Check if any errors occurred
history.model_thoughts()          # Get the agent's reasoning process (AgentBrain objects)
history.action_results()          # Get all ActionResult objects from history
history.action_history()          # Get truncated action history with essential fields
history.number_of_steps()         # Get the number of steps in the history
history.total_duration_seconds()  # Get total duration of all steps in seconds

# Structured output (when using output_model_schema)
history.structured_output         # Property that returns parsed structured output
```

See all helper methods in the [AgentHistoryList source code](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/views.py#L301).

## Structured Output

For structured output, use the `output_model_schema` parameter with a Pydantic model. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/custom_output.py).

# Prompting Guide

Source: https://docs.browser-use.com/customize/agent/prompting-guide

Tips and tricks

Prompting can drastically improve performance and solve existing limitations of the library.

### 1. Be Specific vs Open-Ended

**✅ Specific (Recommended)**

```python
task = """
1. Go to https://quotes.toscrape.com/
2. Use extract action with the query "first 3 quotes with their authors"
3. Save results to quotes.csv using write_file action
4. Do a google search for the first quote and find when it was written
"""
```

**❌ Open-Ended**

```python
task = "Go to web and make money"
```

### 2. Name Actions Directly

When you know exactly what the agent should do, reference actions by name:

```python
task = """
1. Use search action to find "Python tutorials"
2. Use click to open first result in a new tab
3. Use scroll action to scroll down 2 pages
4. Use extract to extract the names of the first 5 items
5. Wait for 2 seconds if the page is not loaded, refresh it and wait 10 sec
6. Use send_keys action with "Tab Tab ArrowDown Enter"
"""
```

See [Available Tools](/customize/tools/available) for the complete list of actions.

### 3. Handle interaction problems via keyboard navigation

Sometimes buttons can't be clicked. If that happens, you have found a bug in the library, so please open an issue. The good news is that you can often work around it with keyboard navigation:

```python
task = """
If the submit button cannot be clicked:
1. Use send_keys action with "Tab Tab Enter" to navigate and activate
2. Or use send_keys with "ArrowDown ArrowDown Enter" for form submission
"""
```

### 4. Custom Actions Integration

```python
# When you have custom actions
@controller.action("Get 2FA code from authenticator app")
async def get_2fa_code():
    # Your implementation
    pass

task = """
Login with 2FA:
1. Enter username/password
2. When prompted for 2FA, use get_2fa_code action
3. NEVER try to extract 2FA codes from the page manually
4. ALWAYS use the get_2fa_code action for authentication codes
"""
```

### 5. Error Recovery

```python
task = """
Robust data extraction:
1. Go to openai.com to find their CEO
2. If navigation fails due to anti-bot protection:
   - Use google search to find the CEO
3. If page times out, use go_back and try alternative approach
"""
```

The key to effective prompting is being specific about actions.

# All Parameters

Source: https://docs.browser-use.com/customize/browser/all-parameters

Complete reference for all browser configuration options

The `Browser` instance also provides all [Actor](/customize/actor/all-parameters) methods for direct browser control (page management, element interactions, etc.).

## Core Settings

* `cdp_url`: CDP URL for connecting to existing browser instance (e.g., `"http://localhost:9222"`)

## Display & Appearance

* `headless` (default: `None`): Run browser without UI. Auto-detects based on display availability (`True`/`False`/`None`)
* `window_size`: Browser window size for headful mode. Use dict `{'width': 1920, 'height': 1080}` or `ViewportSize` object
* `window_position` (default: `{'width': 0, 'height': 0}`): Window position from top-left corner in pixels
* `viewport`: Content area size, same format as `window_size`. Use `{'width': 1280, 'height': 720}` or `ViewportSize` object
* `no_viewport` (default: `None`): Disable viewport emulation, content fits to window size
* `device_scale_factor`: Device scale factor (DPI).
  Set to `2.0` or `3.0` for high-resolution screenshots

## Browser Behavior

* `keep_alive` (default: `None`): Keep browser running after agent completes
* `allowed_domains`: Restrict navigation to specific domains. Domain pattern formats:
  * `'example.com'` - Matches only `https://example.com/*`
  * `'*.example.com'` - Matches `https://example.com/*` and any subdomain `https://*.example.com/*`
  * `'http*://example.com'` - Matches both `http://` and `https://` protocols
  * `'chrome-extension://*'` - Matches any Chrome extension URL
  * **Security**: Wildcards in TLD (e.g., `example.*`) are **not allowed** for security
  * Use list like `['*.google.com', 'https://example.com', 'chrome-extension://*']`
  * **Performance**: Lists with 100+ domains are automatically optimized to sets for O(1) lookup. Pattern matching is disabled for optimized lists. Both `www.example.com` and `example.com` variants are checked automatically.
* `prohibited_domains`: Block navigation to specific domains. Uses same pattern formats as `allowed_domains`. When both `allowed_domains` and `prohibited_domains` are set, `allowed_domains` takes precedence. Examples:
  * `['pornhub.com', '*.gambling-site.net']` - Block specific sites and all subdomains
  * `['https://explicit-content.org']` - Block specific protocol/domain combination
  * **Performance**: Lists with 100+ domains are automatically optimized to sets for O(1) lookup (same as `allowed_domains`)
* `enable_default_extensions` (default: `True`): Load automation extensions (uBlock Origin, cookie handlers, ClearURLs)
* `cross_origin_iframes` (default: `False`): Enable cross-origin iframe support (may add complexity)
* `is_local` (default: `True`): Whether this is a local browser instance. Set to `False` for remote browsers. If an `executable_path` is set, this is automatically set to `True`. This can affect your download behavior.

## User Data & Profiles

* `user_data_dir` (default: auto-generated temp): Directory for browser profile data.
  Use `None` for incognito mode
* `profile_directory` (default: `'Default'`): Chrome profile subdirectory name (`'Profile 1'`, `'Work Profile'`, etc.)
* `storage_state`: Browser storage state (cookies, localStorage). Can be file path string or dict object

## Network & Security

* `proxy`: Proxy configuration using `ProxySettings(server='http://host:8080', bypass='localhost,127.0.0.1', username='user', password='pass')`
* `permissions` (default: `['clipboardReadWrite', 'notifications']`): Browser permissions to grant. Use list like `['camera', 'microphone', 'geolocation']`
* `headers`: Additional HTTP headers for connect requests (remote browsers only)

## Browser Launch

* `executable_path`: Path to browser executable for custom installations. Platform examples:
  * macOS: `'/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'`
  * Windows: `'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'`
  * Linux: `'/usr/bin/google-chrome'`
* `channel`: Browser channel (`'chromium'`, `'chrome'`, `'chrome-beta'`, `'msedge'`, etc.)
* `args`: Additional command-line arguments for the browser. Use list format: `['--disable-gpu', '--custom-flag=value', '--another-flag']`
* `env`: Environment variables for browser process. Use dict like `{'DISPLAY': ':0', 'LANG': 'en_US.UTF-8', 'CUSTOM_VAR': 'test'}`
* `chromium_sandbox` (default: `True` except in Docker): Enable Chromium sandboxing for security
* `devtools` (default: `False`): Open DevTools panel automatically (requires `headless=False`)
* `ignore_default_args`: List of default args to disable, or `True` to disable all.
  Use list like `['--enable-automation', '--disable-extensions']`

## Timing & Performance

* `minimum_wait_page_load_time` (default: `0.25`): Minimum time to wait before capturing page state in seconds
* `wait_for_network_idle_page_load_time` (default: `0.5`): Time to wait for network activity to cease in seconds
* `wait_between_actions` (default: `0.5`): Time to wait between agent actions in seconds

## AI Integration

* `highlight_elements` (default: `True`): Highlight interactive elements for AI vision
* `paint_order_filtering` (default: `True`): Enable paint order filtering to optimize DOM tree by removing elements hidden behind others. Slightly experimental

## Downloads & Files

* `accept_downloads` (default: `True`): Automatically accept all downloads
* `downloads_path`: Directory for downloaded files. Use string like `'./downloads'` or `Path` object
* `auto_download_pdfs` (default: `True`): Automatically download PDFs instead of viewing in browser

## Device Emulation

* `user_agent`: Custom user agent string. Example: `'Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X)'`
* `screen`: Screen size information, same format as `window_size`

## Recording & Debugging

Video recording requires additional optional dependencies. If these are not installed, no video will be saved and no error will be raised. Install with:

```bash
pip install "browser-use[video]"
```

or:

```bash
pip install "imageio[ffmpeg]" numpy
```

* `record_video_dir`: Directory to save video recordings as `.mp4` files
* `record_video_size` (default: `ViewportSize`): The frame size (width, height) of the video recording.
* `record_video_framerate` (default: `30`): The framerate to use for the video recording.
* `record_har_path`: Path to save network trace files as `.har` format
* `traces_dir`: Directory to save complete trace files for debugging
* `record_har_content` (default: `'embed'`): HAR content mode (`'omit'`, `'embed'`, `'attach'`)
* `record_har_mode` (default: `'full'`): HAR recording mode (`'full'`, `'minimal'`)

## Advanced Options

* `disable_security` (default: `False`): ⚠️ **NOT RECOMMENDED** - Disables all browser security features
* `deterministic_rendering` (default: `False`): ⚠️ **NOT RECOMMENDED** - Forces consistent rendering but reduces performance

***

## Outdated BrowserProfile

For backward compatibility, you can pass any of the parameters above to a `BrowserProfile` and then pass that profile to `Browser`.

```python
from browser_use import Browser, BrowserProfile

profile = BrowserProfile(headless=False)
browser = Browser(browser_profile=profile)
```

## Browser vs BrowserSession

`Browser` is an alias for `BrowserSession`; they are exactly the same class. Use `Browser` for cleaner, more intuitive code.

# Basics

Source: https://docs.browser-use.com/customize/browser/basics

***

```python
import asyncio

from browser_use import Agent, Browser, ChatBrowserUse

browser = Browser(
    headless=False,  # Show browser window
    window_size={'width': 1000, 'height': 700},  # Set window size
)

agent = Agent(
    task='Search for Browser Use',
    browser=browser,
    llm=ChatBrowserUse(),
)

async def main():
    await agent.run()

asyncio.run(main())
```

# Real Browser

Source: https://docs.browser-use.com/customize/browser/real-browser

Connect your existing Chrome browser to preserve authentication.
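
Connecting to an existing installation means pointing `Browser` at the right `executable_path` and `user_data_dir`, and both differ per operating system. Values such as `~` and `%LOCALAPPDATA%` are also shell-style placeholders, so it can help to resolve them in Python first. The helper below is only a sketch: `default_chrome_paths` is a hypothetical name, not part of browser-use, and the paths are the default install locations given in this documentation, so a custom installation will differ.

```python
import os
import sys

# Default locations as listed in this documentation; custom installs differ.
_CHROME_DEFAULTS = {
    "darwin": (
        "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
        "~/Library/Application Support/Google/Chrome",
    ),
    "win32": (
        "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
        "%LOCALAPPDATA%\\Google\\Chrome\\User Data",
    ),
    "linux": (
        "/usr/bin/google-chrome",
        "~/.config/google-chrome",
    ),
}


def default_chrome_paths(platform: str = sys.platform) -> dict:
    """Return Browser keyword arguments for the given `sys.platform` value."""
    for prefix, (executable, data_dir) in _CHROME_DEFAULTS.items():
        if platform.startswith(prefix):
            # Expand environment variables first, then a leading `~`.
            resolved = os.path.expanduser(os.path.expandvars(data_dir))
            return {
                "executable_path": executable,
                "user_data_dir": resolved,
                "profile_directory": "Default",
            }
    raise ValueError(f"No default Chrome paths known for platform {platform!r}")
```

With such a helper, the connection could be written as `Browser(**default_chrome_paths())`.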

## Basic Example

```python
import asyncio
from browser_use import Agent, Browser, ChatOpenAI

# Connect to your existing Chrome browser
browser = Browser(
    executable_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
    user_data_dir='~/Library/Application Support/Google/Chrome',
    profile_directory='Default',
)

agent = Agent(
    task='Visit https://duckduckgo.com and search for "browser-use founders"',
    browser=browser,
    llm=ChatOpenAI(model='gpt-4.1-mini'),
)

async def main():
    await agent.run()

if __name__ == "__main__":
    asyncio.run(main())
```

> **Note:** You need to fully close Chrome before running this example. Also, Google currently blocks this approach, so the example uses DuckDuckGo instead.

## How it Works

1. **`executable_path`** - Path to your Chrome installation
2. **`user_data_dir`** - Your Chrome profile folder (keeps cookies, extensions, bookmarks)
3. **`profile_directory`** - Specific profile name (Default, Profile 1, etc.)

## Platform Paths

```python
# macOS
executable_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
user_data_dir='~/Library/Application Support/Google/Chrome'

# Windows
executable_path='C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'
user_data_dir='%LOCALAPPDATA%\\Google\\Chrome\\User Data'

# Linux
executable_path='/usr/bin/google-chrome'
user_data_dir='~/.config/google-chrome'
```

# Remote Browser

Source: https://docs.browser-use.com/customize/browser/remote

### Browser-Use Cloud Browser or CDP URL

The easiest way to use a cloud browser is with the built-in Browser-Use cloud service:

```python
from browser_use import Agent, Browser, ChatBrowserUse

# Simple: Use Browser-Use cloud browser service
browser = Browser(
    use_cloud=True,  # Automatically provisions a cloud browser
)

# Advanced: Configure cloud browser parameters
# Using these settings can bypass any captcha protection on any website
browser = Browser(
    cloud_profile_id='your-profile-id',  # Optional: specific browser profile
    cloud_proxy_country_code='us',  # Optional: proxy location (us, uk, fr, it, jp, au, de, fi, ca, in)
    cloud_timeout=30,  # Optional: session timeout in minutes (MAX free: 15min, paid: 240min)
)

# Or use a CDP URL from any cloud browser provider
browser = Browser(
    cdp_url="http://remote-server:9222"  # Get a CDP URL from any provider
)

agent = Agent(
    task="Your task here",
    llm=ChatBrowserUse(),
    browser=browser,
)
```

**Prerequisites:**

1. Get an API key from [cloud.browser-use.com](https://cloud.browser-use.com/new-api-key)
2. Set the `BROWSER_USE_API_KEY` environment variable

**Cloud Browser Parameters:**

* `cloud_profile_id`: UUID of a browser profile (optional, uses default if not specified)
* `cloud_proxy_country_code`: Country code for proxy location - supports: us, uk, fr, it, jp, au, de, fi, ca, in
* `cloud_timeout`: Session timeout in minutes (free users: max 15 min, paid users: max 240 min)

**Benefits:**

* ✅ No local browser setup required
* ✅ Scalable and fast cloud infrastructure
* ✅ Automatic provisioning and teardown
* ✅ Built-in authentication handling
* ✅ Optimized for browser automation
* ✅ Global proxy support for geo-restricted content

### Third-Party Cloud Browsers

You can pass in a CDP URL from any remote browser.

### Proxy Connection

```python
from browser_use import Agent, Browser, ChatBrowserUse
from browser_use.browser import ProxySettings

browser = Browser(
    headless=False,
    proxy=ProxySettings(
        server="http://proxy-server:8080",
        username="proxy-user",
        password="proxy-pass"
    ),
    cdp_url="http://remote-server:9222"
)

agent = Agent(
    task="Your task here",
    llm=ChatBrowserUse(),
    browser=browser,
)
```

# All Parameters

Source: https://docs.browser-use.com/customize/code-agent/all-parameters

Complete reference for all CodeAgent configuration options

## CodeAgent Parameters

### Core Settings

* `task`: Task description string that defines what the agent should accomplish (required)
* `llm`: LLM instance for code generation. Must be a `ChatBrowserUse` instance; if not provided, defaults to `ChatBrowserUse()`
* `browser`: Browser session object for automation (optional, will be created if not provided)
* `tools`: Registry of tools the agent can call (optional, creates default if not provided)
* `max_steps` (default: `100`): Maximum number of execution steps before termination
* `max_failures` (default: `8`): Maximum consecutive errors before termination
* `max_validations` (default: `0`): Maximum number of times to run the validator agent

### Vision & Processing

* `use_vision` (default: `True`): Whether to include screenshots in LLM messages. `True` always includes screenshots, `False` never includes screenshots
* `page_extraction_llm`: Separate LLM model for page content extraction. You can choose a small & fast model because it only needs to extract text from the page (default: same as `llm`)

### File & Data Management

* `file_system`: File system instance for file operations (optional, creates default if not provided)
* `available_file_paths`: List of file paths the agent can access
* `sensitive_data`: Dictionary of sensitive data to handle carefully

### Advanced Options

* `calculate_cost` (default: `False`): Calculate and track API costs

### Backwards Compatibility

* `controller`: Alias for `tools` for backwards compatibility
* `browser_session`: Alias for `browser` for backwards compatibility (deprecated, use `browser`)

## Return Value

The `run()` method returns a `NotebookSession` object that contains:

* `cells`: List of `CodeCell` objects representing each executed code cell
* `id`: Unique session identifier
* `current_execution_count`: Current execution count number
* `namespace`: Dictionary containing the current namespace state with all variables

### CodeCell Properties

Each cell in `session.cells` has:

* `id`: Unique cell identifier
* `cell_type`: Type of cell ('code' or 'markdown')
* `source`: The code that was executed
* `output`: The output from code execution (if any)
* `execution_count`:
  Execution order number
* `status`: Execution status ('pending', 'running', 'success', or 'error')
* `error`: Error message if execution failed
* `browser_state`: Browser state after execution

### Example

```python
session = await agent.run()

# Access executed cells
for cell in session.cells:
    print(f"Cell {cell.execution_count}: {cell.source}")
    if cell.error:
        print(f"Error: {cell.error}")
    elif cell.output:
        print(f"Output: {cell.output}")

# Access variables from the namespace
variables = session.namespace
print(f"Variables: {list(variables.keys())}")
```

# Basics

Source: https://docs.browser-use.com/customize/code-agent/basics

Write Python code locally with browser automation

CodeAgent writes and executes Python code locally with browser automation capabilities. It's designed for repetitive data extraction tasks where the agent can write reusable functions.

CodeAgent executes Python code on your local machine like Claude Code.

## Quick Start

```python
import asyncio
from browser_use import CodeAgent
from dotenv import load_dotenv

load_dotenv()

async def main():
    task = "Extract all products from example.com and save to products.csv"
    agent = CodeAgent(task=task)
    await agent.run()

asyncio.run(main())
```

```bash .env
BROWSER_USE_API_KEY=your-api-key
```

CodeAgent currently only works with [ChatBrowserUse](/supported-models), which is optimized for this use case.

Don't have an API key? We give you \$10 to try it out [here](https://cloud.browser-use.com/new-api-key).
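
Since the Quick Start relies on `BROWSER_USE_API_KEY` being present, a small pre-flight check can fail early with a clearer message than an error raised mid-run. This is a sketch only: `require_api_key` is a hypothetical helper, not a browser-use function, and it assumes the key is exposed through the process environment (for example after `load_dotenv()`).

```python
import os
from collections.abc import Mapping


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "BROWSER_USE_API_KEY") -> str:
    """Return the API key found in `env`, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or environment "
            "before creating a CodeAgent."
        )
    return key
```

Calling `require_api_key()` right after `load_dotenv()` and before constructing the agent keeps configuration problems separate from automation failures.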

## When to Use

**Best for:**

* Data extraction at scale (100s-1000s of items)
* Repetitive interactions where functions can be reused
* Tasks requiring data processing and file operations
* Deterministic workflows you want to rerun

**Performance:**

* Best performance for data collection tasks
* Slightly slower for one-off interactions vs standard Agent

**Output:**

* Generates Python code that can be rerun deterministically
* Perfect for refining extraction logic

The agent will write code blocks in different languages. This combines the power of JavaScript for browser interaction with Python for data processing:

```js extract_products
(function(){
  return Array.from(document.querySelectorAll('.product')).map(p => ({
    name: p.querySelector('.name').textContent,
    price: p.querySelector('.price').textContent
  }))
})()
```

```python
import pandas as pd

products = await evaluate(extract_products)  # reuse other code blocks
df = pd.DataFrame(products)
df.to_csv('products.csv', index=False)
```

## Available Libraries

The agent can use common Python libraries:

* **Data processing:** `pandas`, `numpy`
* **Web:** `requests`, `BeautifulSoup`
* **File formats:** `csv`, `json`, `openpyxl` (Excel)
* **Visualization:** `matplotlib`
* **Utilities:** `tabulate`, `datetime`, `re`
* Any other library you install

## Available Tools

The agent has access to browser control functions:

* `navigate(url)` - Navigate to a URL
* `click(index)` - Click an element by index
* `input(index, text)` - Type text into an input
* `scroll(down, pages)` - Scroll the page
* `upload_file(path)` - Upload a file
* `evaluate(code, variables={})` - Execute JavaScript and return results
* `done(text, success, files_to_display=[])` - Mark task complete

## Exporting Sessions

CodeAgent automatically saves all executed code and JavaScript blocks during your session. You can export your complete automation workflow for sharing, version control, or re-running later.
### Quick Export

```python theme={null}
from browser_use.code_use.notebook_export import export_to_ipynb, session_to_python_script

# After running your agent
await agent.run()

# Export to Jupyter notebook
notebook_path = export_to_ipynb(agent, "my_automation.ipynb")

# Export to Python script
script = session_to_python_script(agent)
with open("my_automation.py", "w") as f:
    f.write(script)
```

### Export Formats

* **Jupyter Notebook (.ipynb)**: Interactive development, sharing, documentation
* **Python Script (.py)**: Production deployment, version control, automation

Both formats include:

* Setup code with browser initialization
* JavaScript code blocks as Python variables
* All executed Python cells with outputs
* Ready-to-run automation workflows

# Example: Extract Products

Source: https://docs.browser-use.com/customize/code-agent/example-products

Collect thousands of products and save to CSV

This example shows how to extract large amounts of product data from an e-commerce site and save it to files.

## Use Case

Extract 1000s of products from multiple categories with:

* Product URLs
* Names and descriptions
* Original and sale prices
* Discount percentages

Save everything to a CSV file for further analysis.

## Code

```python theme={null}
import asyncio

from browser_use.code_use import CodeAgent

async def main():
    task = """
    Go to https://www.flipkart.com. Collect approximately 50 products from:

    1. Books & Media - 15 products
    2. Sports & Fitness - 15 products
    3. Beauty & Personal Care - 10 products

    Save to products.csv
    """

    agent = CodeAgent(task=task)
    await agent.run()

asyncio.run(main())
```

## How It Works

1. **Agent navigates** to the e-commerce site
2. **Writes JavaScript** to extract product data from each page
3. **Loops through categories** collecting products
4. **Stores in variables** that persist across steps
5. **Saves to CSV** using pandas or csv module
6. **Returns deterministic code** you can modify and rerun

## Key Benefits

* **Function reuse:** Extraction code is written once, used many times
* **Scale:** Easily collect 100s or 1000s of items
* **Deterministic:** The generated Python code can be saved and rerun
* **Data processing:** Built-in pandas support for cleaning and transforming data

[View full example on GitHub →](https://github.com/browser-use/browser-use/blob/main/examples/code_agent/extract_products.py)

# Exporting Sessions

Source: https://docs.browser-use.com/customize/code-agent/exporting

Save and share your CodeAgent sessions as Jupyter notebooks or Python scripts

CodeAgent automatically saves all executed code and JavaScript blocks during your session. You can export your complete automation workflow in multiple formats for sharing, version control, or re-running later.

## Quick Start

```python theme={null}
import asyncio
from browser_use import CodeAgent, ChatBrowserUse
from browser_use.code_use.notebook_export import export_to_ipynb, session_to_python_script

async def main():
    agent = CodeAgent(
        task="Extract product data from https://example.com",
        llm=ChatBrowserUse(),
        max_steps=10
    )

    # Run your automation
    await agent.run()

    # Export to Jupyter notebook
    notebook_path = export_to_ipynb(agent, "product_scraping.ipynb")

    # Export to Python script
    python_script = session_to_python_script(agent)
    with open("product_scraping.py", "w") as f:
        f.write(python_script)

if __name__ == '__main__':
    asyncio.run(main())
```

## Export Formats

### Jupyter Notebook (.ipynb)

**Contains:**

* Setup cell with browser initialization and imports
* JavaScript code blocks as Python string variables
* All executed Python cells with outputs and errors
* Browser state snapshots

**Structure:**

```python theme={null}
# Cell 1: Setup
import asyncio
import json
from browser_use import BrowserSession
from browser_use.code_use import create_namespace

browser = BrowserSession()
await browser.start()
namespace = create_namespace(browser)
globals().update(namespace)

# Cell 2: JavaScript variables
extract_products = """(function(){
  return Array.from(document.querySelectorAll('.product')).map(product => ({
    name: product.querySelector('.name')?.textContent,
    price: product.querySelector('.price')?.textContent
  }));
})()"""

# Remaining cells: Python execution
await navigate('https://example.com')

...

products = await evaluate(extract_products)
print(f"Found {len(products)} products")
```

### Python Script (.py)

**Best for:** Production deployment, version control, automation

**Contains:**

* Complete runnable script with all imports
* JavaScript code blocks as Python string variables
* All executed code with proper indentation
* Ready to run with `python script.py`

**Structure:**

```python theme={null}
# Generated from browser-use code-use session
import asyncio
import json
from browser_use import BrowserSession
from browser_use.code_use import create_namespace

async def main():
    # Initialize browser and namespace
    browser = BrowserSession()
    await browser.start()

    # Create namespace with all browser control functions
    namespace = create_namespace(browser)

    # Extract functions from namespace for direct access
    navigate = namespace["navigate"]
    click = namespace["click"]
    evaluate = namespace["evaluate"]
    # ... other functions

    # JavaScript Code Block: extract_products
    extract_products = """(function(){
      return Array.from(document.querySelectorAll('.product')).map(product => ({
        name: product.querySelector('.name')?.textContent,
        price: product.querySelector('.price')?.textContent
      }));
    })()"""

    # Cell 1
    await navigate('https://example.com')

    # Cell 2
    products = await evaluate(extract_products)
    print(f"Found {len(products)} products")

    await browser.stop()

if __name__ == '__main__':
    asyncio.run(main())
```

# Output Format

Source: https://docs.browser-use.com/customize/code-agent/output-format

Understanding CodeAgent return values and how to access execution history

## NotebookSession

The `run()` method returns a `NotebookSession` object containing all executed code cells and their results:

```python theme={null}
session = await agent.run()

# Access basic properties
session.id                       # Unique session identifier
session.cells                    # List of CodeCell objects
session.current_execution_count  # Total number of executed cells
session.namespace                # Dictionary with all variables from execution

# Helper methods
session.get_cell(cell_id)        # Get a specific cell by ID
session.get_latest_cell()        # Get the most recently executed cell
```

## CodeCell Properties

Each cell in `session.cells` represents one executed code block:

```python theme={null}
for cell in session.cells:
    cell.id               # Unique cell identifier
    cell.cell_type        # 'code' or 'markdown'
    cell.source           # The code that was executed
    cell.output           # Output from code execution (if any)
    cell.execution_count  # Execution order number
    cell.status           # 'pending', 'running', 'success', or 'error'
    cell.error            # Error message if execution failed
    cell.browser_state    # Browser state after execution
```

## Accessing Results

### Basic Usage

```python theme={null}
session = await agent.run()

# Iterate through all executed cells
for cell in session.cells:
    print(f"Cell {cell.execution_count}:")
    print(f"  Code: {cell.source}")
    if cell.error:
        print(f"  Error: {cell.error}")
    elif cell.output:
        print(f"  Output: {cell.output}")
    print(f"  Status: {cell.status}")

# Get the last cell
last_cell = session.get_latest_cell()
if last_cell:
    print(f"Last output: {last_cell.output}")

# Access variables from the execution namespace
products = session.namespace.get('products', [])
print(f"Extracted {len(products)} products")
```

### Checking Task Completion

When the agent calls `done()`, the result is stored in the namespace:

```python theme={null}
session = await agent.run()

# Check if task was completed
task_done = session.namespace.get('_task_done', False)
task_result = session.namespace.get('_task_result')
task_success = session.namespace.get('_task_success')

if task_done:
    print(f"Task completed: {task_success}")
    print(f"Result: {task_result}")
```

### Getting All Outputs

```python theme={null}
session = await agent.run()

# Get all outputs (excluding errors)
outputs = [cell.output for cell in session.cells if cell.output]

# Get all errors
errors = [cell.error for cell in session.cells if cell.error]

# Get successful cells only
successful_cells = [cell for cell in session.cells if cell.status == 'success']
```

## Data Models

See the complete data model definitions in the [CodeAgent views source code](https://github.com/browser-use/browser-use/blob/main/browser_use/code_use/views.py).

# Lifecycle Hooks

Source: https://docs.browser-use.com/customize/hooks

Customize agent behavior with lifecycle hooks

Browser-Use provides lifecycle hooks that allow you to execute custom code at specific points during the agent's execution. Hook functions can be used to read and modify agent state while running, implement custom logic, change configuration, and integrate the Agent with external applications.
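To make the call order concrete before the reference below, here is a minimal sketch of how `on_step_start` and `on_step_end` hooks wrap each step. `StandInAgent` is a hypothetical stand-in written only for this illustration; it is not the real `Agent` and only mimics the order in which hooks are awaited.

```python
import asyncio


class StandInAgent:
    """Hypothetical stand-in for Agent: it only reproduces the hook call order."""

    def __init__(self):
        self.events = []

    async def run(self, on_step_start=None, on_step_end=None, max_steps=2):
        for step in range(1, max_steps + 1):
            if on_step_start is not None:
                await on_step_start(self)  # runs before the step is processed
            self.events.append(f"step {step}")  # where a real agent would act
            if on_step_end is not None:
                await on_step_end(self)  # runs after the step's actions finish
        return self.events


async def log_start(agent):
    # Hooks receive the agent instance as their only argument.
    agent.events.append("start hook")


async def log_end(agent):
    agent.events.append("end hook")


events = asyncio.run(
    StandInAgent().run(on_step_start=log_start, on_step_end=log_end)
)
print(events)
```

With the real `Agent`, the same two async callables are passed to `agent.run(...)`; everything else in the stand-in is an assumption made for the sake of a runnable example.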
## Available Hooks

Currently, Browser-Use provides the following hooks:

| Hook            | Description                                  | When it's called                                                                                  |
| --------------- | -------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| `on_step_start` | Executed at the beginning of each agent step | Before the agent processes the current state and decides on the next action                       |
| `on_step_end`   | Executed at the end of each agent step       | After the agent has executed all the actions for the current step, before it starts the next step |

```python theme={null}
await agent.run(on_step_start=..., on_step_end=...)
```

Each hook should be an `async` callable function that accepts the `agent` instance as its only parameter.

### Basic Example

```python theme={null}
import asyncio
from pathlib import Path

from browser_use import Agent, ChatOpenAI
from browser_use.browser.events import ScreenshotEvent


async def my_step_hook(agent: Agent):
    # inside a hook you can access all the state and methods under the Agent object:
    #   agent.settings, agent.state, agent.task
    #   agent.tools, agent.llm, agent.browser_session
    #   agent.pause(), agent.resume(), agent.add_new_task(...), etc.

    # You also have direct access to the browser state
    state = await agent.browser_session.get_browser_state_summary()

    current_url = state.url
    visit_log = agent.history.urls()
    previous_url = visit_log[-2] if len(visit_log) >= 2 else None
    print(f'Agent was last on URL: {previous_url} and is now on {current_url}')

    cdp_session = await agent.browser_session.get_or_create_cdp_session()

    # Example: Get page HTML content
    doc = await cdp_session.cdp_client.send.DOM.getDocument(session_id=cdp_session.session_id)
    html_result = await cdp_session.cdp_client.send.DOM.getOuterHTML(
        params={'nodeId': doc['root']['nodeId']}, session_id=cdp_session.session_id
    )
    page_html = html_result['outerHTML']

    # Example: Take a screenshot using the event system
    screenshot_event = agent.browser_session.event_bus.dispatch(ScreenshotEvent(full_page=False))
    await screenshot_event
    result = await screenshot_event.event_result(raise_if_any=True, raise_if_none=True)

    # Example: pause agent execution and resume it based on some custom code
    if '/finished' in current_url:
        agent.pause()
        Path('result.txt').write_text(page_html)
        input('Saved "finished" page content to result.txt, press [Enter] to resume...')
        agent.resume()


async def main():
    agent = Agent(
        task='Search for the latest news about AI',
        llm=ChatOpenAI(model='gpt-5-mini'),
    )

    await agent.run(
        on_step_start=my_step_hook,
        # on_step_end=...
        max_steps=10,
    )


if __name__ == '__main__':
    asyncio.run(main())
```

## Data Available in Hooks

When working with agent hooks, you have access to the entire `Agent` instance.
Here are some useful data points you can access:

* `agent.task` lets you see what the main task is, `agent.add_new_task(...)` lets you queue up a new one
* `agent.tools` gives access to the `Tools()` object and `Registry()` containing the available actions
  * `agent.tools.registry.execute_action('click', {'index': 123}, browser_session=agent.browser_session)`
* `agent.sensitive_data` contains the sensitive data dict, which can be updated in-place to add/remove/modify items
* `agent.settings` contains all the configuration options passed to the `Agent(...)` at init time
* `agent.llm` gives direct access to the main LLM object (e.g. `ChatOpenAI`)
* `agent.state` gives access to lots of internal state, including agent thoughts, outputs, actions, etc.
* `agent.history` gives access to historical data from the agent's execution:
  * `agent.history.model_thoughts()`: Reasoning from Browser Use's model
  * `agent.history.model_outputs()`: Raw outputs from Browser Use's model
  * `agent.history.model_actions()`: Actions taken by the agent
  * `agent.history.extracted_content()`: Content extracted from web pages
  * `agent.history.urls()`: URLs visited by the agent
* `agent.browser_session` gives direct access to the `BrowserSession` and CDP interface
  * `agent.browser_session.agent_focus_target_id`: Get the current target ID the agent is focused on
  * `agent.browser_session.get_or_create_cdp_session()`: Get the current CDP session for browser interaction
  * `agent.browser_session.get_tabs()`: Get all tabs currently open
  * `agent.browser_session.get_current_page_url()`: Get the URL of the current active tab
  * `agent.browser_session.get_current_page_title()`: Get the title of the current active tab

## Tips for Using Hooks

* **Avoid blocking operations**: Since hooks run in the same execution thread as the agent, keep them efficient and avoid blocking operations.
* **Use custom tools instead**: Hooks are fairly advanced; most things can be implemented with [custom tools](/customize/tools/basics) instead.
* **Increase step\_timeout**: If your hook is doing something that takes a long time, you can increase the `step_timeout` parameter in the `Agent(...)` constructor.

***

# Documentation MCP

Source: https://docs.browser-use.com/customize/integrations/docs-mcp

Add browser-use documentation context to Claude Code and other MCP clients

## Overview

The browser-use documentation MCP server provides read-only access to browser-use documentation for Claude Code and other MCP-compatible clients. This gives AI assistants deep context about the browser-use library when helping you write code.

Looking to give an assistant browser-use capabilities? Check out our Browser Automation MCP.

## Quick Start

Add the documentation server to your coding agent:

```bash theme={null}
claude mcp add --transport http browser-use https://docs.browser-use.com/mcp
```

Add to `~/.cursor/mcp.json`:

```json theme={null}
{
  "mcpServers": {
    "browser-use-docs": {
      "url": "https://docs.browser-use.com/mcp"
    }
  }
}
```

Add to `~/.codex/config.toml`:

```toml theme={null}
[mcp_servers.browser-use-docs]
url = "https://docs.browser-use.com/mcp"
```

Add to `~/.codeium/windsurf/mcp_config.json`:

```json theme={null}
{
  "mcpServers": {
    "browser-use-docs": {
      "serverUrl": "https://docs.browser-use.com/mcp"
    }
  }
}
```

This enables your AI coding assistant to access browser-use documentation when answering questions or helping with implementation.

## What This Provides

The documentation MCP server gives AI assistants access to:

* API reference and usage patterns
* Configuration options and parameters
* Best practices and examples
* Troubleshooting guides
* Architecture explanations

**Example interactions:**

```
"How do I configure custom tools in browser-use?"
"What are the available agent parameters?"
"Show me how to use cloud browsers."
```

Claude Code can now answer these questions using up-to-date documentation context.

## How It Works

The MCP server provides a read-only documentation interface:

* Serves browser-use documentation over HTTP
* No browser automation capabilities (see [MCP Server](/customize/integrations/mcp-server) for that)
* Lightweight and always available
* No API keys or configuration needed

## Next Steps

* Start coding with [Agent Basics](/customize/agent/basics)

# MCP Server

Source: https://docs.browser-use.com/customize/integrations/mcp-server

Connect AI models to Browser Use through the Model Context Protocol

Browser Use provides a hosted **Model Context Protocol (MCP)** server that enables AI assistants to control browser automation. Works with any HTTP-based MCP client, including Claude Code.

**MCP Server URL:** `https://api.browser-use.com/mcp`

This is an **HTTP-based MCP server** designed for cloud integrations and remote access. If you need a local stdio-based MCP server for Claude Desktop, use the free open-source version: `uvx browser-use --mcp`

## Quick Setup

### 1. Get API Key

Get your API key from the [Browser Use Dashboard](https://cloud.browser-use.com)

### 2. Connect Your AI

```bash theme={null}
claude mcp add --transport http browser-use https://api.browser-use.com/mcp
```

Add to your Claude Desktop config file:

**macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`

**Windows:** `%APPDATA%\Claude\claude_desktop_config.json`

```json theme={null}
{
  "mcpServers": {
    "browser-use": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://api.browser-use.com/mcp",
        "--header",
        "X-Browser-Use-API-Key: your-api-key"
      ]
    }
  }
}
```

Restart Claude Desktop after saving.

Add to `~/.cursor/mcp.json`:

```json theme={null}
{
  "mcpServers": {
    "browser-use": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://api.browser-use.com/mcp",
        "--header",
        "X-Browser-Use-API-Key: your-api-key"
      ]
    }
  }
}
```

Add to `~/.codeium/windsurf/mcp_config.json`:

```json theme={null}
{
  "mcpServers": {
    "browser-use": {
      "serverUrl": "https://api.browser-use.com/mcp",
      "headers": {
        "X-Browser-Use-API-Key": "your-api-key"
      }
    }
  }
}
```

**Step 1: Register an OAuth client**

Call the dynamic client registration endpoint with ChatGPT's redirect URI:

```bash theme={null}
curl -X POST https://api.browser-use.com/oauth/register \
  -H "Content-Type: application/json" \
  -d '{
    "client_name": "ChatGPT Integration",
    "redirect_uris": ["https://chatgpt.com/connector_platform_oauth_redirect"]
  }'
```

Save the `client_id` from the response (43-character random string).

**Step 2: Configure ChatGPT**

In ChatGPT, add a custom MCP connector:

* **MCP Server URL**: `https://api.browser-use.com/mcp/chatgpt`
* **Client ID**: Paste the `client_id` from step 1

**Step 3: Authorize**

ChatGPT will redirect you to Browser Use's authorization page. Sign in and grant permission.

**Note:** ChatGPT uses OAuth 2.1 authentication instead of API keys. You only need to register your client once.

## Available Tools

The MCP server provides three tools:

### `browser_task`

Creates and runs a browser automation task.

* **task** (required): What you want the browser to do
* **max\_steps** (optional): Max actions to take (1-10, default: 8)
* **profile\_id** (optional): UUID of the cloud profile to use for persistent authentication

### `list_browser_profiles`

Lists all available cloud browser profiles for the authenticated project. Profiles store persistent authentication (cookies, sessions) for websites requiring login.

### `monitor_task`

Checks the current status and progress of a browser automation task. Returns immediately with a snapshot of the task state.

* **task\_id** (required): UUID of the task to monitor (returned by browser\_task)

## Example Usage

Once connected, ask your AI to perform web tasks:

> "Search Google for the latest iPhone reviews and summarize the top 3 results"

> "Go to Hacker News and get me the titles of the top 5 posts"

> "Fill out the contact form on example.com with my information"

The AI will use the browser tools automatically to complete these tasks.

## Smart Features

### Cloud Profiles for Authentication

Use cloud browser profiles to maintain persistent login sessions across tasks. Profiles store cookies and authentication state for:

* Social media (X/Twitter, LinkedIn, Facebook)
* Email (Gmail, Outlook)
* Online banking and shopping sites
* Any website requiring login

List available profiles with `list_browser_profiles`, then pass the `profile_id` to `browser_task`.

### Real-time Task Monitoring

Use `monitor_task` to check task progress while it's running. The tool returns immediately with the current status, latest step details, and agent reasoning. Call it repeatedly to track progress live.

### Conversational Progress Summaries

When you monitor tasks, the AI automatically interprets step data into natural language updates, explaining what the browser has completed and what it's currently working on.

## Troubleshooting

**Connection issues?**

* Verify your API key is correct
* Check you're using the right headers

**Task taking too long?**

* Check the live\_url to see progress
* Increase max\_steps for complex tasks (max: 10)
* Use clearer, more specific instructions

**Need help?**

Check our [Cloud Documentation](https://docs.cloud.browser-use.com) for detailed specifications.

***

## Local Self-Hosted Alternative

For users who want a free, self-hosted option, browser-use can run as a local MCP server on your machine. This requires your own OpenAI or Anthropic API keys but provides direct, low-level control over browser automation.

### Quick Start

The local MCP server runs as a stdio-based process on your machine. This is the **free, open-source option** but requires your own LLM API keys.

#### Start MCP Server Manually

```bash theme={null}
uvx --from 'browser-use[cli]' browser-use --mcp
```

The server will start in stdio mode, ready to accept MCP connections.

#### Claude Desktop Integration

The most common use case is integrating with Claude Desktop. Add this configuration to your Claude Desktop config file:

**macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`

```json theme={null}
{
  "mcpServers": {
    "browser-use": {
      "command": "/Users/your-username/.local/bin/uvx",
      "args": ["--from", "browser-use[cli]", "browser-use", "--mcp"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key-here"
      }
    }
  }
}
```

**Windows:** `%APPDATA%\Claude\claude_desktop_config.json`

```json theme={null}
{
  "mcpServers": {
    "browser-use": {
      "command": "uvx",
      "args": ["--from", "browser-use[cli]", "browser-use", "--mcp"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key-here"
      }
    }
  }
}
```

**macOS/Linux PATH Issue:** Claude Desktop may not find `uvx` in your PATH. Use the full path to `uvx` instead:

* Run `which uvx` in your terminal to find the location (usually `/Users/username/.local/bin/uvx` or `~/.local/bin/uvx`)
* Replace `"command": "uvx"` with the full path, e.g., `"command": "/Users/your-username/.local/bin/uvx"`
* Replace `your-username` with your actual username

**CLI Extras Required:** The `--from browser-use[cli]` flag installs the CLI extras needed for MCP server support.
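
The PATH workaround above can also be scripted. The following is a minimal sketch, assuming you want to generate the `mcpServers` entry with an absolute `uvx` path; the function name `build_claude_desktop_entry` is hypothetical, and the sketch only prints the JSON rather than editing your config file.

```python
import json
import shutil


def build_claude_desktop_entry(openai_api_key, uvx_path=None):
    """Build the browser-use entry for claude_desktop_config.json.

    Hypothetical helper: resolves `uvx` to an absolute path when it can be
    found on PATH, and falls back to the bare command name otherwise.
    """
    command = uvx_path or shutil.which("uvx") or "uvx"
    return {
        "mcpServers": {
            "browser-use": {
                "command": command,
                "args": ["--from", "browser-use[cli]", "browser-use", "--mcp"],
                "env": {"OPENAI_API_KEY": openai_api_key},
            }
        }
    }


# An explicit path is passed here so the output does not depend on this machine.
config = build_claude_desktop_entry(
    "your-openai-api-key-here",
    uvx_path="/Users/your-username/.local/bin/uvx",
)
print(json.dumps(config, indent=2))
```

Omitting `uvx_path` lets `shutil.which` perform the same lookup as `which uvx`. Merge the printed entry into your existing config by hand so other MCP servers listed there are not overwritten.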

#### Environment Variables

You can configure browser-use through environment variables:

* `OPENAI_API_KEY` - Your OpenAI API key (required)
* `ANTHROPIC_API_KEY` - Your Anthropic API key (alternative to OpenAI)
* `BROWSER_USE_HEADLESS` - Set to `false` to show browser window
* `BROWSER_USE_DISABLE_SECURITY` - Set to `true` to disable browser security features

### Available Tools

The local MCP server exposes these low-level browser automation tools for direct control:

#### Autonomous Agent Tools

* **`retry_with_browser_use_agent`** - Run a complete browser automation task with an AI agent (use as last resort when direct control fails)

#### Direct Browser Control

* **`browser_navigate`** - Navigate to a URL
* **`browser_click`** - Click on an element by index
* **`browser_type`** - Type text into an element
* **`browser_get_state`** - Get current page state and interactive elements
* **`browser_scroll`** - Scroll the page
* **`browser_go_back`** - Go back in browser history

#### Tab Management

* **`browser_list_tabs`** - List all open browser tabs
* **`browser_switch_tab`** - Switch to a specific tab
* **`browser_close_tab`** - Close a tab

#### Content Extraction

* **`browser_extract_content`** - Extract structured content from the current page

#### Session Management

* **`browser_list_sessions`** - List all active browser sessions with details
* **`browser_close_session`** - Close a specific browser session by ID
* **`browser_close_all`** - Close all active browser sessions

### Example Usage

Once configured with Claude Desktop, you can ask Claude to perform browser automation tasks:

```
"Please navigate to example.com and take a screenshot"

"Search for 'browser automation' on Google and summarize the first 3 results"

"Go to GitHub, find the browser-use repository, and tell me about the latest release"
```

Claude will use the MCP server to execute these tasks through browser-use.

### Programmatic Usage

You can also connect to the MCP server programmatically:

```python theme={null}
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def use_browser_mcp():
    # Connect to browser-use MCP server
    server_params = StdioServerParameters(
        command="uvx",
        args=["--from", "browser-use[cli]", "browser-use", "--mcp"]
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Navigate to a website
            result = await session.call_tool(
                "browser_navigate",
                arguments={"url": "https://example.com"}
            )
            print(result.content[0].text)

            # Get page state
            result = await session.call_tool(
                "browser_get_state",
                arguments={"include_screenshot": True}
            )
            print("Page state retrieved!")

asyncio.run(use_browser_mcp())
```

### Troubleshooting

#### Common Issues

**"CLI addon is not installed" Error**

Make sure you're using `--from 'browser-use[cli]'` in your uvx command:

```bash theme={null}
uvx --from 'browser-use[cli]' browser-use --mcp
```

**"spawn uvx ENOENT" Error (macOS/Linux)**

Claude Desktop can't find `uvx` in its PATH.
Use the full path in your config:

* Run `which uvx` in terminal to find the location
* Update your config to use the full path (e.g., `/Users/your-username/.local/bin/uvx`)

**Browser doesn't start**

* Check that you have Chrome/Chromium installed
* Try setting `BROWSER_USE_HEADLESS=false` to see browser window
* Ensure no other browser instances are using the same profile

**API Key Issues**

* Verify your `OPENAI_API_KEY` is set correctly
* Check API key permissions and billing status
* Try using `ANTHROPIC_API_KEY` as an alternative

**Connection Issues in Claude Desktop**

* Restart Claude Desktop after config changes
* Check the config file syntax is valid JSON
* Verify the file path is correct for your OS
* Check logs at `~/Library/Logs/Claude/` (macOS) or `%APPDATA%\Claude\Logs\` (Windows)

#### Debug Mode

Enable debug logging by setting:

```bash theme={null}
export BROWSER_USE_LOGGING_LEVEL=DEBUG
uvx --from 'browser-use[cli]' browser-use --mcp
```

### Security Considerations

* The MCP server has access to your browser and file system
* Only connect trusted MCP clients
* Be cautious with sensitive websites and data
* Consider running in a sandboxed environment for untrusted automation

### Next Steps

* Explore the [examples directory](https://github.com/browser-use/browser-use/tree/main/examples/mcp) for more usage patterns
* Check out [MCP documentation](https://modelcontextprotocol.io/) to learn more about the protocol
* Join our [Discord](https://link.browser-use.com/discord) for support and discussions

# All Parameters

Source: https://docs.browser-use.com/customize/sandbox/all-parameters

Sandbox configuration reference

## Reference

| Parameter                  | Type       | Description                            | Default  |
| -------------------------- | ---------- | -------------------------------------- | -------- |
| `BROWSER_USE_API_KEY`      | `str`      | API key (or env var)                   | Required |
| `cloud_profile_id`         | `str`      | Browser profile UUID                   | `None`   |
| `cloud_proxy_country_code` | `str`      | us, uk, fr, it, jp, au, de, fi, ca, in | `None`   |
| `cloud_timeout`            | `int`      | Minutes (max: 15 free, 240 paid)       | `None`   |
| `on_browser_created`       | `Callable` | Live URL callback                      | `None`   |
| `on_log`                   | `Callable` | Log event callback                     | `None`   |
| `on_result`                | `Callable` | Success callback                       | `None`   |
| `on_error`                 | `Callable` | Error callback                         | `None`   |

## Example

```python theme={null}
@sandbox(
    cloud_profile_id='550e8400-e29b-41d4-a716-446655440000',
    cloud_proxy_country_code='us',
    cloud_timeout=60,
    on_browser_created=lambda data: print(f'Live: {data.live_url}'),
)
async def task(browser: Browser):
    agent = Agent(task="your task", browser=browser, llm=ChatBrowserUse())
    await agent.run()
```

# Events

Source: https://docs.browser-use.com/customize/sandbox/events

Monitor execution with callbacks

## Live Browser View

```python theme={null}
@sandbox(on_browser_created=lambda data: print(f'👁️ {data.live_url}'))
async def task(browser: Browser):
    agent = Agent(task="your task", browser=browser, llm=ChatBrowserUse())
    await agent.run()
```

## All Events

```python theme={null}
from browser_use.sandbox import BrowserCreatedData, LogData, ResultData, ErrorData

@sandbox(
    on_browser_created=lambda data: print(f'Live: {data.live_url}'),
    on_log=lambda log: print(f'{log.level}: {log.message}'),
    on_result=lambda result: print('Done!'),
    on_error=lambda error: print(f'Error: {error.error}'),
)
async def task(browser: Browser):
    # Your code
    pass
```

All callbacks can be sync or async.

# Quickstart

Source: https://docs.browser-use.com/customize/sandbox/quickstart

Run browser automation in the cloud

Sandboxes are the **easiest way to run Browser-Use in production**. We handle agents, browsers, persistence, auth, cookies, and LLMs. It's also the **fastest way to deploy** - the agent runs right next to the browser, so latency is minimal.

Get your API key at [cloud.browser-use.com/new-api-key](https://cloud.browser-use.com/new-api-key) - new signups get \$10 free.

## Basic Example

Just wrap your function with `@sandbox()`:

```python theme={null}
from browser_use import Browser, sandbox, ChatBrowserUse
from browser_use.agent.service import Agent

@sandbox()
async def my_task(browser: Browser):
    agent = Agent(task="Find the top HN post", browser=browser, llm=ChatBrowserUse())
    await agent.run()

await my_task()
```

## With Cloud Parameters

```python theme={null}
@sandbox(
    cloud_profile_id='your-profile-id',      # Use saved cookies/auth
    cloud_proxy_country_code='us',           # Bypass captchas, cloudflare, geo-restrictions
    cloud_timeout=60,                        # Max session time (minutes)
)
async def task(browser: Browser, url: str):
    agent = Agent(task=f"Visit {url}", browser=browser, llm=ChatBrowserUse())
    await agent.run()

await task(url="https://example.com")
```

**What each does:**

* `cloud_profile_id` - Use saved cookies/authentication from your cloud profile
* `cloud_proxy_country_code` - Route through country-specific proxy for stealth (bypass captchas, Cloudflare, geo-blocks)
* `cloud_timeout` - Maximum time browser stays open in minutes

***

For more parameters and events, see the other tabs in this section.

# Basics

Source: https://docs.browser-use.com/customize/skills/basics

Skills are your API for anything. Describe what you need in plain text, and get a production-ready API endpoint you can call repeatedly.

To learn more visit [Skills - Concepts](https://docs.cloud.browser-use.com/concepts/skills).

## Quick Example

Load `['*']` for all skills or specific skill IDs from [cloud.browser-use.com/skills](https://cloud.browser-use.com/skills).

```python theme={null}
from browser_use import Agent, ChatBrowserUse

agent = Agent(
    task='Your task',
    skills=['skill-uuid-1', 'skill-uuid-2'],  # Specific skills (recommended)
    # or
    # skills=['*'],  # All skills
    llm=ChatBrowserUse()
)

await agent.run()
```

Be careful using `*`. Each skill will contribute around 200 tokens to the prompt.

Don't forget to add your API key to `.env`:

```bash .env theme={null}
BROWSER_USE_API_KEY=your-api-key
```

Get your API key on [cloud](https://cloud.browser-use.com/new-api-key) - new signups get \$10 free.

## Cookie Handling

Cookies are automatically injected from your browser:

```python theme={null}
agent = Agent(
    task='Post a tweet saying "Hello World"',
    skills=['tweet-poster-skill-id'],
    llm=ChatBrowserUse()
)

# Agent navigates to twitter.com, logs in if needed,
# extracts cookies, and passes them to the skill automatically
await agent.run()
```

If cookies are missing, the LLM sees which cookies are needed and navigates to obtain them.

***

## Full Example

```python theme={null}
from browser_use import Agent, ChatBrowserUse
from dotenv import load_dotenv
import asyncio

load_dotenv()

async def main():
    agent = Agent(
        task='Analyze TikTak and Instegram profiles',
        skills=[
            'a582eb44-e4e2-4c55-acc2-2f5a875e35e9',  # TikTak Profile Scraper
            'f8d91c2a-3b4e-4f7d-9a1e-6c8e2d3f4a5b',  # Instegram Profile Scraper
        ],
        llm=ChatBrowserUse()
    )

    await agent.run()
    await agent.close()

asyncio.run(main())
```

Browse and create skills at [cloud.browser-use.com/skills](https://cloud.browser-use.com/skills).

# Add Tools

Source: https://docs.browser-use.com/customize/tools/add

Examples:

* deterministic clicks
* file handling
* calling APIs
* human-in-the-loop
* browser interactions
* calling LLMs
* get 2fa codes
* send emails
* Playwright integration (see [GitHub example](https://github.com/browser-use/browser-use/blob/main/examples/browser/playwright_integration.py))
* ...

Simply add `@tools.action(...)` to your function.
```python theme={null}
from browser_use import Tools, Agent, ActionResult

tools = Tools()

@tools.action(description='Ask human for help with a question')
def ask_human(question: str) -> ActionResult:
    answer = input(f'{question} > ')
    return ActionResult(extracted_content=f'The human responded with: {answer}')
```

```python theme={null}
agent = Agent(task='...', llm=llm, tools=tools)
```

* **`description`** *(required)* - What the tool does; the LLM uses this to decide when to call it.
* **`allowed_domains`** - List of domains where the tool can run (e.g. `['*.example.com']`); defaults to all domains

The Agent fills your function parameters based on their names, type hints, and defaults.

## Available Objects

Your function has access to these objects:

* **`browser_session: BrowserSession`** - Current browser session for CDP access
* **`cdp_client`** - Direct Chrome DevTools Protocol client
* **`page_extraction_llm: BaseChatModel`** - The LLM you pass to the agent; use it to make a custom LLM call inside the tool.
* **`file_system: FileSystem`** - File system access
* **`available_file_paths: list[str]`** - Available files for upload/processing
* **`has_sensitive_data: bool`** - Whether action contains sensitive data

## Browser Interaction Examples

You can use `browser_session` to directly interact with page elements using CSS selectors:

```python theme={null}
from browser_use import Tools, Agent, ActionResult, BrowserSession

tools = Tools()

@tools.action(description='Click the submit button using CSS selector')
async def click_submit_button(browser_session: BrowserSession):
    # Get the current page
    page = await browser_session.must_get_current_page()

    # Get element(s) by CSS selector
    elements = await page.get_elements_by_css_selector('button[type="submit"]')

    if not elements:
        return ActionResult(extracted_content='No submit button found')

    # Click the first matching element
    await elements[0].click()

    return ActionResult(extracted_content='Submit button clicked!')
```

Available methods on `Page`:

* `get_elements_by_css_selector(selector: str)` - Returns list of matching elements
* `get_element_by_prompt(prompt: str, llm)` - Returns element or None using LLM
* `must_get_element_by_prompt(prompt: str, llm)` - Returns element or raises error

Available methods on `Element`:

* `click()` - Click the element
* `type(text: str)` - Type text into the element
* `get_text()` - Get element text content
* See `browser_use/actor/element.py` for more methods

## Pydantic Input

You can use Pydantic for the tool parameters:

```python theme={null}
import json

from pydantic import BaseModel, Field

class Cars(BaseModel):
    name: str = Field(description='The name of the car, e.g. "Toyota Camry"')
    price: int = Field(description='The price of the car as int in USD, e.g. 25000')

@tools.action(description='Save cars to file')
def save_cars(cars: list[Cars]) -> str:
    with open('cars.json', 'w') as f:
        # Pydantic models are not JSON-serializable directly; convert them to dicts first
        json.dump([car.model_dump() for car in cars], f)
    return f'Saved {len(cars)} cars to file'

task = "find cars and save them to file"
```

## Domain Restrictions

Limit tools to specific domains:

```python theme={null}
@tools.action(
    description='Fill out banking forms',
    allowed_domains=['https://mybank.com']
)
def fill_bank_form(account_number: str) -> str:
    # Only works on mybank.com
    return f'Filled form for account {account_number}'
```

## Advanced Example

For a comprehensive example of custom tools with Playwright integration, see:
**[Playwright Integration Example](https://github.com/browser-use/browser-use/blob/main/examples/browser/playwright_integration.py)**

This shows how to create custom actions that use Playwright's precise browser automation alongside Browser-Use.
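To build intuition for the `allowed_domains` patterns used under Domain Restrictions (for example `'*.example.com'` or `'https://mybank.com'`), here is a minimal standard-library sketch of glob-style host matching. It is an illustration only: `host_matches` is a hypothetical helper, not part of Browser Use, and the library's real matching rules (scheme handling, ports, bare-domain behavior) may differ.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def host_matches(url: str, allowed_domains: list[str]) -> bool:
    """Return True if the URL's host matches any glob-style pattern.

    Hypothetical helper for illustration; not the Browser Use implementation.
    """
    host = (urlparse(url).hostname or '').lower()
    for pattern in allowed_domains:
        # Accept bare hosts ('*.example.com') and full origins ('https://mybank.com');
        # this sketch ignores the scheme and any path, and compares hosts only.
        pattern_host = pattern.split('://', 1)[-1].split('/', 1)[0].lower()
        if fnmatchcase(host, pattern_host):
            return True
    return False


print(host_matches('https://shop.example.com/cart', ['*.example.com']))  # True
print(host_matches('https://mybank.com/login', ['https://mybank.com']))  # True
print(host_matches('https://evil.test/', ['https://mybank.com']))        # False
```

Note that in this sketch `'*.example.com'` does not match the bare host `example.com`, because the pattern requires a subdomain label before the dot. If a tool should run on both, listing both patterns is the safer choice.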
# Available Tools
Source: https://docs.browser-use.com/customize/tools/available

Here is the [source code](https://github.com/browser-use/browser-use/blob/main/browser_use/tools/service.py) for the default tools:

### Navigation & Browser Control

* **`search`** - Search queries (DuckDuckGo, Google, Bing)
* **`navigate`** - Navigate to URLs
* **`go_back`** - Go back in browser history
* **`wait`** - Wait for specified seconds

### Page Interaction

* **`click`** - Click elements by their index
* **`input`** - Input text into form fields
* **`upload_file`** - Upload files to file inputs
* **`scroll`** - Scroll the page up/down
* **`find_text`** - Scroll to specific text on page
* **`send_keys`** - Send special keys (Enter, Escape, etc.)

### JavaScript Execution

* **`evaluate`** - Execute custom JavaScript code on the page (for advanced interactions, shadow DOM, custom selectors, data extraction)

### Tab Management

* **`switch`** - Switch between browser tabs
* **`close`** - Close browser tabs

### Content Extraction

* **`extract`** - Extract data from webpages using LLM

### Visual Analysis

* **`screenshot`** - Request a screenshot in your next browser state for visual confirmation

### Form Controls

* **`dropdown_options`** - Get dropdown option values
* **`select_dropdown`** - Select dropdown options

### File Operations

* **`write_file`** - Write content to files
* **`read_file`** - Read file contents
* **`replace_file`** - Replace text in files

### Task Completion

* **`done`** - Complete the task (always available)

# Basics
Source: https://docs.browser-use.com/customize/tools/basics

Tools are the functions that the agent has to interact with the world.
## Quick Example

```python theme={null}
from browser_use import Agent, Tools, ActionResult, Browser

tools = Tools()

@tools.action('Ask human for help with a question')
def ask_human(question: str, browser: Browser) -> ActionResult:
    answer = input(f'{question} > ')
    return ActionResult(extracted_content=f'The human responded with: {answer}')

agent = Agent(
    task='Ask human for help',
    llm=llm,
    tools=tools,
)
```

Use the `browser` parameter in tools for deterministic [Actor](/customize/actor/basics) actions.

# Remove Tools
Source: https://docs.browser-use.com/customize/tools/remove

You can exclude default tools:

```python theme={null}
from browser_use import Agent, Tools

tools = Tools(exclude_actions=['search', 'wait'])
agent = Agent(task='...', llm=llm, tools=tools)
```

# Tool Response
Source: https://docs.browser-use.com/customize/tools/response

Tools return results using `ActionResult` or simple strings.

## Return Types

```python theme={null}
from browser_use import Tools, ActionResult

tools = Tools()

@tools.action('My tool')
def my_tool() -> str:
    return "Task completed successfully"

@tools.action('Advanced tool')
def advanced_tool() -> ActionResult:
    # Shows every available field; a real result would not set both `error` and `success=True`
    return ActionResult(
        extracted_content="Main result",
        long_term_memory="Remember this info",
        error="Something went wrong",
        is_done=True,
        success=True,
        attachments=["file.pdf"],
    )
```

## ActionResult Properties

* `extracted_content` (default: `None`) - Main result passed to the LLM; this is equivalent to returning a string.
* `include_extracted_content_only_once` (default: `False`) - Set to `True` for large content to include it only once in the LLM input.
* `long_term_memory` (default: `None`) - This is always included in the LLM input for all future steps.
* `error` (default: `None`) - Error message; exceptions are caught and set here automatically. This is always included in the LLM input.
* `is_done` (default: `False`) - Tool completes entire task
* `success` (default: `None`) - Task success (only valid with `is_done=True`)
* `attachments` (default: `None`) - Files to show user
* `metadata` (default: `None`) - Debug/observability data

## Why `extracted_content` and `long_term_memory`?

Together, these fields let you control what stays in the LLM's context.

### 1. Include short content always in context

```python theme={null}
def simple_tool() -> str:
    return "Hello, world!"  # Keep in context for all future steps
```

### 2. Show long content once, remember subset in context

```python theme={null}
return ActionResult(
    extracted_content="[500 lines of product data...]",  # Shows to LLM once
    include_extracted_content_only_once=True,            # Never show full output again
    long_term_memory="Found 50 products"                 # Only this in future steps
)
```

We save the full `extracted_content` to files which the LLM can read in future steps.

### 3. Don't show long content, remember subset in context

```python theme={null}
return ActionResult(
    # The LLM never sees this, because `long_term_memory` overrides it
    # and `include_extracted_content_only_once` is not used
    extracted_content="[500 lines of product data...]",
    long_term_memory="Saved user's favorite products",  # This is shown to the LLM in future steps
)
```

## Terminating the Agent

Set `is_done=True` to stop the agent completely. Use when your tool finishes the entire task:

```python theme={null}
@tools.action(description='Complete the task')
def finish_task() -> ActionResult:
    return ActionResult(
        extracted_content="Task completed!",
        is_done=True,  # Stops the agent
        success=True   # Task succeeded
    )
```

# Get Help
Source: https://docs.browser-use.com/development/get-help

More than 20k developers help each other

1. Check our [GitHub Issues](https://github.com/browser-use/browser-use/issues)
2. Ask in our [Discord community](https://link.browser-use.com/discord)
3. Get support for your enterprise with [support@browser-use.com](mailto:support@browser-use.com)

# Costs
Source: https://docs.browser-use.com/development/monitoring/costs

Track token usage and API costs for your browser automation tasks

## Cost Tracking

To track token usage and costs, enable cost calculation:

```python theme={null}
from browser_use import Agent, ChatBrowserUse

agent = Agent(
    task="Search for latest news about AI",
    llm=ChatBrowserUse(),
    calculate_cost=True  # Enable cost tracking
)

history = await agent.run()

# Get usage from history
print(f"Token usage: {history.usage}")

# Or get from usage summary
usage_summary = await agent.token_cost_service.get_usage_summary()
print(f"Usage summary: {usage_summary}")
```

# Observability
Source: https://docs.browser-use.com/development/monitoring/observability

Trace Browser Use's agent execution steps and capture browser session recording

## Overview

Browser Use has a native integration with [Laminar](https://laminar.sh) - an open-source platform for monitoring and analyzing error patterns in AI agents.
The Laminar SDK automatically captures **agent execution steps, costs and browser session recordings** of the Browser Use agent.
Browser session recordings allow developers to see a full video replay of the browser session, which is useful for debugging the agent.

## Setup

Install the Laminar Python SDK.

```bash theme={null}
pip install lmnr
```

Register on [Laminar Cloud](https://laminar.sh) or [self-host Laminar](https://github.com/lmnr-ai/lmnr), create a project and get the project API key from your project settings. Set the `LMNR_PROJECT_API_KEY` environment variable.

```bash theme={null}
export LMNR_PROJECT_API_KEY=
```

## Usage

Then initialize Laminar at the top of your project; both Browser Use agent traces and session recordings will be captured automatically.
```python {7-9} theme={null}
from browser_use import Agent, ChatGoogle
import asyncio

from lmnr import Laminar
import os

# At initialization time, Laminar auto-instruments
# Browser Use and any browser you use (local or remote)
Laminar.initialize(project_api_key=os.getenv('LMNR_PROJECT_API_KEY'))

async def main():
    agent = Agent(
        task="go to ycombinator.com, summarize 3 startups from the latest batch",
        llm=ChatGoogle(model="gemini-2.5-flash"),
    )
    await agent.run()

asyncio.run(main())
```

## Viewing Traces

You can view traces in the Laminar UI by going to the traces tab in your project.
When you select a trace, you can see both the browser session recording and the agent execution steps.

The timeline of the browser session is synced with the agent execution steps.
In the trace view, you can also see the agent's current step, the tool it's using, and the tool's input and output.

## Laminar

To learn more about how you can trace and evaluate your Browser Use agent with Laminar, check out [Laminar docs](https://docs.lmnr.ai).

## Browser Use Cloud Authentication

Browser Use can sync your agent runs to the cloud for easy viewing and sharing. Authentication is required to protect your data.

### Quick Setup

```bash theme={null}
# Authenticate once to enable cloud sync for all future runs
browser-use auth

# Or if using module directly:
python -m browser_use.cli auth
```

**Note**: Cloud sync is enabled by default. If you've disabled it, you can re-enable with `export BROWSER_USE_CLOUD_SYNC=true`.
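In a notebook or script where exporting a shell variable is inconvenient, the same flag can likely be set from Python. This is a minimal sketch under one loud assumption: that Browser Use reads `BROWSER_USE_CLOUD_SYNC` from the process environment, so the variable has to be set before the library is imported or an agent is created.

```python
import os

# Assumption: browser_use picks this up from the process environment,
# so set it before importing the library or constructing an Agent.
os.environ['BROWSER_USE_CLOUD_SYNC'] = 'true'

# Confirm the value that child code in this process will see
print(os.environ['BROWSER_USE_CLOUD_SYNC'])
```

A value set this way only affects the current process; a shell `export` is still the better fit for CLI usage.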
### Manual Authentication

```python theme={null}
# Authenticate from code after task completion
from browser_use import Agent

agent = Agent(task="your task")
await agent.run()

# Later, authenticate for future runs
await agent.authenticate_cloud_sync()
```

### Reset Authentication

```bash theme={null}
# Force re-authentication with a different account
rm ~/.config/browseruse/cloud_auth.json
browser-use auth
```

**Note**: Authentication uses OAuth Device Flow - you must complete the auth process while the command is running. Links expire when the polling stops.

# OpenLIT
Source: https://docs.browser-use.com/development/monitoring/openlit

Complete observability for Browser Use with OpenLIT tracing

## Overview

Browser Use has native integration with [OpenLIT](https://github.com/openlit/openlit) - an open-source, OpenTelemetry-native platform that provides complete, granular traces for every task your browser-use agent performs, from high-level agent invocations down to individual browser actions.

Read more about OpenLIT in the [OpenLIT docs](https://docs.openlit.io).

## Setup

Install OpenLIT alongside Browser Use:

```bash theme={null}
pip install openlit browser-use
```

## Usage

OpenLIT provides automatic, comprehensive instrumentation with **zero code changes** beyond initialization:

```python {5-6} theme={null}
from browser_use import Agent, Browser, ChatOpenAI
import asyncio
import openlit

# Initialize OpenLIT - that's it!
openlit.init()

async def main():
    browser = Browser()

    llm = ChatOpenAI(
        model="gpt-4o",
    )

    agent = Agent(
        task="Find the number one trending post on Hacker News",
        llm=llm,
        browser=browser,
    )

    history = await agent.run()
    return history

if __name__ == "__main__":
    history = asyncio.run(main())
```

## Viewing Traces

OpenLIT provides a powerful dashboard where you can:

### Monitor Execution Flows

See the complete execution tree with timing information for every span.
Click on any `invoke_model` span to see the exact prompt sent to the LLM and the complete response with agent reasoning.

### Track Costs and Token Usage

* Cost breakdown by agent, task, and model
* Token usage per LLM call with full input/output visibility
* Compare costs across different LLM providers
* Identify expensive prompts and optimize them

### Debug Failures with Agent Thoughts

When an automation fails, you can:

* See exactly which step failed
* Read the agent's thinking at the failure point
* Check the browser state and available elements
* Analyze whether the failure was due to bad reasoning or bad information
* Fix the root cause with full context

### Performance Optimization

* Identify slow steps (LLM calls vs browser actions vs HTTP requests)
* Compare execution times across runs
* Optimize `max_steps` and `max_actions_per_step`
* Track HTTP request latency for page navigations

## Configuration

### Custom OpenTelemetry Endpoint Configuration

```python theme={null}
import openlit

# Configure custom OTLP endpoints
openlit.init(
    otlp_endpoint="http://localhost:4318",
    application_name="my-browser-automation",
    environment="production"
)
```

### Environment Variables

You can also configure OpenLIT via environment variables:

```bash theme={null}
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_SERVICE_NAME="browser-automation"
export OTEL_ENVIRONMENT="production"
```

### Self-Hosted OpenLIT

If you prefer to keep your data on-premises:

```bash theme={null}
# Using Docker
docker run -d \
  -p 4318:4318 \
  -p 3000:3000 \
  openlit/openlit:latest

# Access dashboard at http://localhost:3000
```

## Integration with Existing Tools

OpenLIT uses OpenTelemetry under the hood, so it integrates seamlessly with:

* **Jaeger** - Distributed tracing visualization
* **Prometheus** - Metrics collection and alerting
* **Grafana** - Custom dashboards and analytics
* **Datadog** - APM and log management
* **New Relic** - Full-stack observability
* **Elastic APM** - Application performance monitoring

Simply configure OpenLIT to export to your existing OTLP-compatible endpoint.

# Telemetry
Source: https://docs.browser-use.com/development/monitoring/telemetry

Understanding Browser Use's telemetry

## Overview

Browser Use is free under the MIT license. To help us continue improving the library, we collect anonymous usage data with [PostHog](https://posthog.com). This information helps us understand how the library is used, fix bugs more quickly, and prioritize new features.

## Opting Out

You can disable telemetry by setting the environment variable:

```bash .env theme={null}
ANONYMIZED_TELEMETRY=false
```

Or in your Python code:

```python theme={null}
import os
os.environ["ANONYMIZED_TELEMETRY"] = "false"
```

Even when enabled, telemetry has zero impact on the library's performance. Code is available in [Telemetry Service](https://github.com/browser-use/browser-use/tree/main/browser_use/telemetry).

# Contribution Guide
Source: https://docs.browser-use.com/development/setup/contribution-guide

## Mission

* Make developers happy
* Do more clicks than humans
* Tell your computer what to do, and it gets it done.
* Make agents faster and more reliable.

## What to work on?

* This space is moving fast. We have 10 ideas daily. Let's exchange some.
* Browse our [GitHub Issues](https://github.com/browser-use/browser-use/issues)
* Check out our most active issues on [Discord](https://discord.gg/zXJJHtJf3k)
* Get inspiration in the [`#showcase-your-work`](https://discord.com/channels/1303749220842340412/1305549200678850642) channel

## What makes a great PR?

1. Why do we need this PR?
2. Include a demo screenshot/gif
3. Make sure the PR passes all CI tests
4. Keep your PR focused on a single feature

## How?

1. Fork the repository
2. Create a new branch for your feature
3. Submit a PR

We are overwhelmed with issues. Feel free to bump your issues/PRs with comments periodically if you need faster feedback.
# Local Setup
Source: https://docs.browser-use.com/development/setup/local-setup

We're excited to have you join our community of contributors.

## Welcome to Browser Use Development!

```bash theme={null}
git clone https://github.com/browser-use/browser-use
cd browser-use
uv sync --all-extras --dev
# or pip install -U git+https://github.com/browser-use/browser-use.git@main
```

## Configuration

Set up your environment variables:

```bash theme={null}
# Copy the example environment file
cp .env.example .env

# set logging level
# BROWSER_USE_LOGGING_LEVEL=debug
```

## Helper Scripts

For common development tasks

```bash theme={null}
# Complete setup script - installs uv, creates a venv, and installs dependencies
./bin/setup.sh

# Run all pre-commit hooks (formatting, linting, type checking)
./bin/lint.sh

# Run the core test suite that's executed in CI
./bin/test.sh
```

## Run examples

```bash theme={null}
uv run examples/simple.py
```

# Ad-Use (Ad Generator)
Source: https://docs.browser-use.com/examples/apps/ad-use

Generate Instagram image ads and TikTok video ads from landing pages using browser agents, Google's Nano Banana 🍌, and Veo3.

This demo requires browser-use v0.7.6+.