Agent Settings
Learn how to configure the agent
Overview
The Agent class is the core component of Browser Use that handles browser automation. Here are the main configuration options you can use when initializing an agent.
Basic Settings
Required Parameters
- task: The instruction for the agent to execute.
- llm: A LangChain chat model instance. See LangChain Models for supported models.
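The following library-free sketch collects the parameters on this page in one place. AgentSettings is a hypothetical stand-in, not a Browser Use class. It records the required task and llm plus the defaults this page documents, and it rejects an empty task or a missing model. With Browser Use installed, you would pass the same keyword arguments to the Agent constructor instead.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class AgentSettings:
    """Hypothetical stand-in mirroring the documented Agent parameters.

    Not part of Browser Use: with the library installed, the same keyword
    arguments would be passed to the Agent constructor instead.
    """

    # Required parameters
    task: str  # the instruction for the agent to execute
    llm: Any   # a LangChain chat model instance (any object in this sketch)

    # Defaults as documented on this page
    use_vision: bool = True
    max_actions_per_step: int = 10
    max_failures: int = 3
    retry_delay: float = 10
    generate_gif: Union[bool, str] = False
    planner_llm: Optional[Any] = None
    use_vision_for_planner: bool = True
    planner_interval: int = 1
    enable_memory: bool = True
    memory_interval: int = 10

    def __post_init__(self) -> None:
        # Both required parameters must be meaningful, not just present.
        if not self.task.strip():
            raise ValueError("task must be a non-empty instruction")
        if self.llm is None:
            raise ValueError("llm must be a chat model instance")


settings = AgentSettings(task="Compare the prices of two laptops", llm=object())
print(settings.max_failures, settings.enable_memory)  # 3 True
```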
Agent Behavior
Control how the agent operates:
Behavior Parameters
- controller: Registry of functions the agent can call. Defaults to the base Controller. See Custom Functions for details.
- use_vision: Enable/disable vision capabilities. Defaults to True.
  - When enabled, the model processes visual information from web pages.
  - Disable it to reduce costs or to use models without vision support.
  - For GPT-4o, image processing costs approximately 800-1000 tokens (~$0.002 USD) per image, depending on the configured screen size.
- save_conversation_path: Path to save the complete conversation history. Useful for debugging.
- override_system_message: Completely replaces the default system prompt with a custom one.
- extend_system_message: Adds additional instructions to the default system prompt.
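The following toy function illustrates the difference between the two system-message options. compose_system_prompt is hypothetical and is not Browser Use's implementation. It assumes that an override replaces the default outright and that an extension is appended on a new line, after any override. Check the library source if you rely on the behavior when both options are set.

```python
from typing import Optional


def compose_system_prompt(
    default: str,
    override: Optional[str] = None,
    extend: Optional[str] = None,
) -> str:
    """Hypothetical model of override_system_message / extend_system_message.

    Assumption: the override (if any) replaces the default first, then the
    extension (if any) is appended on a new line.
    """
    prompt = override if override is not None else default
    if extend:
        prompt = f"{prompt}\n{extend}"
    return prompt


DEFAULT = "You are a browser automation agent."
print(compose_system_prompt(DEFAULT, extend="Always answer in English."))
```

In most cases, extending is the safer choice because it keeps the default instructions the agent was designed around.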
Vision capabilities are recommended for a better understanding of web pages, but they can be disabled to reduce costs or when using models without vision support.
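To put the GPT-4o figures above in perspective, the following arithmetic sketch multiplies them by a number of screenshots. It assumes one screenshot per step. The per-image numbers are the approximations quoted on this page and will vary with screen size and model pricing.

```python
from typing import Tuple

# Approximate GPT-4o figures quoted on this page; they vary with screen size and pricing.
TOKENS_PER_IMAGE: Tuple[int, int] = (800, 1000)
USD_PER_IMAGE = 0.002


def estimate_vision_overhead(screenshots: int) -> Tuple[int, int, float]:
    """Return (min_tokens, max_tokens, approx_usd) spent on screenshots."""
    low, high = TOKENS_PER_IMAGE
    return screenshots * low, screenshots * high, round(screenshots * USD_PER_IMAGE, 4)


# Assumption: one screenshot per step, for a 50-step run.
print(estimate_vision_overhead(50))  # (40000, 50000, 0.1)
```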
(Reuse) Browser Configuration
You can configure how the agent interacts with the browser. For more Browser options, refer to the Browser Settings documentation.
Reuse Existing Browser
- browser: A Browser Use Browser instance. When provided, the agent reuses this browser instance and automatically creates a new context for each run() call.
Remember: in this scenario, the Browser is not closed automatically.
Reuse Existing Browser Context
- browser_context: A Playwright browser context. Useful for maintaining persistent sessions. See Persistent Browser for more details.
For more information about how browser contexts work, refer to the Playwright documentation.
You can reuse the same context for multiple agents. If you provide neither a browser nor a browser context, a browser is created automatically and closed when run() completes.
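The ownership rules above can be modeled without a real browser. FakeBrowser and run_agent are hypothetical names, not Browser Use APIs. The sketch shows that a browser passed in by the caller is reused across runs, gets a fresh context per run, and stays open. A browser created for a single run is closed when that run finishes.

```python
from typing import Optional


class FakeBrowser:
    """Hypothetical stand-in for a browser; tracks contexts and whether it was closed."""

    def __init__(self) -> None:
        self.closed = False
        self.contexts_created = 0

    def new_context(self) -> int:
        self.contexts_created += 1
        return self.contexts_created

    def close(self) -> None:
        self.closed = True


def run_agent(browser: Optional[FakeBrowser] = None) -> FakeBrowser:
    """Models the documented lifecycle (not Browser Use's actual code).

    - A browser passed in by the caller is reused and left open.
    - Otherwise a browser is created for this run and closed when it finishes.
    """
    owns_browser = browser is None
    active = browser if browser is not None else FakeBrowser()
    try:
        active.new_context()  # a fresh context for each run
        # ... the agent's steps would happen here ...
    finally:
        if owns_browser:
            active.close()
    return active


shared = FakeBrowser()
run_agent(shared)
run_agent(shared)
print(shared.contexts_created, shared.closed)  # 2 False
print(run_agent().closed)                      # True
```

When you share a browser across agents, you are responsible for closing it, as the sketch's `shared.close()` would be after the last run.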
Running the Agent
The agent is executed using the async run() method:
- max_steps (default: 100): Maximum number of steps the agent can take during execution. This prevents infinite loops and helps control execution time.
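The following asyncio sketch shows why a step cap guards against runaway tasks. run_with_step_limit is a hypothetical loop, not the agent's real implementation, and the step callback stands in for one agent step. With Browser Use, you would pass the cap to the run() call described above.

```python
import asyncio
from typing import Callable, List


async def run_with_step_limit(step: Callable[[int], bool], max_steps: int = 100) -> List[int]:
    """Hypothetical loop showing how a max_steps cap bounds execution.

    `step` returns True when the task is done. The loop stops on completion
    or after max_steps steps, whichever comes first.
    """
    executed: List[int] = []
    for n in range(1, max_steps + 1):
        executed.append(n)
        if step(n):
            break
        await asyncio.sleep(0)  # yield to the event loop between steps
    return executed


# A task that never reports completion is still cut off at max_steps.
steps = asyncio.run(run_with_step_limit(lambda n: False, max_steps=5))
print(steps)  # [1, 2, 3, 4, 5]
```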
Agent History
The method returns an AgentHistoryList object containing the complete execution history. This history is invaluable for debugging, analysis, and creating reproducible scripts.
The AgentHistoryList provides many helper methods to analyze the execution:
- final_result(): Get the final extracted content
- is_done(): Check whether the agent completed successfully
- has_errors(): Check whether any errors occurred
- model_thoughts(): Get the agent's reasoning process
- action_results(): Get the results of all actions
For a complete list of helper methods and detailed history analysis capabilities, refer to the AgentHistoryList source code.
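The following toy model illustrates what each helper reports. StepRecord and MiniHistory are hypothetical classes with a simplified schema. They mimic only the documented meaning of the five helpers and do not reproduce the real AgentHistoryList structure, so use the source code for the real fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StepRecord:
    """Hypothetical record of one agent step (not Browser Use's schema)."""
    thought: str
    action_result: str
    error: Optional[str] = None
    is_done: bool = False
    extracted_content: Optional[str] = None


@dataclass
class MiniHistory:
    """Toy stand-in illustrating what the AgentHistoryList helpers report."""
    steps: List[StepRecord] = field(default_factory=list)

    def final_result(self) -> Optional[str]:
        # Content extracted by the last step, if any.
        return self.steps[-1].extracted_content if self.steps else None

    def is_done(self) -> bool:
        return bool(self.steps) and self.steps[-1].is_done

    def has_errors(self) -> bool:
        return any(s.error is not None for s in self.steps)

    def model_thoughts(self) -> List[str]:
        return [s.thought for s in self.steps]

    def action_results(self) -> List[str]:
        return [s.action_result for s in self.steps]


history = MiniHistory([
    StepRecord("Open the search page", "navigated"),
    StepRecord("Read the top result", "extracted", is_done=True, extracted_content="Example Domain"),
])
print(history.is_done(), history.has_errors())  # True False
print(history.final_result())                   # Example Domain
```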
Run initial actions without LLM
You can run initial actions without involving the LLM. Specify each action as a dictionary where the key is the action name and the value is the action's parameters. You can find all available actions in the Controller source code.
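The following sketch shows the dictionary shape as plain data, with a small check. The action names open_tab and scroll_down and their parameters are assumptions; confirm them in the Controller source for your version. validate_initial_actions is a hypothetical helper. With Browser Use installed, you would pass the list as the initial_actions argument when creating the Agent.

```python
from typing import Any, Dict, List

# Assumed action names and parameters; confirm them in the Controller source.
initial_actions: List[Dict[str, Dict[str, Any]]] = [
    {"open_tab": {"url": "https://www.google.com"}},
    {"open_tab": {"url": "https://en.wikipedia.org/wiki/Randomness"}},
    {"scroll_down": {"amount": 1000}},
]


def validate_initial_actions(actions: List[Dict[str, Dict[str, Any]]]) -> List[str]:
    """Hypothetical helper: check each entry is {action_name: params}; return the names."""
    names = []
    for i, action in enumerate(actions):
        if not isinstance(action, dict) or len(action) != 1:
            raise ValueError(f"action {i} must be a single-key dict")
        name, params = next(iter(action.items()))
        if not isinstance(params, dict):
            raise ValueError(f"parameters for {name!r} must be a dict")
        names.append(name)
    return names


print(validate_initial_actions(initial_actions))  # ['open_tab', 'open_tab', 'scroll_down']
```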
Run with message context
You can configure the agent and provide a separate message to help the LLM understand the task better.
Run with planner model
You can configure the agent to use a separate planner model for high-level task planning:
Planner Parameters
- planner_llm: A LangChain chat model instance used for high-level task planning. Can be a smaller/cheaper model than the main LLM.
- use_vision_for_planner: Enable/disable vision capabilities for the planner model. Defaults to True.
- planner_interval: Number of steps between planning phases. Defaults to 1.
Using a separate planner model can help:
- Reduce costs by using a smaller model for high-level planning
- Improve task decomposition and strategic thinking
- Handle complex, multi-step tasks better
The planner model is optional. If it is not specified, the agent runs without a separate planning phase.
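The following sketch shows which steps trigger a planning phase for a given planner_interval. planning_steps is hypothetical. It assumes steps are numbered from 1 and that the planner runs on every step divisible by the interval. The exact trigger condition in Browser Use may differ.

```python
from typing import List


def planning_steps(total_steps: int, planner_interval: int = 1) -> List[int]:
    """Hypothetical schedule: the planner runs on steps divisible by planner_interval.

    Assumption: steps are numbered from 1; the real trigger may differ.
    """
    if planner_interval < 1:
        raise ValueError("planner_interval must be at least 1")
    return [n for n in range(1, total_steps + 1) if n % planner_interval == 0]


print(planning_steps(6))                       # [1, 2, 3, 4, 5, 6]
print(planning_steps(10, planner_interval=4))  # [4, 8]
```

With the default interval of 1, the planner runs on every step. Larger intervals mean fewer planner calls, but each plan may be more out of date.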
Optional Parameters
- message_context: Additional information about the task to help the LLM understand it better.
- initial_actions: List of initial actions to run before the main task.
- max_actions_per_step: Maximum number of actions to run in a step. Defaults to 10.
- max_failures: Maximum number of failures before giving up. Defaults to 3.
- retry_delay: Time to wait between retries, in seconds, when rate limited. Defaults to 10.
- generate_gif: Enable/disable GIF generation. Defaults to False. Set to True or to a string path to save the GIF.
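The following generic retry loop shows how max_failures and retry_delay can work together. It is a sketch of the technique, not Browser Use's internal logic. RateLimitError and call_with_retries are hypothetical names, and RateLimitError stands in for whatever error your model provider raises when rate limited.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error standing in for a provider's rate-limit error."""


def call_with_retries(fn: Callable[[], T], max_failures: int = 3, retry_delay: float = 10) -> T:
    """Retry fn after rate-limit errors, sleeping retry_delay seconds between
    attempts, and re-raise once max_failures attempts have failed."""
    failures = 0
    while True:
        try:
            return fn()
        except RateLimitError:
            failures += 1
            if failures >= max_failures:
                raise
            time.sleep(retry_delay)


attempts = {"n": 0}


def flaky() -> str:
    # Fails twice, then succeeds on the third attempt.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429")
    return "ok"


print(call_with_retries(flaky, max_failures=3, retry_delay=0), attempts["n"])  # ok 3
```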
Memory Management
Browser Use includes a procedural memory system using Mem0 that automatically summarizes the agent’s conversation history at regular intervals to optimize context window usage during long tasks.
Memory Parameters
- enable_memory: Enable/disable the procedural memory system. Defaults to True.
- memory_interval: Number of steps between memory summarizations. Defaults to 10.
- memory_config: Optional configuration dictionary for the underlying memory system.
How Memory Works
When enabled, the agent periodically compresses its conversation history into concise summaries:
- Every memory_interval steps, the agent reviews its recent interactions
- It creates a procedural memory summary using the same LLM as the agent
- The original messages are replaced with the summary, reducing token usage
- This process helps maintain important context while freeing up the context window
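The following toy model illustrates the compression cycle described above. run_with_memory and naive_summary are hypothetical. The summarize callback stands in for the LLM call; the real system uses Mem0 and the agent's LLM, which this sketch does not reproduce. Every memory_interval steps, all accumulated messages, including any earlier summary, are replaced by a single summary.

```python
from typing import Callable, List


def run_with_memory(
    num_steps: int,
    summarize: Callable[[List[str]], str],
    enable_memory: bool = True,
    memory_interval: int = 10,
) -> List[str]:
    """Toy model: every memory_interval steps, replace the accumulated
    messages with one summary message produced by `summarize`."""
    messages: List[str] = []
    for step in range(1, num_steps + 1):
        messages.append(f"step {step}")
        if enable_memory and step % memory_interval == 0:
            messages = [summarize(messages)]
    return messages


def naive_summary(msgs: List[str]) -> str:
    # Stand-in for the LLM-generated procedural summary.
    return f"summary of {len(msgs)} messages"


print(len(run_with_memory(25, naive_summary)))                       # 6
print(len(run_with_memory(25, naive_summary, enable_memory=False)))  # 25
```

With memory disabled, the history grows by one message per step in this model. This illustrates why long tasks without memory can overflow the context window.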
Disabling Memory
If you want to disable the memory system (for debugging or for shorter tasks), set enable_memory to False when initializing the agent.
Disabling memory may be useful for debugging or short tasks, but for longer tasks, it can lead to context window overflow as the conversation history grows. The memory system helps maintain performance during extended sessions.