Browser-Use provides lifecycle hooks that allow you to execute custom code at specific points during the agent’s execution.
Hook functions can be used to read and modify agent state while it runs, implement custom logic, change configuration, and integrate the Agent with external applications.
Available Hooks
Currently, Browser-Use provides the following hooks:
| Hook | Description | When it's called |
|------|-------------|------------------|
| `on_step_start` | Executed at the beginning of each agent step | Before the agent processes the current state and decides on the next action |
| `on_step_end` | Executed at the end of each agent step | After the agent has executed all the actions for the current step, before it starts the next step |
```python
await agent.run(on_step_start=..., on_step_end=...)
```

Each hook should be an async callable function that accepts the `agent` instance as its only parameter.
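A minimal sketch of what that looks like (the hook below just logs the current task; it can be passed to either parameter of `agent.run()`):

```python
async def log_step(agent):
    # the hook receives the running Agent instance as its only argument
    print('step finished, current task:', agent.task)

await agent.run(on_step_end=log_step, max_steps=10)
```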
Basic Example
```python
from pathlib import Path

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def my_step_hook(agent: Agent):
    # inside a hook you can access all the state and methods under the Agent object:
    #   agent.settings, agent.state, agent.task
    #   agent.controller, agent.llm, agent.browser_session
    #   agent.pause(), agent.resume(), agent.add_new_task(...), etc.

    # You also have direct access to the playwright Page and Browser Context
    # https://playwright.dev/python/docs/api/class-page
    page = await agent.browser_session.get_current_page()

    current_url = page.url
    visit_log = agent.state.history.urls()
    previous_url = visit_log[-2] if len(visit_log) >= 2 else None
    print(f"Agent was last on URL: {previous_url} and is now on {current_url}")

    # Example: listen for events on the page, interact with the DOM, run JS directly, etc.
    page.on('domcontentloaded', lambda page: print('page navigated to a new url...'))
    await page.locator("css=form > input[type=submit]").click()
    await page.evaluate('() => alert(1)')
    await page.context.new_page()  # open a new tab in the same browser context
    await agent.browser_session.browser_context.add_init_script('/* some JS to run on every page */')

    # Example: monitor or intercept all network requests
    async def handle_request(route):
        # Print, modify, block, etc. do anything to the requests here
        #   https://playwright.dev/python/docs/network#handle-requests
        print(route.request, route.request.headers)
        await route.continue_(headers=route.request.headers)
    await page.route("**/*", handle_request)

    # Example: pause agent execution and resume it based on some custom code
    if '/completed' in current_url:
        agent.pause()
        Path('result.txt').write_text(await page.content())
        input('Saved "completed" page content to result.txt, press [Enter] to resume...')
        agent.resume()


agent = Agent(
    task="Search for the latest news about AI",
    llm=ChatOpenAI(model="gpt-4o"),
)

await agent.run(
    on_step_start=my_step_hook,
    # on_step_end=...
    max_steps=10
)
```
Data Available in Hooks
When working with agent hooks, you have access to the entire `Agent` instance. Here are some useful data points you can access (a short example follows this list):

- `agent.task` lets you see what the main task is, `agent.add_new_task(...)` lets you queue up a new one
- `agent.controller` gives access to the `Controller()` object and `Registry()` containing the available actions
  - `agent.controller.registry.execute_action('click_element_by_index', {'index': 123}, browser_session=agent.browser_session)`
- `agent.context` lets you access any user-provided context object passed in to `Agent(context=...)`
- `agent.sensitive_data` contains the sensitive data dict, which can be updated in-place to add/remove/modify items
- `agent.settings` contains all the configuration options passed to the `Agent(...)` at init time
- `agent.llm` gives direct access to the main LLM object (e.g. `ChatOpenAI`)
- `agent.state` gives access to lots of internal state, including agent thoughts, outputs, actions, etc.
  - `agent.state.history.model_thoughts()`: Reasoning from Browser Use's model
  - `agent.state.history.model_outputs()`: Raw outputs from Browser Use's model
  - `agent.state.history.model_actions()`: Actions taken by the agent
  - `agent.state.history.extracted_content()`: Content extracted from web pages
  - `agent.state.history.urls()`: URLs visited by the agent
- `agent.browser_session` gives direct access to the `BrowserSession()` and playwright objects
  - `agent.browser_session.get_current_page()`: Get the current playwright `Page` object the agent is focused on
  - `agent.browser_session.browser_context`: Get the current playwright `BrowserContext` object
  - `agent.browser_session.browser_context.pages`: Get all the tabs currently open in the context
  - `agent.browser_session.get_page_html()`: Current page HTML
  - `agent.browser_session.take_screenshot()`: Screenshot of the current page
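As a small sketch, a hook might combine a few of these accessors, for example to inspect the most recent model reasoning and grab a screenshot of the current page (only attributes from the list above are used):

```python
async def inspect_state(agent):
    # most recent model reasoning, if any steps have completed yet
    thoughts = agent.state.history.model_thoughts()
    if thoughts:
        print('last model thought:', thoughts[-1])

    # base64-encoded screenshot of the page the agent is currently focused on
    screenshot_b64 = await agent.browser_session.take_screenshot()
    print('screenshot captured:', screenshot_b64 is not None)
```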
Tips for Using Hooks
- Avoid blocking operations: Since hooks run in the same execution thread as the agent, try to keep them efficient or use asynchronous patterns.
- Handle exceptions: Make sure your hook functions handle exceptions gracefully to prevent interrupting the agent's main flow (see the sketch after this list).
- Use custom actions instead: hooks are fairly advanced; most things can be implemented with custom action functions.
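For example, a hook that talks to an external service can swallow its own failures so a transient error doesn't abort the run (a sketch; `record_step` is a hypothetical helper standing in for your actual hook logic):

```python
async def safe_hook(agent):
    try:
        await record_step(agent)  # hypothetical helper: whatever your hook actually does
    except Exception as e:
        # log and carry on so the agent's main loop is never interrupted
        print(f'hook failed, continuing anyway: {e}')
```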
Complex Example: Agent Activity Recording System
This example demonstrates a complete implementation for recording and saving Browser-Use agent activity, consisting of a server component and a client component.
Setup Instructions
To use this example, you’ll need to:
- Set up the required dependencies:
  `pip install fastapi uvicorn prettyprinter pyobjtojson python-dotenv browser-use langchain-openai`
- Create two separate Python files:
  - `api.py` - The FastAPI server component
  - `client.py` - The Browser-Use agent with recording hook
- Run both components:
  - Start the API server first: `python api.py`
  - Then run the client: `python client.py`
Server Component (api.py)
The server component handles receiving and storing the agent’s activity data:
```python
#!/usr/bin/env python3
#
# FastAPI API to record and save Browser-Use activity data.
# Save this code to api.py and run with `python api.py`
#
import json
import base64
from pathlib import Path

from fastapi import FastAPI, Request
import prettyprinter
import uvicorn

prettyprinter.install_extras()


# Utility function to save screenshots
def b64_to_png(b64_string: str, output_file):
    """
    Convert a Base64-encoded string to a PNG file.

    :param b64_string: A string containing Base64-encoded data
    :param output_file: The path to the output PNG file
    """
    with open(output_file, "wb") as f:
        f.write(base64.b64decode(b64_string))


# Initialize FastAPI app
app = FastAPI()


@app.post("/post_agent_history_step")
async def post_agent_history_step(request: Request):
    data = await request.json()
    prettyprinter.cpprint(data)

    # Ensure the "recordings" folder exists using pathlib
    recordings_folder = Path("recordings")
    recordings_folder.mkdir(exist_ok=True)

    # Determine the next file number by examining existing .json files
    existing_numbers = []
    for item in recordings_folder.iterdir():
        if item.is_file() and item.suffix == ".json":
            try:
                file_num = int(item.stem)
                existing_numbers.append(file_num)
            except ValueError:
                # In case the file name isn't just a number
                pass

    if existing_numbers:
        next_number = max(existing_numbers) + 1
    else:
        next_number = 1

    # Construct the file path
    file_path = recordings_folder / f"{next_number}.json"

    # Save the JSON data to the file
    with file_path.open("w") as f:
        json.dump(data, f, indent=2)

    # Optionally save screenshot if needed
    # if "website_screenshot" in data and data["website_screenshot"]:
    #     screenshot_folder = Path("screenshots")
    #     screenshot_folder.mkdir(exist_ok=True)
    #     b64_to_png(data["website_screenshot"], screenshot_folder / f"{next_number}.png")

    return {"status": "ok", "message": f"Saved to {file_path}"}


if __name__ == "__main__":
    print("Starting Browser-Use recording API on http://0.0.0.0:9000")
    uvicorn.run(app, host="0.0.0.0", port=9000)
```
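With `api.py` running, you can sanity-check the endpoint before wiring up the agent. This is just a smoke test against the route defined above; the payload shape is arbitrary since the endpoint accepts any JSON body:

```python
# quick smoke test for the recording endpoint (run while api.py is up)
import requests

resp = requests.post(
    "http://127.0.0.1:9000/post_agent_history_step",
    json={"url": "https://example.com", "model_thoughts": "smoke test"},
)
print(resp.json())  # e.g. {'status': 'ok', 'message': 'Saved to recordings/1.json'}
```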
Client Component (client.py)
The client component runs the Browser-Use agent with a recording hook:
```python
#!/usr/bin/env python3
#
# Client to record and save Browser-Use activity.
# Save this code to client.py and run with `python client.py`
#
import asyncio

import requests
from dotenv import load_dotenv
from pyobjtojson import obj_to_json
from langchain_openai import ChatOpenAI

from browser_use import Agent

# Load environment variables (for API keys)
load_dotenv()


def send_agent_history_step(data):
    """Send the agent step data to the recording API"""
    url = "http://127.0.0.1:9000/post_agent_history_step"
    response = requests.post(url, json=data)
    return response.json()


async def record_activity(agent_obj):
    """Hook function that captures and records agent activity at each step"""
    website_html = None
    website_screenshot = None
    urls_json_last_elem = None
    model_thoughts_last_elem = None
    model_outputs_json_last_elem = None
    model_actions_json_last_elem = None
    extracted_content_json_last_elem = None

    print('--- ON_STEP_START HOOK ---')

    # Capture current page state
    website_html = await agent_obj.browser_session.get_page_html()
    website_screenshot = await agent_obj.browser_session.take_screenshot()

    # Make sure we have state history
    if hasattr(agent_obj, "state"):
        history = agent_obj.state.history
    else:
        print("Warning: Agent has no state history")
        return

    # Process model thoughts
    model_thoughts = obj_to_json(
        obj=history.model_thoughts(),
        check_circular=False
    )
    if len(model_thoughts) > 0:
        model_thoughts_last_elem = model_thoughts[-1]

    # Process model outputs
    model_outputs = agent_obj.state.history.model_outputs()
    model_outputs_json = obj_to_json(
        obj=model_outputs,
        check_circular=False
    )
    if len(model_outputs_json) > 0:
        model_outputs_json_last_elem = model_outputs_json[-1]

    # Process model actions
    model_actions = agent_obj.state.history.model_actions()
    model_actions_json = obj_to_json(
        obj=model_actions,
        check_circular=False
    )
    if len(model_actions_json) > 0:
        model_actions_json_last_elem = model_actions_json[-1]

    # Process extracted content
    extracted_content = agent_obj.state.history.extracted_content()
    extracted_content_json = obj_to_json(
        obj=extracted_content,
        check_circular=False
    )
    if len(extracted_content_json) > 0:
        extracted_content_json_last_elem = extracted_content_json[-1]

    # Process URLs
    urls = agent_obj.state.history.urls()
    urls_json = obj_to_json(
        obj=urls,
        check_circular=False
    )
    if len(urls_json) > 0:
        urls_json_last_elem = urls_json[-1]

    # Create a summary of all data for this step
    model_step_summary = {
        "website_html": website_html,
        "website_screenshot": website_screenshot,
        "url": urls_json_last_elem,
        "model_thoughts": model_thoughts_last_elem,
        "model_outputs": model_outputs_json_last_elem,
        "model_actions": model_actions_json_last_elem,
        "extracted_content": extracted_content_json_last_elem
    }

    print("--- MODEL STEP SUMMARY ---")
    print(f"URL: {urls_json_last_elem}")

    # Send data to the API
    result = send_agent_history_step(data=model_step_summary)
    print(f"Recording API response: {result}")


async def run_agent():
    """Run the Browser-Use agent with the recording hook"""
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    try:
        print("Starting Browser-Use agent with recording hook")
        await agent.run(
            on_step_start=record_activity,
            max_steps=30
        )
    except Exception as e:
        print(f"Error running agent: {e}")


if __name__ == "__main__":
    # Check if the recording API is running before starting the agent
    try:
        requests.get("http://127.0.0.1:9000")
        print("Recording API is available")
    except requests.exceptions.RequestException:
        print("Warning: Recording API may not be running. Start api.py first.")

    # Run the agent
    asyncio.run(run_agent())
```
Contribution by Carlos A. Planchón.
Working with the Recorded Data
After running the agent, you'll find the recorded data in the `recordings` directory. Here's how you can use this data:
- View recorded sessions: Each JSON file contains a snapshot of agent activity for one step
- Extract screenshots: You can modify the API to save screenshots separately
- Analyze agent behavior: Use the recorded data to study how the agent navigates websites (a sketch for reading the recordings follows this list)
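For instance, a minimal sketch that walks the saved step files and prints the URL recorded at each step (the `url` key comes from the step summary built in client.py, and the numeric file names come from api.py):

```python
# replay the recorded steps in order and list the URL visited at each one
import json
from pathlib import Path

for step_file in sorted(Path("recordings").glob("*.json"), key=lambda p: int(p.stem)):
    step = json.loads(step_file.read_text())
    print(f"{step_file.name}: {step.get('url')}")
```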
Extending the Example
You can extend this recording system in several ways:
- Save screenshots separately: Uncomment the screenshot saving code in the API
- Add a web dashboard: Create a simple web interface to view recorded sessions
- Add session IDs: Modify the API to group steps by agent session (a sketch follows after this list)
- Add filtering: Implement filters to record only specific types of actions
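As a sketch of the session-ID idea, the client could generate one ID per run and attach it to every step summary before sending it, so the API (or later analysis) can group files by run. The `session_id` field and the `tag_with_session` helper below are assumptions for illustration, not part of the original example:

```python
import uuid

# one ID per client.py run; attach it to every step summary before sending
SESSION_ID = str(uuid.uuid4())


def tag_with_session(step_summary: dict) -> dict:
    """Add a session_id field so recorded steps can be grouped by run."""
    return {**step_summary, "session_id": SESSION_ID}

# in record_activity(), send the tagged summary instead:
#     result = send_agent_history_step(data=tag_with_session(model_step_summary))
```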