Using Agent Lifecycle Hooks

Browser-Use provides lifecycle hooks that allow you to execute custom code at specific points during the agent’s execution. These hooks enable you to capture detailed information about the agent’s actions, modify behavior, or integrate with external systems.

Available Hooks

Currently, Browser-Use provides the following hooks:

HookDescriptionWhen it’s called
on_step_startExecuted at the beginning of each agent stepBefore the agent processes the current state and decides on the next action
on_step_endExecuted at the end of each agent stepAfter the agent has executed the action for the current step

Using Hooks

Hooks are passed as parameters to the agent.run() method. Each hook should be a callable function that accepts the agent instance as its parameter.

Basic Example

from browser_use import Agent
from langchain_openai import ChatOpenAI

async def my_step_hook(agent):
    # inside a hook you can access all the state and methods under the Agent object:
    #   agent.settings, agent.state, agent.task
    #   agent.controller, agent.llm, agent.browser, agent.browser_context
    #   agent.pause(), agent.resume(), agent.add_new_task(...), etc.
    
    current_page = await agent.browser_context.get_current_page()
    
    visit_log = agent.state.history.urls()
    current_url = current_page.url
    previous_url = visit_log[-2] if len(visit_log) >= 2 else None
    print(f"Agent was last on URL: {previous_url} and is now on {current_url}")
    
    if 'completed' in current_url:
        agent.pause()
        Path('result.txt').write_text(await current_page.content()) 
        input('Saved "completed" page content to result.txt, press [Enter] to resume...')
        agent.resume()
    
agent = Agent(
    task="Search for the latest news about AI",
    llm=ChatOpenAI(model="gpt-4o"),
)

await agent.run(
    on_step_start=my_step_hook,
    # on_step_end=...
    max_steps=10
)

Complete Example: Agent Activity Recording System

This comprehensive example demonstrates a complete implementation for recording and saving Browser-Use agent activity, consisting of both server and client components.

Setup Instructions

To use this example, you’ll need to:

  1. Set up the required dependencies:

    pip install fastapi uvicorn prettyprinter pyobjtojson dotenv browser-use langchain-openai
    
  2. Create two separate Python files:

    • api.py - The FastAPI server component
    • client.py - The Browser-Use agent with recording hook
  3. Run both components:

    • Start the API server first: python api.py
    • Then run the client: python client.py

Server Component (api.py)

The server component handles receiving and storing the agent’s activity data:

#!/usr/bin/env python3

#
# FastAPI API to record and save Browser-Use activity data.
# Save this code to api.py and run with `python api.py`
# 

import json
import base64
from pathlib import Path

from fastapi import FastAPI, Request
import prettyprinter
import uvicorn

prettyprinter.install_extras()

# Utility function to save screenshots
def b64_to_png(b64_string: str, output_file):
    """
    Convert a Base64-encoded string to a PNG file.
    
    :param b64_string: A string containing Base64-encoded data
    :param output_file: The path to the output PNG file
    """
    with open(output_file, "wb") as f:
        f.write(base64.b64decode(b64_string))

# Initialize FastAPI app
app = FastAPI()


@app.post("/post_agent_history_step")
async def post_agent_history_step(request: Request):
    data = await request.json()
    prettyprinter.cpprint(data)

    # Ensure the "recordings" folder exists using pathlib
    recordings_folder = Path("recordings")
    recordings_folder.mkdir(exist_ok=True)

    # Determine the next file number by examining existing .json files
    existing_numbers = []
    for item in recordings_folder.iterdir():
        if item.is_file() and item.suffix == ".json":
            try:
                file_num = int(item.stem)
                existing_numbers.append(file_num)
            except ValueError:
                # In case the file name isn't just a number
                pass

    if existing_numbers:
        next_number = max(existing_numbers) + 1
    else:
        next_number = 1

    # Construct the file path
    file_path = recordings_folder / f"{next_number}.json"

    # Save the JSON data to the file
    with file_path.open("w") as f:
        json.dump(data, f, indent=2)

    # Optionally save screenshot if needed
    # if "website_screenshot" in data and data["website_screenshot"]:
    #     screenshot_folder = Path("screenshots")
    #     screenshot_folder.mkdir(exist_ok=True)
    #     b64_to_png(data["website_screenshot"], screenshot_folder / f"{next_number}.png")

    return {"status": "ok", "message": f"Saved to {file_path}"}

if __name__ == "__main__":
    print("Starting Browser-Use recording API on http://0.0.0.0:9000")
    uvicorn.run(app, host="0.0.0.0", port=9000)

Client Component (client.py)

The client component runs the Browser-Use agent with a recording hook:

#!/usr/bin/env python3

#
# Client to record and save Browser-Use activity.
# Save this code to client.py and run with `python client.py`
#

import asyncio
import requests
from dotenv import load_dotenv
from pyobjtojson import obj_to_json
from langchain_openai import ChatOpenAI
from browser_use import Agent

# Load environment variables (for API keys)
load_dotenv()


def send_agent_history_step(data):
    """Send the agent step data to the recording API"""
    url = "http://127.0.0.1:9000/post_agent_history_step"
    response = requests.post(url, json=data)
    return response.json()


async def record_activity(agent_obj):
    """Hook function that captures and records agent activity at each step"""
    website_html = None
    website_screenshot = None
    urls_json_last_elem = None
    model_thoughts_last_elem = None
    model_outputs_json_last_elem = None
    model_actions_json_last_elem = None
    extracted_content_json_last_elem = None

    print('--- ON_STEP_START HOOK ---')
    
    # Capture current page state
    website_html = await agent_obj.browser_context.get_page_html()
    website_screenshot = await agent_obj.browser_context.take_screenshot()

    # Make sure we have state history
    if hasattr(agent_obj, "state"):
        history = agent_obj.state.history
    else:
        history = None
        print("Warning: Agent has no state history")
        return

    # Process model thoughts
    model_thoughts = obj_to_json(
        obj=history.model_thoughts(),
        check_circular=False
    )
    if len(model_thoughts) > 0:
        model_thoughts_last_elem = model_thoughts[-1]

    # Process model outputs
    model_outputs = agent_obj.state.history.model_outputs()
    model_outputs_json = obj_to_json(
        obj=model_outputs,
        check_circular=False
    )
    if len(model_outputs_json) > 0:
        model_outputs_json_last_elem = model_outputs_json[-1]

    # Process model actions
    model_actions = agent_obj.state.history.model_actions()
    model_actions_json = obj_to_json(
        obj=model_actions,
        check_circular=False
    )
    if len(model_actions_json) > 0:
        model_actions_json_last_elem = model_actions_json[-1]

    # Process extracted content
    extracted_content = agent_obj.state.history.extracted_content()
    extracted_content_json = obj_to_json(
        obj=extracted_content,
        check_circular=False
    )
    if len(extracted_content_json) > 0:
        extracted_content_json_last_elem = extracted_content_json[-1]

    # Process URLs
    urls = agent_obj.state.history.urls()
    urls_json = obj_to_json(
        obj=urls,
        check_circular=False
    )
    if len(urls_json) > 0:
        urls_json_last_elem = urls_json[-1]

    # Create a summary of all data for this step
    model_step_summary = {
        "website_html": website_html,
        "website_screenshot": website_screenshot,
        "url": urls_json_last_elem,
        "model_thoughts": model_thoughts_last_elem,
        "model_outputs": model_outputs_json_last_elem,
        "model_actions": model_actions_json_last_elem,
        "extracted_content": extracted_content_json_last_elem
    }

    print("--- MODEL STEP SUMMARY ---")
    print(f"URL: {urls_json_last_elem}")
    
    # Send data to the API
    result = send_agent_history_step(data=model_step_summary)
    print(f"Recording API response: {result}")


async def run_agent():
    """Run the Browser-Use agent with the recording hook"""
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    
    try:
        print("Starting Browser-Use agent with recording hook")
        await agent.run(
            on_step_start=record_activity,
            max_steps=30
        )
    except Exception as e:
        print(f"Error running agent: {e}")


if __name__ == "__main__":
    # Check if API is running
    try:
        requests.get("http://127.0.0.1:9000")
        print("Recording API is available")
    except:
        print("Warning: Recording API may not be running. Start api.py first.")
    
    # Run the agent
    asyncio.run(run_agent())

Working with the Recorded Data

After running the agent, you’ll find the recorded data in the recordings directory. Here’s how you can use this data:

  1. View recorded sessions: Each JSON file contains a snapshot of agent activity for one step
  2. Extract screenshots: You can modify the API to save screenshots separately
  3. Analyze agent behavior: Use the recorded data to study how the agent navigates websites

Extending the Example

You can extend this recording system in several ways:

  1. Save screenshots separately: Uncomment the screenshot saving code in the API
  2. Add a web dashboard: Create a simple web interface to view recorded sessions
  3. Add session IDs: Modify the API to group steps by agent session
  4. Add filtering: Implement filters to record only specific types of actions

Data Available in Hooks

When working with agent hooks, you have access to the entire agent instance. Here are some useful data points you can access:

  • agent.state.history.model_thoughts(): Reasoning from Browser Use’s model.
  • agent.state.history.model_outputs(): Raw outputs from the Browsre Use’s model.
  • agent.state.history.model_actions(): Actions taken by the agent
  • agent.state.history.extracted_content(): Content extracted from web pages
  • agent.state.history.urls(): URLs visited by the agent
  • agent.browser_context.get_page_html(): Current page HTML
  • agent.browser_context.take_screenshot(): Screenshot of the current page

Tips for Using Hooks

  • Avoid blocking operations: Since hooks run in the same execution thread as the agent, try to keep them efficient or use asynchronous patterns.
  • Handle exceptions: Make sure your hook functions handle exceptions gracefully to prevent interrupting the agent’s main flow.
  • Consider storage needs: When capturing full HTML and screenshots, be mindful of storage requirements.

Contribution by Carlos A. Planchón.