Prerequisites

Browser Use uses proprietary/private test sets that must never be committed to GitHub and must be fetched through an authorized API request. Accessing these test sets requires an approved Browser Use account. There are currently no publicly available test sets, although some may be released in the future.

Get an API Access Key

First, navigate to https://browser-use.tools and log in with an authorized Browser Use account.

Then click the “Account” button at the top right of the page, and on the account page click the “Cycle New Key” button.

Copy the resulting URL and secret key into your .env file. It should look like this:

.env
EVALUATION_TOOL_URL= ...
EVALUATION_TOOL_SECRET_KEY= ...
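
To confirm the key was copied correctly, you can check that both values load from the environment. The snippet below is a minimal sketch, assuming the evaluation tooling reads these variables via the python-dotenv package; the filename check_env.py is just an illustrative name, not part of the repository.

check_env.py
import os

from dotenv import load_dotenv

# Read EVALUATION_TOOL_URL and EVALUATION_TOOL_SECRET_KEY from the .env file
# in the current working directory.
load_dotenv()

url = os.environ.get("EVALUATION_TOOL_URL")
key = os.environ.get("EVALUATION_TOOL_SECRET_KEY")

if not url or not key:
    raise RuntimeError("Both EVALUATION_TOOL_URL and EVALUATION_TOOL_SECRET_KEY must be set in .env")

print("Evaluation tool configured at:", url)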

Running Evaluations

First, ensure your local copy of eval/service.py is up to date.

Then run the file:

python eval/service.py
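
If you are working from a git checkout of the browser-use repository, keeping the file up to date usually means pulling the latest commit before running. A minimal sketch, assuming your checkout tracks the repository's default branch:

git pull
python eval/service.py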

Configuring Evaluations

You can configure the evaluation by passing flags to the evaluation script. For instance:

python eval/service.py --parallel_runs 5 --parallel_evaluations 5 --max-steps 25 --start 0 --end 100 --model gpt-4o
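
The exact set of flags is defined in eval/service.py and may change between versions. As a rough sketch of how such flags behave, assuming the script uses Python's standard argparse module (not confirmed from the source, and with illustrative defaults), note that argparse normalizes hyphens to underscores, so --max-steps and --parallel_runs both become underscored attributes:

import argparse

parser = argparse.ArgumentParser(description="Run Browser Use evaluations")
# Defaults below are illustrative, not the script's actual defaults.
parser.add_argument("--parallel_runs", type=int, default=1, help="number of agents run in parallel")
parser.add_argument("--parallel_evaluations", type=int, default=1, help="number of results judged in parallel")
parser.add_argument("--max-steps", type=int, default=25, help="step budget per task")
parser.add_argument("--start", type=int, default=0, help="index of the first task")
parser.add_argument("--end", type=int, default=100, help="index one past the last task")
parser.add_argument("--model", type=str, default="gpt-4o", help="model under evaluation")
args = parser.parse_args()

# argparse maps --max-steps to args.max_steps; underscored flags keep their names.
print(args.parallel_runs, args.max_steps, args.model)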

The evaluations webpage has a convenient GUI for generating these commands. To use it, navigate to https://browser-use.tools/dashboard.

Then click the “New Eval Run” button on the left panel. This will open an interface with selectors, inputs, sliders, and switches.

Input your desired configuration into the interface and copy the resulting Python command shown at the bottom. Then run this command as before.