Evaluations
Test the Browser Use agent on standardized benchmarks
Prerequisites
Browser Use uses proprietary, private test sets that must never be committed to GitHub and must be fetched through an authorized API request. Accessing these test sets requires an approved Browser Use account. There are currently no publicly available test sets, but some may be released in the future.
Get an API Access Key
First, navigate to https://browser-use.tools and log in with an authorized Browser Use account.
Then click the “Account” button at the top right of the page, and on the resulting page click the “Cycle New Key” button.
Copy the resulting URL and secret key into your .env file. It should look like this:
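A minimal sketch of what the file might contain; the variable names below are assumptions for illustration, so use whichever names the Account page shows alongside your key:

```bash
# Hypothetical variable names; copy the exact names shown with your key
EVALUATION_TOOL_URL=https://browser-use.tools
EVALUATION_TOOL_SECRET_KEY=your-secret-key-here
```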
Running Evaluations
First, ensure your file eval/service.py is up to date.
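If you work from a clone of the repository, pulling the latest changes is usually enough. This sketch assumes you are on the default branch of a browser-use checkout:

```bash
# From the root of the repository, fetch and apply the latest changes
git pull
```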
Then run the file:
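Assuming your Python environment is set up and your .env file is in the working directory, running the file looks like:

```bash
python eval/service.py
```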
Configuring Evaluations
You can modify the evaluation by providing flags to the evaluation script. For instance:
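The flag names below are illustrative assumptions rather than a definitive list; check python eval/service.py --help to see which options your copy of the script actually supports:

```bash
# Illustrative flags; verify the real option names with --help
python eval/service.py --model gpt-4.1 --parallel-runs 2 --max-steps 25 --start 0 --end 100
```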
The evaluations webpage has a convenient GUI for generating these commands. To use it, navigate to https://browser-use.tools/dashboard.
Then click the “New Eval Run” button on the left panel. This will open an interface with selectors, inputs, sliders, and switches.
Input your desired configuration into the interface and copy the resulting Python command at the bottom. Then run this command as before.