Browser Automation
JarvisCore's BrowserSubAgent drives a real Chromium browser via Playwright. It is activated when the Kernel routes a task to the browser role, either automatically through keyword classification or explicitly by setting default_kernel_role = "browser" on your AutoAgent.
The browser subagent is not a replacement for web search. Use it only when the target page requires JavaScript execution, cookie-based authentication, interactive UI automation, or form submission. For static content and API-based research, the ResearcherSubAgent's web_search and read_url tools are faster and cheaper.
Installation
Playwright is not bundled with JarvisCore. Install it separately, together with the Chromium binary:
pip install playwright && playwright install chromium
JarvisCore imports Playwright lazily, so the framework loads and runs correctly without it. When Playwright is not installed and a task is routed to the browser role, the sub-agent returns a clear error message rather than crashing.
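The lazy-import behaviour can be illustrated with a minimal sketch. This is not JarvisCore's actual code: the function names `load_optional` and `run_browser_task` and the shape of the returned dictionary are assumptions made for illustration.

```python
import importlib


def load_optional(module_name: str):
    """Import a module on first use; return (module, None) or (None, error message)."""
    try:
        return importlib.import_module(module_name), None
    except ImportError:
        return None, f"{module_name} is not installed"


def run_browser_task(task: str) -> dict:
    # Hypothetical entry point: Playwright is imported only when a browser task arrives,
    # so importing this module never fails on a machine without Playwright.
    _module, error = load_optional("playwright.async_api")
    if error:
        hint = "Run: pip install playwright && playwright install chromium"
        return {"status": "error", "message": f"{error}. {hint}"}
    return {"status": "ready", "task": task}
```

Because the import sits inside the function, a missing dependency surfaces as an ordinary return value that the calling agent can report, rather than as an exception at framework start-up.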
Enabling browser automation
Set BROWSER_ENABLED=true in your .env. Without this, the browser role is never selected by the Kernel's task classifier, even if Playwright is installed.
Also set BROWSER_MODEL to a computer-use (CUA) or multimodal model. Without it, the framework falls back to TASK_MODEL_STANDARD, which may be a text-only model that cannot interpret screenshots.
BROWSER_ENABLED=true
# CUA model for browser automation (strongly recommended)
# Gemini: BROWSER_MODEL=gemini-2.5-computer-use
# OpenAI: BROWSER_MODEL=gpt-5.4-mini
# Fallback (multimodal, not CUA): BROWSER_MODEL=gpt-4o or gemini-2.5-flash
BROWSER_MODEL=gemini-2.5-computer-use
# Defaults to true (headless). Set false to see the browser window during development.
BROWSER_HEADLESS=true
The kernel reads browser_headless from settings and passes it to BrowserSubAgent at instantiation. The default viewport is 1280x720, hardcoded in BrowserSubAgent.__init__() and not currently configurable via environment variable.
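How these variables might resolve into settings can be sketched as follows. The class `BrowserSettings`, the loader function, and the set of accepted boolean spellings are assumptions, not JarvisCore's actual settings module; only the variable names, the fallback to TASK_MODEL_STANDARD, and the defaults come from the text above.

```python
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    # Assumed parsing rule: a handful of common truthy spellings.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class BrowserSettings:
    enabled: bool
    model: str
    headless: bool
    viewport: tuple = (1280, 720)  # fixed size, not read from the environment


def load_browser_settings(env=None) -> BrowserSettings:
    env = os.environ if env is None else env
    return BrowserSettings(
        # The browser role is off unless explicitly enabled.
        enabled=_as_bool(env.get("BROWSER_ENABLED", "false")),
        # Fall back to the standard task model when BROWSER_MODEL is unset.
        model=env.get("BROWSER_MODEL") or env.get("TASK_MODEL_STANDARD", ""),
        # Headless is the default.
        headless=_as_bool(env.get("BROWSER_HEADLESS", "true")),
    )
```

Passing a plain dictionary as `env` makes the resolution easy to check without touching the real environment.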
How routing works
The Kernel classifies tasks into sub-agent roles using keyword sets. The browser role has the highest priority and is checked before researcher and communicator. Any task whose text contains one of these keywords is routed to the browser:
browser, click, navigate, screenshot, fill form, login to, log in to,
scrape, automate, playwright, selenium, headless, web automation, interact with
You can also force browser routing without relying on keywords by declaring it on your agent class:
from jarviscore import AutoAgent


class MyAgent(AutoAgent):
    role = "web-scraper"
    capabilities = ["scraping"]
    system_prompt = "..."
    default_kernel_role = "browser"  # always routes to BrowserSubAgent
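The routing rules can be sketched as a small classifier. The browser keyword list is the one given above; everything else is an assumption for illustration: the function name `classify`, case-insensitive substring matching, the placeholder keywords for the other roles, the fallback role, and the choice to let default_kernel_role bypass the classifier entirely.

```python
BROWSER_KEYWORDS = (
    "browser", "click", "navigate", "screenshot", "fill form", "login to",
    "log in to", "scrape", "automate", "playwright", "selenium", "headless",
    "web automation", "interact with",
)

# Hypothetical placeholders: the real researcher and communicator keyword sets
# are not listed in this guide.
OTHER_ROLES = {
    "researcher": ("search", "research"),
    "communicator": ("email", "notify"),
}


def classify(task, browser_enabled=True, default_kernel_role=None, fallback="researcher"):
    # An explicit role on the agent skips keyword classification (assumed behaviour).
    if default_kernel_role:
        return default_kernel_role
    text = task.lower()
    # Browser is checked first, so it wins when several roles match.
    if browser_enabled and any(keyword in text for keyword in BROWSER_KEYWORDS):
        return "browser"
    for role, keywords in OTHER_ROLES.items():
        if any(keyword in text for keyword in keywords):
            return role
    return fallback
```

With substring matching, a task such as "Click the export button and search for errors" goes to the browser even though it also mentions searching. That breadth is worth keeping in mind when wording tasks that should stay with the Researcher.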
Tools available to the LLM
The browser sub-agent registers the following tools. The LLM calls them by emitting TOOL: <name> in its OODA loop turns.
Navigation
| Tool | Description | Key parameters |
|---|---|---|
| navigate | Go to a URL and wait for load | url, wait_for (networkidle, domcontentloaded, or load) |
| close_page | Close current page, open a fresh one | none |
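For orientation, the two navigation tools map naturally onto Playwright's async Page API. The sketch below is an assumption about how such tools could be written, not JarvisCore's implementation; the return dictionaries are hypothetical, and the networkidle default is inferred from the Troubleshooting table on this page.

```python
ALLOWED_WAITS = {"networkidle", "domcontentloaded", "load"}


async def navigate(page, url: str, wait_for: str = "networkidle") -> dict:
    """Go to a URL using a Playwright Page and report where the browser ended up."""
    if wait_for not in ALLOWED_WAITS:
        return {"status": "error", "message": f"unsupported wait_for: {wait_for}"}
    # Page.goto takes the load condition as `wait_until`.
    await page.goto(url, wait_until=wait_for)
    return {"status": "ok", "url": page.url, "title": await page.title()}


async def close_page(context, page):
    """Close the current page and open a new one in the same browser context."""
    await page.close()
    # Cookies live on the context, so they survive the page swap.
    return await context.new_page()
```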
Inspection
| Tool | Description | Key parameters |
|---|---|---|
| get_text | Extract text from an element or the full page | selector (empty = full page), max_chars (default 5000) |
| get_attribute | Get an attribute value from an element | selector, attribute |
| get_links | Extract all <a> links from the page | selector (optional scope), max (default 50) |
| get_cookies | Get cookies for the current page; returns name, domain, and path only (httpOnly values are not exposed) | none |
| screenshot | Take a PNG screenshot, returned as base64 in the LLM context | full_page (default false) |
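Two of the inspection tools are sketched below against Playwright's async Page API, to show what the max_chars and base64 details amount to in practice. These are illustrative assumptions rather than JarvisCore's code. In particular, using "body" as the full-page selector is a guess.

```python
import base64


async def get_text(page, selector: str = "", max_chars: int = 5000) -> str:
    """Return visible text for an element, or for the whole page when selector is empty."""
    target = selector or "body"  # assumed stand-in for "full page"
    text = await page.inner_text(target)
    # Truncate so a long page cannot flood the model's context.
    return text[:max_chars]


async def screenshot(page, full_page: bool = False) -> str:
    """Capture a PNG and return it base64-encoded, ready to embed in a message."""
    png_bytes = await page.screenshot(full_page=full_page)
    return base64.b64encode(png_bytes).decode("ascii")
```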
Interaction
| Tool | Description | Key parameters |
|---|---|---|
| click | Click an element by CSS selector or visible text | selector (CSS), text (text match, used if selector is empty) |
| type_text | Type text into an input; clears the field first by default | selector, text, clear_first (default true), delay_ms (default 50) |
| fill_form | Fill multiple form fields in one call | fields: [{"selector": "...", "value": "..."}] |
| select_option | Select a <select> dropdown value by value or label | selector, value |
| scroll | Scroll the page | direction (down, up, top, or bottom), pixels (default 500) |
| hover | Hover over an element | selector |
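The selector-or-text fallback of click and the fields payload of fill_form can be sketched with Playwright's async API as follows. The function bodies and return values are assumptions for illustration, not JarvisCore's implementation.

```python
async def click(page, selector: str = "", text: str = "") -> dict:
    """Click by CSS selector, falling back to visible text when no selector is given."""
    if selector:
        await page.click(selector)
    elif text:
        # get_by_text may match several elements; take the first one.
        await page.get_by_text(text).first.click()
    else:
        return {"status": "error", "message": "provide selector or text"}
    return {"status": "ok"}


async def fill_form(page, fields: list) -> dict:
    """Fill several inputs in one call; each field is {"selector": ..., "value": ...}."""
    for field in fields:
        await page.fill(field["selector"], field["value"])
    return {"status": "ok", "filled": len(fields)}
```

A login form, for instance, would be filled with `fields=[{"selector": "#username", "value": "admin"}, {"selector": "#password", "value": "..."}]`, where both selectors are hypothetical.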
Waiting
| Tool | Description | Key parameters |
|---|---|---|
| wait_for | Wait for an element to reach a state | selector, timeout_ms (default 10000), state (visible, hidden, attached, or detached) |
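The four states correspond to the `state` argument of Playwright's `Page.wait_for_selector`. A minimal sketch of such a tool, assuming a failure should come back as a result the LLM can read rather than as an exception (the return shape is hypothetical):

```python
async def wait_for(page, selector: str, timeout_ms: int = 10000, state: str = "visible") -> dict:
    """Wait until the element reaches the given state, or report why it did not."""
    try:
        await page.wait_for_selector(selector, state=state, timeout=timeout_ms)
    except Exception as exc:
        # Playwright raises its own TimeoutError when the deadline passes; catching
        # Exception keeps this sketch free of a hard Playwright import.
        return {"status": "error", "message": str(exc)}
    return {"status": "ok"}
```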
JavaScript
| Tool | Description | Key parameters |
|---|---|---|
| evaluate | Execute a JavaScript expression on the page and return the result | script |
Session lifecycle
One Chromium browser is launched per run() call and shared by every tool call in that run. It is not shared across dispatches: if the Kernel dispatches to BrowserSubAgent more than once (e.g. two browser-classified steps in the same workflow), each dispatch gets its own browser session.
Within a single run():
- Pages are reused, so cookies and auth state persist across tool calls
- close_page opens a fresh page but keeps the same browser context (cookies survive)
- The browser is closed unconditionally when run() exits via the finally block of _post_run_hook()
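The browser is launched with the arguments below. The first three are the usual flags for running Chromium in containers (they disable the sandbox and avoid the small /dev/shm partition); the last suppresses the AutomationControlled Blink feature to reduce bot detection: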
--no-sandbox
--disable-setuid-sandbox
--disable-dev-shm-usage
--disable-blink-features=AutomationControlled
The user agent is set to a realistic Chrome string to avoid bot detection on common sites.
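Put together, the lifecycle resembles the Playwright sketch below: one launch, one context, and a finally block that always closes the browser. This is an approximation under stated assumptions, not the code in BrowserSubAgent. The helper name `run_session` is hypothetical, and the user agent string is a placeholder because the actual value is not given in this guide.

```python
LAUNCH_ARGS = [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-dev-shm-usage",
    "--disable-blink-features=AutomationControlled",
]
VIEWPORT = {"width": 1280, "height": 720}
# Placeholder only: the real user agent string used by JarvisCore is not documented here.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


async def run_session(work, headless: bool = True):
    """Launch one browser, run `work(page, context)`, and always close the browser."""
    from playwright.async_api import async_playwright  # imported lazily

    async with async_playwright() as pw:
        browser = await pw.chromium.launch(headless=headless, args=LAUNCH_ARGS)
        try:
            # One context per session: cookies persist across pages inside it.
            context = await browser.new_context(viewport=VIEWPORT, user_agent=USER_AGENT)
            page = await context.new_page()
            return await work(page, context)
        finally:
            # Runs on success, on error, and on cancellation.
            await browser.close()
```

Calling `run_session` twice yields two unrelated browsers, which mirrors the point above that state does not carry over between dispatches.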
What tasks suit BrowserSubAgent
Good fits:
- Logging into a web application and extracting account data
- Filling and submitting forms that don't have an API
- Scraping JavaScript-rendered content (SPAs, dashboards)
- Automating multi-step UI workflows
- Downloading files via browser interactions
Poor fits (use the Researcher instead):
- Reading static HTML pages (use read_url)
- Querying public APIs (use web_search or a Nexus atom)
- Fetching RSS feeds or structured data endpoints
Example agent
from jarviscore import AutoAgent


class DashboardScraper(AutoAgent):
    role = "scraper"
    capabilities = ["web-scraping", "dashboard"]
    default_kernel_role = "browser"
    system_prompt = """
    You are a web automation specialist. Your task is to log into dashboards and
    extract structured data. Always:
    1. Call navigate() first.
    2. Use wait_for() to confirm interactive elements are present before clicking.
    3. Take a screenshot if you are unsure about page state.
    4. Store all extracted data in a variable named `result`.
    """
Run the agent in a workflow:
import asyncio

from jarviscore import Mesh
from agents.dashboard_scraper import DashboardScraper


async def main():
    mesh = Mesh()
    mesh.add(DashboardScraper)
    await mesh.start()

    # `await` is only valid inside a coroutine, so the workflow runs in main().
    results = await mesh.workflow("dashboard-job", [
        {
            "id": "extract",
            "agent": "scraper",
            "task": "Navigate to https://app.example.com/login, log in with username 'admin' "
                    "and password from the DASHBOARD_PASSWORD env var, then extract the "
                    "monthly revenue figure from the summary panel.",
        }
    ])
    print(results)


asyncio.run(main())
Connecting to an existing browser (CDP)
The BrowserController in jarviscore.browser supports connecting to an already-running Chromium instance via the Chrome DevTools Protocol (CDP). This is useful for testing against a persistent browser profile or for debugging. The BrowserSubAgent itself always launches its own browser; a CDP connection is available through BrowserController directly if you need it for custom tooling.
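BrowserController's own interface is not documented on this page, so the sketch below shows the underlying Playwright call instead (`connect_over_cdp`). It assumes Chromium was started with `--remote-debugging-port=9222`. The helper names and the default host and port are illustrative.

```python
def cdp_endpoint(host: str = "localhost", port: int = 9222) -> str:
    """Build the HTTP endpoint that Chromium exposes for remote debugging."""
    return f"http://{host}:{port}"


async def attach_over_cdp(endpoint: str):
    """Attach to a running Chromium and return (playwright, browser, page)."""
    from playwright.async_api import async_playwright  # imported lazily

    pw = await async_playwright().start()
    browser = await pw.chromium.connect_over_cdp(endpoint)
    # Reuse the existing profile's context and tab when there is one.
    context = browser.contexts[0] if browser.contexts else await browser.new_context()
    page = context.pages[0] if context.pages else await context.new_page()
    return pw, browser, page
```

The caller is responsible for calling `await pw.stop()` when finished. Unlike the launch path, the attached browser was started elsewhere, so it may keep running after the script exits.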
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Task not routed to browser | BROWSER_ENABLED not set | Add BROWSER_ENABLED=true to .env |
| "Playwright not installed" error in tool result | Playwright not installed | pip install playwright && playwright install chromium |
| "Browser not initialized" error from tools | Playwright launched but Chromium failed | Check system dependencies; try running playwright install chromium again |
| Page load timeout | networkidle timeout (30s default) | Add wait_for: "domcontentloaded" in the navigate call for slow pages |
| Element not found | Selector wrong or page not fully loaded | Use wait_for before click or type_text; use screenshot to inspect page state |
Further Reading
- AutoAgent Guide: execution budgets for the browser role (60k thinking + 60k action tokens, 5 min wall clock, 28-turn fuse)
- Internet Search: the researcher's alternative for pages that don't need a real browser
- Model Routing: CUA and multimodal model requirements for the browser tier