Overview

This tutorial demonstrates how to:

  • Set up browser automation with Julep
  • Navigate web pages programmatically
  • Execute browser actions like clicking and typing
  • Process visual feedback through screenshots
  • Create goal-oriented browser automation tasks

Task Structure

Let’s break down the task into its core components:

1. Input Schema

First, we define what inputs our task expects:

input_schema:
  type: object
  properties:
    goal:
      type: string
    agent_id:
      type: string
      description: The id of the agent to use for the browser automation
  required:
    - goal
    - agent_id

This schema specifies that our task expects a goal string describing what the browser automation should accomplish.

2. Tools Configuration

Next, we define the external tools our task will use:

tools:
- name: create_browserbase_session
  type: integration
  integration:
    provider: browserbase
    method: create_session
    setup:
      api_key: YOUR_BROWSERBASE_API_KEY
      project_id: YOUR_PROJECT_ID

- name: get_session_view_urls
  type: integration
  integration:
    provider: browserbase
    method: get_live_urls

- name: perform_browser_action
  type: integration
  integration:
    provider: remote_browser
    method: perform_action
    setup:
      width: 1024
      height: 768

- name: create_julep_session
  type: system
  system:
    resource: session
    operation: create

- name: session_chat
  type: system
  system:
    resource: session
    operation: chat

3. Main Workflow Steps

1

Create Julep Session

- tool: create_julep_session
  arguments:
    agent: str(agent.id)
    situation: "'The environment is a browser'"
    recall: 'False'

This step initializes a new Julep session for the AI agent. The session serves as a container for the conversation history and enables the agent to maintain context throughout the interaction.

2

Store Session ID

- evaluate:
    julep_session_id: _.id

After creating the session, we store its unique identifier for future reference.

3

Create Browser Session

- tool: create_browserbase_session
  arguments:
    project_id: YOUR_PROJECT_ID

This step establishes a new browser session using BrowserBase. It creates an isolated, headless Chrome browser instance that the agent can control.

4

Store Browser Session Info

- evaluate:
    browser_session_id: _.id
    connect_url: _.connect_url

We store both the browser session ID and connect URL in a single evaluation step.

5

Get Session View URLs

- tool: get_session_view_urls
  arguments:
    id: _.browser_session_id

This step retrieves various URLs associated with the browser session, including debugging interfaces and live view URLs.

6

Store Debugger URL

- evaluate:
    debugger_url: _.urls.debuggerUrl

We specifically store the debugger URL, which provides access to Chrome DevTools Protocol debugging interface.

7

Initial Navigation

- tool: perform_browser_action
  arguments:
    connect_url: outputs[3].connect_url
    action: "'navigate'"
    text: "'https://www.google.com'"

This step navigates to Google’s homepage to avoid sending a blank screenshot when computer use starts.

8

Start Browser Workflow

- workflow: run_browser
  arguments:
    julep_session_id: outputs[1].julep_session_id
    cdp_url: outputs[3].connect_url
    messages:
    - role: "'user'"
      content: |-
        """
        <SYSTEM_CAPABILITY>
        * You are utilising a headless chrome browser to interact with the internet.
        * You can use the computer tool to interact with the browser.
        * You have access to only the browser.
        * You are already inside the browser.
        * You can't open new tabs or windows.
        * For now, rely on screenshots as the only way to see the browser.
        * You can't don't have access to the browser's UI.
        * YOU CANNOT WRITE TO THE SEARCH BAR OF THE BROWSER.
        </SYSTEM_CAPABILITY>
        <GOAL>
        *""" + inputs[0].goal + NEWLINE + "</GOAL>"

Finally, we initiate the interactive browser workflow with system capabilities and user goal.

Run Browser Workflow

The run_browser workflow is a crucial component that handles the interactive browser automation. It consists of three main parts:

1

Agent Interaction

- tool: session_chat
  arguments:
    session_id: _.julep_session_id
    messages: _.messages
    recall: 'False'

This step engages the AI agent in conversation, allowing it to:

  • Process and understand the user’s goal
  • Plan appropriate browser actions
  • Generate responses based on the current browser state
  • Make decisions about next steps
1

Action Execution

- foreach:
    in: _.tool_calls
    do:
      tool: perform_browser_action
      arguments:
        connect_url: inputs[0].cdp_url
        action: _.action
        text: _.get('text')
        coordinate: _.get('coordinate')

This component:

  • Iterates through planned actions sequentially
  • Executes browser commands (navigation, clicking, typing)
  • Handles different types of interactions (text input, mouse clicks)
  • Captures screenshots for visual feedback
1

Goal Evaluation

- workflow: check_goal_status
  arguments:
    messages: _.messages
    julep_session_id: _.julep_session_id
    cdp_url: _.cdp_url

This final part:

  • Assesses progress toward the user’s goal
  • Determines if additional actions are needed
  • Maintains conversation context
  • Decides whether to continue or conclude the workflow

Check Goal Status Workflow

The check_goal_status workflow is a recursive component that ensures continuous operation until the goal is achieved:

1

Check Goal Status

check_goal_status:
- if: len(_.messages) > 0
  then:
    workflow: run_browser
    arguments:
      messages: _.messages
      julep_session_id: _.julep_session_id
      cdp_url: _.cdp_url
      workflow_label: "'run_browser'"

This workflow:

  • Checks if there are any messages to process (len(_.messages) > 0)
  • If messages exist, recursively calls the run_browser workflow
  • Passes along the current session context and connection details
  • Maintains the conversation flow until the goal is achieved
  • Automatically terminates when no more messages need processing

This recursive pattern ensures that the browser automation continues until either:

  • The goal is successfully achieved
  • No more actions are needed
  • An error occurs that prevents further progress

Example Usage

Here’s how to use this task with the Julep SDK:

from julep import Client

client = Client(api_key=JULEP_API_KEY)

execution = client.executions.create(
    task_id=TASK_UUID,
    input={
        "agent_id": "YOUR_AGENT_ID",
        "goal": "Search for recent news about artificial intelligence"
    }
)

Key Features

  • Browser Automation: Performs web interactions like navigation, clicking, and typing
  • Visual Feedback: Captures screenshots to verify actions and understand page state
  • Goal-Oriented: Continues executing actions until the user’s goal is achieved
  • Secure Sessions: Uses BrowserBase for isolated browser instances
  • Interactive Workflow: Uses run_browser subworkflow for continuous interaction

Next Steps

To try this task yourself:

  1. Get your API keys for BrowserBase
  2. Create a new agent using the Julep SDK
  3. Create and execute the task with your desired goal
  4. Experiment with different browser automation scenarios

For more examples and task patterns, check out our other cookbooks.