SSE & Events

Real-time streaming events and DOM interaction protocols.

During agent execution, the server streams events to the client via Server-Sent Events (SSE). This page covers all event types, the Action Request protocol for DOM interactions, and Page State for visual understanding.

SSE Event Reference

SSEEvent Union Type

SSEEvent

type SSEEvent =
  | SSETextEvent
  | SSEToolCallEvent
  | SSEToolResultEvent
  | SSEActionRequestEvent
  | SSEErrorEvent
  | SSEDoneEvent;

Event Types

text — Streaming text token. Fields: content: string
tool_call — Agent is calling a tool. Fields: name: string, parameters: Record<string, any>
tool_result — Tool execution completed. Fields: name: string, result: ToolResult
action_request — Server requests a DOM action from the client. Fields: correlationId: string, action: string, parameters: Record<string, any>
error — Error occurred. Fields: error: string, fatal: boolean
done — Agent execution complete. No fields.

Event Interfaces

Interfaces

interface SSETextEvent {
  type: 'text';
  content: string;
}

interface SSEToolCallEvent {
  type: 'tool_call';
  name: string;                    // tool name
  parameters: Record<string, any>; // tool parameters
}

interface SSEToolResultEvent {
  type: 'tool_result';
  name: string;                    // tool name
  result: ToolResult;              // { success, result?, error? }
}

interface SSEActionRequestEvent {
  type: 'action_request';
  correlationId: string;           // unique ID to match request/result
  action: string;                  // 'click', 'scroll', 'navigate', etc.
  parameters: Record<string, any>;
}

interface SSEErrorEvent {
  type: 'error';
  error: string;
  fatal: boolean;                  // if true, agent execution stops
}

interface SSEDoneEvent {
  type: 'done';
}
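
Because every interface carries a literal type field, a client can validate raw payloads before treating them as an SSEEvent. The following is a minimal sketch, assuming each SSE message's data field holds one JSON-encoded event (the wire format is not specified on this page). parseSSEEvent and isObject are hypothetical helpers, not part of the library, and the ToolResult shape is taken from the comment in SSEToolResultEvent.

```typescript
// Event shapes repeated from this page so the sketch is self-contained.
type ToolResult = { success: boolean; result?: unknown; error?: string };
type SSEEvent =
  | { type: 'text'; content: string }
  | { type: 'tool_call'; name: string; parameters: Record<string, any> }
  | { type: 'tool_result'; name: string; result: ToolResult }
  | { type: 'action_request'; correlationId: string; action: string; parameters: Record<string, any> }
  | { type: 'error'; error: string; fatal: boolean }
  | { type: 'done' };

// True for plain (non-null, non-array) objects.
const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === 'object' && v !== null && !Array.isArray(v);

// Hypothetical helper: returns a typed event, or null when the payload
// is not valid JSON or does not match any event shape listed above.
function parseSSEEvent(data: string): SSEEvent | null {
  let raw: unknown;
  try {
    raw = JSON.parse(data);
  } catch {
    return null; // payload was not JSON (assumed wire format)
  }
  if (!isObject(raw)) return null;
  const { type, content, name, parameters, result, correlationId, action, error, fatal } = raw;

  switch (type) {
    case 'text':
      return typeof content === 'string' ? { type: 'text', content } : null;
    case 'tool_call':
      return typeof name === 'string' && isObject(parameters)
        ? { type: 'tool_call', name, parameters }
        : null;
    case 'tool_result':
      return typeof name === 'string' && isObject(result) && typeof result.success === 'boolean'
        ? { type: 'tool_result', name, result: result as ToolResult }
        : null;
    case 'action_request':
      return typeof correlationId === 'string' && typeof action === 'string' && isObject(parameters)
        ? { type: 'action_request', correlationId, action, parameters }
        : null;
    case 'error':
      return typeof error === 'string' && typeof fatal === 'boolean'
        ? { type: 'error', error, fatal }
        : null;
    case 'done':
      return { type: 'done' };
    default:
      return null; // unknown event type
  }
}
```

Returning null instead of throwing lets a caller skip malformed or unrecognized messages without interrupting the stream.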

Handling Events in onEvent

Event handling

useLensAgent({
  endpoint: '/api/lens/agent/chat',
  onEvent: (event) => {
    switch (event.type) {
      case 'text':
        // Streaming text — messages state is updated automatically
        break;

      case 'tool_call':
        console.log(`Calling tool: ${event.name}`, event.parameters);
        // Show "loading" UI for this tool
        break;

      case 'tool_result':
        console.log(`Tool result: ${event.name}`, event.result);
        // Update tool card to "completed"
        break;

      case 'action_request':
        // Handled automatically by the hook (DOM actions)
        break;

      case 'error':
        console.error(event.error);
        if (event.fatal) {
          // Agent stopped — show error to user
        }
        break;

      case 'done':
        // Agent finished — cleanup if needed
        break;
    }
  },
});
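
Since SSEEvent is a discriminated union, the switch above can also be checked for exhaustiveness at compile time. This sketch uses the standard TypeScript never pattern. describeEvent and assertNever are illustrative names rather than part of the hook's API, and the event types are abbreviated copies of the interfaces on this page.

```typescript
type ToolResult = { success: boolean; result?: unknown; error?: string };
type SSEEvent =
  | { type: 'text'; content: string }
  | { type: 'tool_call'; name: string; parameters: Record<string, any> }
  | { type: 'tool_result'; name: string; result: ToolResult }
  | { type: 'action_request'; correlationId: string; action: string; parameters: Record<string, any> }
  | { type: 'error'; error: string; fatal: boolean }
  | { type: 'done' };

// If a new member is added to SSEEvent and not handled below,
// the call to assertNever stops compiling.
function assertNever(value: never): never {
  throw new Error(`Unhandled event: ${JSON.stringify(value)}`);
}

// Illustrative handler: turns each event into a short status line.
function describeEvent(event: SSEEvent): string {
  switch (event.type) {
    case 'text':
      return `text: ${event.content}`;
    case 'tool_call':
      return `calling ${event.name}`;
    case 'tool_result':
      return `${event.name} ${event.result.success ? 'succeeded' : 'failed'}`;
    case 'action_request':
      return `action ${event.action} (${event.correlationId})`;
    case 'error':
      return `${event.fatal ? 'fatal ' : ''}error: ${event.error}`;
    case 'done':
      return 'done';
    default:
      return assertNever(event);
  }
}
```

The same default branch can be added to an onEvent callback so that a future event type is flagged by the compiler instead of being silently ignored.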

Action Request Protocol

When the agent needs to interact with the user's page (click buttons, scroll, navigate), it uses the Action Request Protocol:

Action Request Flow

Server                              Client
  │                                    │
  │── action_request (via SSE) ───────►│
  │   { correlationId, action, params }│
  │                                    │ Execute DOM action
  │                                    │ Capture page state
  │◄── POST /action-result ───────────│
  │   { correlationId, result,         │
  │     pageState }                    │
  │                                    │
  │ (agent loop continues)             │
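
The diagram implies that the server pauses the agent loop until a result with the matching correlationId arrives. The server implementation is not shown on this page, so the following is only a sketch of one way such matching could work. PendingActions and its method names are hypothetical, and ActionResult mirrors the { success, result?, error? } shape used elsewhere on this page.

```typescript
type ActionResult = { success: boolean; result?: unknown; error?: string };

// Hypothetical server-side registry of in-flight action requests,
// keyed by correlationId.
class PendingActions {
  private pending = new Map<string, (result: ActionResult) => void>();

  // Call when an action_request is emitted; the returned promise
  // settles once the client reports a result for the same ID.
  wait(correlationId: string): Promise<ActionResult> {
    return new Promise<ActionResult>((resolve) => {
      this.pending.set(correlationId, resolve);
    });
  }

  // Call from the action-result handler. Returns false when the ID is
  // unknown or was already resolved, so stale or duplicate posts can be rejected.
  resolve(correlationId: string, result: ActionResult): boolean {
    const settle = this.pending.get(correlationId);
    if (!settle) return false;
    this.pending.delete(correlationId);
    settle(result);
    return true;
  }
}
```

A production version would likely also need a timeout so the agent loop does not wait indefinitely when a client disconnects. That is omitted here to keep the sketch short.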

How It Works

1. The agent decides to call a DOM tool (e.g., click).
2. The server sends an action_request event via SSE with a unique correlationId.
3. The client's useLensAgent hook receives the event, executes the DOM action, captures fresh page state, and POSTs the result to the action-result endpoint.
4. The server receives the result, updates its page context, and continues the agent loop.
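
The client side of these steps can be sketched as a single function. This is not the hook's actual implementation: handleActionRequest and ActionDeps are hypothetical names, and the collaborators are injected so the flow can be read (and tested) without a browser. The posted body uses the { correlationId, result, pageState } fields from the flow diagram and assumes JSON encoding.

```typescript
type ActionResult = { success: boolean; result?: unknown; error?: string };
type ActionRequest = {
  correlationId: string;
  action: string;
  parameters: Record<string, any>;
};

// Injected collaborators (all hypothetical): how to run the DOM action,
// how to capture page state, and how to send the result back.
interface ActionDeps<TPageState> {
  execute: (action: string, parameters: Record<string, any>) => Promise<ActionResult>;
  getPageState: () => Promise<TPageState>;
  post: (body: string) => Promise<void>; // e.g. a POST to the action-result endpoint
}

async function handleActionRequest<TPageState>(
  request: ActionRequest,
  deps: ActionDeps<TPageState>,
): Promise<void> {
  let result: ActionResult;
  try {
    // Step 1: execute the requested DOM action.
    result = await deps.execute(request.action, request.parameters);
  } catch (err) {
    // Report failures too, so the server is not left waiting for a reply.
    result = { success: false, error: err instanceof Error ? err.message : String(err) };
  }

  // Step 2: capture page state after the action has run.
  const pageState = await deps.getPageState();

  // Step 3: send the result back, echoing the correlationId unchanged.
  await deps.post(JSON.stringify({ correlationId: request.correlationId, result, pageState }));
}
```

Echoing the correlationId unchanged is what allows the server to pair the result with the request it sent.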

Custom Action Handler

By default, the hook uses the built-in WebUseTool for DOM actions. You can override this:

Custom handler

useLensAgent({
  endpoint: '/api/lens/agent/chat',

  onActionRequest: async (action, params) => {
    // Custom DOM action handling
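    if (action === 'click') {
      // querySelector returns Element | null; click() requires an HTMLElement
      const element = document.querySelector<HTMLElement>(params.selector);
      if (!element) {
        return { success: false, error: `No element matches ${params.selector}` };
      }
      element.click();
      return { success: true, result: 'Clicked element' };
    }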
    return { success: false, error: `Unknown action: ${action}` };
  },
});
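
When a custom handler supports several actions, the if-chain can be replaced by a lookup table. This is a sketch under assumptions: createActionRouter is a hypothetical helper, not part of the library, and it assumes onActionRequest accepts an async (action, params) function returning { success, result?, error? } as in the example above.

```typescript
type ActionResult = { success: boolean; result?: unknown; error?: string };
type ActionHandler = (params: Record<string, any>) => Promise<ActionResult> | ActionResult;

// Hypothetical helper: builds a function with the onActionRequest shape
// from a table of per-action handlers.
function createActionRouter(handlers: Record<string, ActionHandler>) {
  return async (action: string, params: Record<string, any>): Promise<ActionResult> => {
    // Own-property check so inherited names such as "toString" are not treated as actions.
    const handler = Object.prototype.hasOwnProperty.call(handlers, action)
      ? handlers[action]
      : undefined;
    if (!handler) {
      return { success: false, error: `Unknown action: ${action}` };
    }
    try {
      return await handler(params);
    } catch (err) {
      // Convert thrown errors into a failed result instead of rejecting.
      return { success: false, error: err instanceof Error ? err.message : String(err) };
    }
  };
}
```

Under those assumptions, the router could be passed as onActionRequest: createActionRouter({ click: ..., scroll: ... }), with each handler performing its own DOM work.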

Page State & Screenshots

Page state allows the agent to "see" the current page. This enables visual understanding and DOM tool execution.

PageState Interface

PageState

interface PageState {
  url: string;
  title: string;
  markdown: string;                      // Page content as markdown
  screenshot: string;                    // Base64 data URL
  actionableElements: ActionableElement[];
  timestamp: Date;
}

interface ActionableElement {
  id: string;
  type: 'button' | 'input' | 'link' | 'select' | 'textarea';
  selector: string;
  text?: string;
  placeholder?: string;
  description: string;
}
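
One detail of this interface matters when page state crosses the network: timestamp is a Date, and JSON.stringify serializes a Date as an ISO 8601 string, so the receiving side gets a string unless it converts the field back. How the library encodes page state is not stated on this page. The sketch below assumes JSON transport, and serializePageState and deserializePageState are hypothetical helper names.

```typescript
// Interfaces repeated from this page so the sketch is self-contained.
interface ActionableElement {
  id: string;
  type: 'button' | 'input' | 'link' | 'select' | 'textarea';
  selector: string;
  text?: string;
  placeholder?: string;
  description: string;
}

interface PageState {
  url: string;
  title: string;
  markdown: string;
  screenshot: string;
  actionableElements: ActionableElement[];
  timestamp: Date;
}

// Date values become ISO 8601 strings here (via Date.prototype.toJSON).
function serializePageState(state: PageState): string {
  return JSON.stringify(state);
}

// Restores timestamp to a Date; the other fields are JSON-safe already.
// No validation is performed, so this trusts the sender.
function deserializePageState(json: string): PageState {
  const raw = JSON.parse(json) as Omit<PageState, 'timestamp'> & { timestamp: string };
  return { ...raw, timestamp: new Date(raw.timestamp) };
}
```

If the receiving code only logs or forwards the timestamp, leaving it as a string may be sufficient, and the conversion step can be dropped.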

Providing Page State

getPageState

useLensAgent({
  endpoint: '/api/lens/agent/chat',

  getPageState: async () => {
    // This function is called:
    // 1. Before each sendMessage (to send initial page context)
    // 2. After each action_request (to capture updated state)

    return {
      url: window.location.href,
      title: document.title,
      markdown: extractPageMarkdown(),       // your implementation
      screenshot: await captureScreenshot(),  // your implementation
      actionableElements: findActionableElements(), // your implementation
      timestamp: new Date(),
    };
  },
});

Sending Page State with a Message

sendMessage with pageState

await sendMessage('What do you see on this page?', {
  pageState: {
    url: window.location.href,
    title: document.title,
    markdown: '...',
    screenshot: 'data:image/png;base64,...',
    actionableElements: [],
    timestamp: new Date(),
  },
  currentUrl: window.location.href,
});

Note

The useLensAgent hook automatically handles action requests and page state capture. You only need to provide getPageState if you want the agent to have visual context.