SSE & Events

Real-time streaming events and DOM interaction protocols.

During agent execution, the server streams events to the client via Server-Sent Events (SSE). This page covers all event types, the Action Request protocol for DOM interactions, and Page State for visual understanding.

SSE Event Reference

SSEEvent Union Type

SSEEvent
type SSEEvent =
  | SSETextEvent
  | SSEToolCallEvent
  | SSEToolResultEvent
  | SSEActionRequestEvent
  | SSEErrorEvent
  | SSEDoneEvent;

Event Types

  • text — Streaming text token. Fields: content: string
  • tool_call — Agent is calling a tool. Fields: name: string, parameters: Record<string, any>
  • tool_result — Tool execution completed. Fields: name: string, result: ToolResult
  • action_request — Server requests a DOM action from the client. Fields: correlationId: string, action: string, parameters: Record<string, any>
  • error — Error occurred. Fields: error: string, fatal: boolean
  • done — Agent execution complete. No fields.

Event Interfaces

Interfaces
interface SSETextEvent {
  type: 'text';
  content: string;
}

interface SSEToolCallEvent {
  type: 'tool_call';
  name: string;                    // tool name
  parameters: Record<string, any>; // tool parameters
}

interface SSEToolResultEvent {
  type: 'tool_result';
  name: string;                    // tool name
  result: ToolResult;              // { success, result?, error? }
}

interface SSEActionRequestEvent {
  type: 'action_request';
  correlationId: string;           // unique ID to match request/result
  action: string;                  // 'click', 'scroll', 'navigate', etc.
  parameters: Record<string, any>;
}

interface SSEErrorEvent {
  type: 'error';
  error: string;
  fatal: boolean;                  // if true, agent execution stops
}

interface SSEDoneEvent {
  type: 'done';
}
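This page does not spell out the wire format. Assuming each SSE message carries one JSON-encoded event in its data: field (an assumption, not confirmed here), a client reading the stream without the hook could decode a buffered payload roughly like this. parseSSEPayload and ParsedEvent are hypothetical names; useLensAgent does its own parsing and delivers events through onEvent.

```typescript
// Assumption: each SSE message carries one JSON-encoded event in its `data:` field.
const EVENT_TYPES = ['text', 'tool_call', 'tool_result', 'action_request', 'error', 'done'] as const;
type EventType = (typeof EVENT_TYPES)[number];

// Loose shape: only the `type` tag is checked; other fields pass through as-is.
interface ParsedEvent {
  type: EventType;
  [field: string]: unknown;
}

function isKnownType(value: unknown): value is EventType {
  return typeof value === 'string' && (EVENT_TYPES as readonly string[]).includes(value);
}

// Decode a complete SSE payload (messages separated by blank lines) into events.
function parseSSEPayload(payload: string): ParsedEvent[] {
  const events: ParsedEvent[] = [];
  // Normalize CRLF so the blank-line split works for either line ending.
  for (const message of payload.replace(/\r\n/g, '\n').split('\n\n')) {
    // A message may span several `data:` lines; the SSE format joins them with "\n"
    // and strips one optional space after the colon.
    const dataLines = message
      .split('\n')
      .filter((line) => line.startsWith('data:'))
      .map((line) => line.slice(5).replace(/^ /, ''));
    if (dataLines.length === 0) continue; // comments, keep-alives, empty tail
    const parsed: unknown = JSON.parse(dataLines.join('\n'));
    if (
      typeof parsed === 'object' &&
      parsed !== null &&
      isKnownType((parsed as { type?: unknown }).type)
    ) {
      events.push(parsed as ParsedEvent);
    }
  }
  return events;
}
```

Events with an unrecognized type are dropped rather than thrown, so a newer server could add event types without breaking this parser. A true streaming reader would append chunks to a buffer and parse only up to the last blank line, since a network chunk can end mid-message.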

Handling Events in onEvent

Event handling
useLensAgent({
  endpoint: '/api/agent/chat',
  onEvent: (event) => {
    switch (event.type) {
      case 'text':
        // Streaming text — messages state is updated automatically
        break;

      case 'tool_call':
        console.log(`Calling tool: ${event.name}`, event.parameters);
        // Show "loading" UI for this tool
        break;

      case 'tool_result':
        console.log(`Tool result: ${event.name}`, event.result);
        // Update tool card to "completed"
        break;

      case 'action_request':
        // Handled automatically by the hook (DOM actions)
        break;

      case 'error':
        console.error(event.error);
        if (event.fatal) {
          // Agent stopped — show error to user
        }
        break;

      case 'done':
        // Agent finished — cleanup if needed
        break;
    }
  },
});
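The tool_call and tool_result comments above hint at a tool-card UI. Because these events carry a tool name but no call ID, one approach (an assumption, not the hook's documented behavior) is to match each result to the most recent running card with the same name. A minimal reducer sketch, using simplified local types and the hypothetical names ToolCard and applyToolEvent:

```typescript
// Simplified local copies of the two event shapes used here.
type ToolEvent =
  | { type: 'tool_call'; name: string; parameters: Record<string, unknown> }
  | {
      type: 'tool_result';
      name: string;
      result: { success: boolean; result?: unknown; error?: string };
    };

interface ToolCard {
  name: string;
  status: 'running' | 'completed' | 'failed';
}

// Returns a new array (never mutates) so it can back React state.
function applyToolEvent(cards: ToolCard[], event: ToolEvent): ToolCard[] {
  switch (event.type) {
    case 'tool_call':
      // Show a "loading" card for the new call.
      return [...cards, { name: event.name, status: 'running' }];
    case 'tool_result': {
      // No call ID on the event: match the most recent running card with this name.
      for (let i = cards.length - 1; i >= 0; i--) {
        if (cards[i].name === event.name && cards[i].status === 'running') {
          const next = cards.slice();
          next[i] = { ...cards[i], status: event.result.success ? 'completed' : 'failed' };
          return next;
        }
      }
      return cards; // result without a matching call: leave cards unchanged
    }
  }
}
```

Inside onEvent, the tool_call and tool_result cases could then call something like setCards((prev) => applyToolEvent(prev, event)), where setCards is a hypothetical useState setter. If the agent ever ran two calls to the same tool concurrently, this name-based heuristic could attribute a result to the wrong card.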

Action Request Protocol

When the agent needs to interact with the user's page (click buttons, scroll, navigate), it uses the Action Request Protocol:

Action Request Flow
Server                              Client
  │                                    │
  │── action_request (via SSE) ───────►│
  │   { correlationId, action, params }│
  │                                    │ Execute DOM action
  │                                    │ Capture page state
  │◄── POST /action-result ────────────│
  │   { correlationId, result,         │
  │     pageState }                    │
  │                                    │
  │ (agent loop continues)             │

How It Works

  • The agent decides to call a DOM tool (e.g., click)
  • The server sends an action_request event via SSE with a unique correlationId
  • The client's useLensAgent hook receives the event, executes the DOM action, captures fresh page state, and POSTs the result, the page state, and the same correlationId to the action-result endpoint
  • The server matches the result to its request by correlationId, updates the page context, and continues the agent loop
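Seen from the client, the steps above can be sketched as follows. This is not the hook's actual code: ActionRequestDeps, handleActionRequest, postJson, and the resultUrl value are hypothetical, and the POST body simply mirrors the { correlationId, result, pageState } fields from the flow diagram. Dependencies are injected so the flow can be exercised outside a browser.

```typescript
// Result shape from the tool_result event: { success, result?, error? }.
interface ToolResult {
  success: boolean;
  result?: unknown;
  error?: string;
}

interface ActionRequestEvent {
  type: 'action_request';
  correlationId: string;
  action: string;
  parameters: Record<string, unknown>;
}

interface ActionRequestDeps {
  resultUrl: string; // hypothetical: wherever your server accepts action results
  executeAction: (action: string, params: Record<string, unknown>) => Promise<ToolResult>;
  capturePageState: () => Promise<unknown>;
  post: (url: string, body: unknown) => Promise<void>;
}

async function handleActionRequest(event: ActionRequestEvent, deps: ActionRequestDeps): Promise<void> {
  // 1. Execute the DOM action; turn exceptions into a failed result so the
  //    server still receives an answer for this correlationId.
  let result: ToolResult;
  try {
    result = await deps.executeAction(event.action, event.parameters);
  } catch (err) {
    result = { success: false, error: err instanceof Error ? err.message : String(err) };
  }

  // 2. Capture fresh page state; if that fails, send the result without it.
  let pageState: unknown = null;
  try {
    pageState = await deps.capturePageState();
  } catch {
    pageState = null;
  }

  // 3. POST the result back, echoing the correlationId.
  await deps.post(deps.resultUrl, { correlationId: event.correlationId, result, pageState });
}

// Browser implementation of `post` using fetch.
async function postJson(url: string, body: unknown): Promise<void> {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`POST ${url} failed with status ${res.status}`);
}
```

Converting a thrown error into { success: false, error } matters here: if the client never POSTs, the server has no result to match against the correlationId and the agent loop cannot continue.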

Custom Action Handler

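By default, the hook executes DOM actions with the built-in WebUseTool. To override this, pass an onActionRequest handler that receives the action name and its parameters and returns { success, result?, error? }: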

Custom handler
useLensAgent({
  endpoint: '/api/agent/chat',

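  onActionRequest: async (action, params) => {
    // Custom DOM action handling; return { success, result?, error? }
    if (action === 'click') {
      // The type argument makes .click() type-check (Element has no click())
      const element = document.querySelector<HTMLElement>(params.selector);
      if (!element) {
        // Report a failure instead of claiming a click that never happened
        return { success: false, error: `No element matches ${params.selector}` };
      }
      element.click();
      return { success: true, result: 'Clicked element' };
    }
    return { success: false, error: `Unknown action: ${action}` };
  },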
});
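Once a handler covers more than one action, a lookup table keeps each action separate. A sketch, assuming the same { success, result?, error? } return shape; createActionHandler is a hypothetical helper, and the scroll handler's y parameter is an assumed name, since this page does not list the parameters sent for each action.

```typescript
interface ToolResult {
  success: boolean;
  result?: unknown;
  error?: string;
}

type ActionFn = (params: Record<string, any>) => ToolResult | Promise<ToolResult>;

function createActionHandler(handlers: Record<string, ActionFn>) {
  return async (action: string, params: Record<string, any>): Promise<ToolResult> => {
    // Own-property check so names like "toString" are not treated as handlers.
    const handler = Object.prototype.hasOwnProperty.call(handlers, action) ? handlers[action] : undefined;
    if (!handler) return { success: false, error: `Unknown action: ${action}` };
    try {
      return await handler(params);
    } catch (err) {
      // A throwing handler becomes a failed result instead of an unhandled rejection.
      return { success: false, error: err instanceof Error ? err.message : String(err) };
    }
  };
}

// Hypothetical handlers: `selector` follows the example above, `y` is an assumed name.
const onActionRequest = createActionHandler({
  click: (params) => {
    const el = document.querySelector<HTMLElement>(params.selector);
    if (!el) return { success: false, error: `No element matches ${params.selector}` };
    el.click();
    return { success: true, result: 'Clicked element' };
  },
  scroll: (params) => {
    const top = Number(params.y ?? 0);
    window.scrollTo({ top, behavior: 'smooth' });
    return { success: true, result: `Scrolled to ${top}` };
  },
});
```

The resulting function plugs into the hook as useLensAgent({ endpoint: '/api/agent/chat', onActionRequest }).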

Page State & Screenshots

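Page state lets the agent "see" the current page: its URL, title, content as markdown, a screenshot, and the elements it can interact with. The agent uses it both for visual understanding and to target DOM tools (for example, the selector of a button to click).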

PageState Interface

PageState
interface PageState {
  url: string;
  title: string;
  markdown: string;                      // Page content as markdown
  screenshot: string;                    // Base64 data URL
  actionableElements: ActionableElement[];
  timestamp: Date;
}

interface ActionableElement {
  id: string;
  type: 'button' | 'input' | 'link' | 'select' | 'textarea';
  selector: string;
  text?: string;
  placeholder?: string;
  description: string;
}
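PageState.timestamp is typed as Date, but page state travels as JSON (it is POSTed with action results and sent with messages), and JSON.stringify turns a Date into an ISO-8601 string. If your own server code reads a raw pageState body, a sketch like this revives the timestamp; parsePageState and PageStateLike are hypothetical names covering only a subset of PageState.

```typescript
// Minimal subset of PageState; other fields pass through unchanged.
interface PageStateLike {
  url: string;
  title: string;
  timestamp: Date;
  [key: string]: unknown;
}

function parsePageState(json: string): PageStateLike {
  const raw = JSON.parse(json) as Record<string, unknown>;
  // On the wire, `timestamp` is the string produced by Date.prototype.toJSON.
  const ts = new Date(String(raw.timestamp));
  if (Number.isNaN(ts.getTime())) {
    throw new Error(`Invalid timestamp: ${String(raw.timestamp)}`);
  }
  return { ...raw, url: String(raw.url), title: String(raw.title), timestamp: ts };
}
```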

Providing Page State

getPageState
useLensAgent({
  endpoint: '/api/agent/chat',

  getPageState: async () => {
    // This function is called:
    // 1. Before each sendMessage (to send initial page context)
    // 2. After each action_request (to capture updated state)

    return {
      url: window.location.href,
      title: document.title,
      markdown: extractPageMarkdown(),       // your implementation
      screenshot: await captureScreenshot(),  // your implementation
      actionableElements: findActionableElements(), // your implementation
      timestamp: new Date(),
    };
  },
});
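The getPageState example leaves findActionableElements to you. One possible sketch, not the library's implementation: scan for the element kinds listed in ActionableElement.type, tag each match with a hypothetical data-lens-id attribute so its selector is unique, and keep the mapping logic in a pure toActionableElement function that can be tested without a DOM.

```typescript
type ActionableType = 'button' | 'input' | 'link' | 'select' | 'textarea';

interface ActionableElement {
  id: string;
  type: ActionableType;
  selector: string;
  text?: string;
  placeholder?: string;
  description: string;
}

// Plain snapshot of what we read from an element, so mapping needs no DOM.
interface ElementInfo {
  tagName: string; // e.g. 'BUTTON'
  text: string;
  placeholder?: string;
  selector: string;
}

const TAG_TO_TYPE: Record<string, ActionableType> = {
  BUTTON: 'button',
  INPUT: 'input',
  A: 'link',
  SELECT: 'select',
  TEXTAREA: 'textarea',
};

function toActionableElement(info: ElementInfo, index: number): ActionableElement | null {
  const type = TAG_TO_TYPE[info.tagName.toUpperCase()];
  if (!type) return null;
  const text = info.text.trim() || undefined;
  // Describe the element by its text, else its placeholder, else its selector.
  const label = text ?? info.placeholder ?? info.selector;
  return { id: `el-${index}`, type, selector: info.selector, text, placeholder: info.placeholder, description: `${type}: ${label}` };
}

// Browser wrapper: mutates the DOM by adding data-lens-id so each selector is unambiguous.
function findActionableElements(root: ParentNode = document): ActionableElement[] {
  const found: ActionableElement[] = [];
  const query = 'button, input:not([type="hidden"]), a[href], select, textarea';
  root.querySelectorAll<HTMLElement>(query).forEach((el, i) => {
    el.setAttribute('data-lens-id', String(i));
    const item = toActionableElement(
      {
        tagName: el.tagName,
        text: el.innerText ?? '',
        placeholder: el.getAttribute('placeholder') ?? undefined,
        selector: `[data-lens-id="${i}"]`,
      },
      i,
    );
    if (item) found.push(item);
  });
  return found;
}
```

Tagging elements with an attribute trades a small DOM mutation for selectors that stay valid even when elements lack ids or share classes; the attribute values are reassigned on every call.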

Sending Page State with a Message

sendMessage with pageState
await sendMessage('What do you see on this page?', {
  pageState: {
    url: window.location.href,
    title: document.title,
    markdown: '...',
    screenshot: 'data:image/png;base64,...',
    actionableElements: [],
    timestamp: new Date(),
  },
  currentUrl: window.location.href,
});

Note

The useLensAgent hook automatically handles action requests and page state capture. You only need to provide getPageState if you want the agent to have visual context.