Giving Your AI Agent Eyes: Web Capture for Autonomous Agents

An agent that can only parse HTML misses charts, canvas elements, CSS-rendered UI, and anything behind a login or bot wall. Web capture via API closes that gap. Here's how to wire it into an agent loop.

The core pattern

Most agent frameworks call tools as functions. Wire the PageBolt API as a tool:

// Tool definition for an OpenAI-compatible agent
const tools = [{
  type: 'function',
  function: {
    name: 'capture_screenshot',
    description: 'Take a screenshot of a URL and return the image for visual analysis.',
    parameters: {
      type: 'object',
      properties: {
        url: { type: 'string', description: 'The URL to screenshot' },
        fullPage: { type: 'boolean', description: 'Capture full scrollable page' },
        stealth: { type: 'boolean', description: 'Bypass bot detection' }
      },
      required: ['url']
    }
  }
}];

// Tool handler
async function capture_screenshot({ url, fullPage = true, stealth = false }) {
  const res = await fetch('https://pagebolt.dev/api/v1/screenshot', {
    method: 'POST',
    headers: { 'x-api-key': process.env.PAGEBOLT_API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ url, fullPage, stealth, blockBanners: true })
  });
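  // Fail loudly so an error response is never base64-encoded as if it were an image
  if (!res.ok) throw new Error(`Screenshot request failed: ${res.status} ${await res.text()}`);
  const buffer = Buffer.from(await res.arrayBuffer());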
  // Return as base64 for vision model consumption
  return { image_base64: buffer.toString('base64'), format: 'png' };
}

The agent decides when to call capture_screenshot. Your code then passes the returned base64 image to a vision model (GPT-4o, Claude 3.5 Sonnet) along with a question such as "what changed on this page?", "is this form filled correctly?", or "what is the price shown here?"
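
Handing the image to the model can be sketched as follows. This assumes an OpenAI-style Chat Completions message format, where an image travels as an image_url content part holding a data URL. The helper name buildVisionMessage is hypothetical and not part of PageBolt or any SDK.

```javascript
// Hypothetical helper: wraps a capture_screenshot result in an
// OpenAI-style user message with one text part and one image part.
function buildVisionMessage({ image_base64, format = 'png' }, question) {
  if (!image_base64) throw new Error('image_base64 is required');
  return {
    role: 'user',
    content: [
      { type: 'text', text: question },
      {
        type: 'image_url',
        // Data URL built from the base64 payload returned by the tool handler
        image_url: { url: `data:image/${format};base64,${image_base64}` }
      }
    ]
  };
}

// Usage sketch (assumes a `messages` array you already send to the model):
// const shot = await capture_screenshot({ url: 'https://example.com' });
// messages.push(buildVisionMessage(shot, 'What is the price shown here?'));
```

Attaching the image as an image part, instead of pasting the base64 string into a text field, lets the model treat it as an image and not as a very long string of characters.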

Inspect for structured element data

For agents that need to interact with a page (click buttons, fill forms), the inspect endpoint returns structured element data without requiring a live browser:

const res = await fetch('https://pagebolt.dev/api/v1/inspect', {
  method: 'POST',
  headers: { 'x-api-key': process.env.PAGEBOLT_API_KEY, 'Content-Type': 'application/json' },
  body: JSON.stringify({ url, stealth: true })
});
if (!res.ok) throw new Error(`Inspect request failed: ${res.status}`);
const { elements } = await res.json();
// elements: [{ selector, text, tag, type, href, position }]
// Agent can reason about elements and use them in sequence steps

Feed elements to your LLM to let it reason about page structure: "the submit button has selector #checkout-btn, I should click it next."
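
One way to feed the elements to a model is to flatten them into short numbered lines first, which keeps the prompt small. The sketch below assumes each element carries the fields listed in the comment above (selector, text, tag, type, href). The function name formatElementsForPrompt and the limits of 50 elements and 80 characters are illustrative choices, not part of the API.

```javascript
// Hypothetical helper: renders inspect results as compact lines for a prompt.
// Assumes elements shaped like { selector, text, tag, type, href }.
function formatElementsForPrompt(elements, maxElements = 50) {
  return elements
    .slice(0, maxElements) // cap the list so large pages do not flood the prompt
    .map((el, i) => {
      // Collapse whitespace and truncate long labels
      const label = (el.text || '').trim().replace(/\s+/g, ' ').slice(0, 80);
      const kind = el.type ? `${el.tag}[${el.type}]` : el.tag;
      const link = el.href ? ` -> ${el.href}` : '';
      return `${i + 1}. ${kind} ${el.selector} "${label}"${link}`;
    })
    .join('\n');
}

// Usage sketch:
// const prompt = `Page elements:\n${formatElementsForPrompt(elements)}\nWhich selector should be clicked next?`;
```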

Record what the agent did

For auditability, record the agent's browser session as a narrated video:

async function record_agent_session(steps, taskDescription) {
  const res = await fetch('https://pagebolt.dev/api/v1/video', {
    method: 'POST',
    headers: { 'x-api-key': process.env.PAGEBOLT_API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      steps,  // the steps your agent decided to execute
      audioGuide: {
        enabled: true,
        voice: 'nova',
        script: `Agent task: ${taskDescription}. {{1}} Starting execution.`
      },
      pace: 'normal'
    })
  });
  if (!res.ok) throw new Error(`Video request failed: ${res.status}`);
  return Buffer.from(await res.arrayBuffer()); // MP4 of the session
}

This gives you a human-readable audit trail: a narrated video of exactly what the agent did, step by step. That is useful both for debugging agent behavior and for showing stakeholders what the automation actually did.
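
To keep that trail, the returned buffer has to be stored somewhere. A minimal sketch, assuming Node.js and local disk storage: the helper names auditFileName and saveAuditVideo and the file naming scheme are hypothetical and only for illustration.

```javascript
// Hypothetical helper: derives a sortable file name from a timestamp and the task.
function auditFileName(taskDescription, date = new Date()) {
  const slug = taskDescription
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // anything non-alphanumeric becomes a dash
    .slice(0, 40)
    .replace(/^-+|-+$/g, '') || 'task'; // trim stray dashes, fall back if empty
  const stamp = date.toISOString().replace(/[:.]/g, '-'); // filesystem-safe timestamp
  return `${stamp}_${slug}.mp4`;
}

// Writes the MP4 buffer returned by record_agent_session into `dir`.
async function saveAuditVideo(buffer, taskDescription, dir = '.') {
  const { writeFile } = await import('node:fs/promises');
  const { join } = await import('node:path');
  const filePath = join(dir, auditFileName(taskDescription));
  await writeFile(filePath, buffer);
  return filePath;
}

// Usage sketch:
// const video = await record_agent_session(steps, 'Check out blue widget');
// const savedTo = await saveAuditVideo(video, 'Check out blue widget', './audit');
```

Putting the timestamp first means a plain directory listing sorts recordings chronologically.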

Using the MCP server in Claude Desktop / Cursor

If you're building with Claude-powered agents, the PageBolt MCP server exposes all of this as native tools, with no API wiring needed:

npm install -g pagebolt-mcp

{
  "mcpServers": {
    "pagebolt": {
      "command": "pagebolt-mcp",
      "env": { "PAGEBOLT_API_KEY": "your_key_here" }
    }
  }
}

Claude can then call take_screenshot, record_video, and inspect_page directly in its tool loop, with SSRF protection and rate limiting built in at the infrastructure level.

100 requests/month, no credit card