How to Build Claude Agents That Can Prove What They Actually Saw on the Web
Add visual documentation to Claude agents — screenshot each tool call to verify the LLM actually saw the right data.
Claude API agents are powerful reasoners. But when you ask them to navigate the web, grab data, or verify page state, you hit a problem: you can't see what they actually saw.
Your agent called a tool. It got back HTML. Did it parse the right element? Was the page interactive, or did it time out? Did a banner block the content? The LLM reasoned its way through, but you're left guessing whether the visual reality matched the HTML response.
This is where visual proof changes everything. Screenshot each tool call. Give Claude a mirror of what actually rendered.
The Problem: Agents Flying Blind
Imagine a Claude agent tasked with:
- "Check if the signup form is active on example.com"
- "Grab the price from the pricing table"
- "Verify the login button changed after clicking it"
The agent's tools return raw HTML. But HTML isn't reality. CSS might hide elements. JavaScript might not have loaded. A modal might be blocking content. The agent reasons based on incomplete signals.
Result: False positives. Wrong decisions. Workflows that fail in production.
The Solution: Visual Documentation at Each Step
Add a screenshot to every tool call. When Claude asks the browser to fetch a page, capture what actually rendered. When it clicks a button, prove the page changed. The agent gets visual evidence, not just HTML.
Step 1: Add the PageBolt Screenshot Tool to Your Agent
```python
import json
import urllib.request

import anthropic

client = anthropic.Anthropic()

# Define the screenshot tool
tools = [
    {
        "name": "screenshot",
        "description": "Take a screenshot of a URL and return base64 image data",
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "URL to screenshot",
                },
                "width": {
                    "type": "integer",
                    "description": "Viewport width (default 1280)",
                    "default": 1280,
                },
                "height": {
                    "type": "integer",
                    "description": "Viewport height (default 720)",
                    "default": 720,
                },
            },
            "required": ["url"],
        },
    }
]

# Tool call handler
def take_screenshot(url, width=1280, height=720):
    """Call the PageBolt API to capture a screenshot."""
    api_key = "YOUR_API_KEY"  # Get from pagebolt.dev/dashboard
    payload = json.dumps(
        {"url": url, "width": width, "height": height}
    ).encode("utf-8")
    req = urllib.request.Request(
        "https://pagebolt.dev/api/v1/screenshot",
        data=payload,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    return data  # {"image": "<base64-encoded PNG>", ...}
```
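Before wiring the tool into an agent, it's worth sanity-checking that the base64 payload round-trips to a real image file. A minimal sketch (the `save_screenshot` helper and `fake_result` stand-in are illustrative, not part of the PageBolt API):

```python
import base64

def save_screenshot(result, path):
    """Decode a PageBolt-style base64 image payload and write it to disk."""
    png_bytes = base64.b64decode(result["image"])
    with open(path, "wb") as f:
        f.write(png_bytes)
    return len(png_bytes)

# Stand-in payload; a real response carries a full PNG in result["image"]
fake_result = {"image": base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")}
size = save_screenshot(fake_result, "step_1.png")
print(size)  # number of decoded bytes written
```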
Step 2: Build the Agent Loop with Visual Checkpoints
```python
def run_agent_with_visual_proof(user_query):
    """Agent loop that screenshots each action for verification."""
    messages = [{"role": "user", "content": user_query}]
    screenshots = []  # Store visual proof

    while True:
        # Call Claude with tools enabled
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )

        # Claude is done: return its answer plus the visual audit trail
        if response.stop_reason == "end_turn":
            final_text = next(
                (block.text for block in response.content if hasattr(block, "text")),
                None,
            )
            return {
                "result": final_text,
                "screenshots": screenshots,  # Include visual proof
            }

        # Claude wants to call a tool
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type == "tool_use" and block.name == "screenshot":
                    result = take_screenshot(**block.input)
                    # Store the screenshot for the audit trail
                    screenshots.append(
                        {
                            "step": len(screenshots) + 1,
                            "url": block.input["url"],
                            "image_base64": result["image"],
                        }
                    )
                    # Return the image to Claude so it can see what rendered
                    tool_results.append(
                        {
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": [
                                {
                                    "type": "image",
                                    "source": {
                                        "type": "base64",
                                        "media_type": "image/png",
                                        "data": result["image"],
                                    },
                                },
                                {
                                    "type": "text",
                                    "text": f"Screenshot captured at {block.input['url']}",
                                },
                            ],
                        }
                    )
            messages.append({"role": "user", "content": tool_results})
        else:
            # Any other stop reason (e.g. max_tokens): bail out instead of looping forever
            return {"result": None, "screenshots": screenshots}

# Run it
result = run_agent_with_visual_proof(
    "Check if the pricing page at https://example.com/pricing is loading correctly"
)
print("Agent response:", result["result"])
print(f"Captured {len(result['screenshots'])} screenshots for visual proof")
```
Step 3: Store and Audit
Each screenshot becomes part of your agent's audit trail. No more mystery about what the LLM saw.
```python
import json

# Save the execution record
execution_record = {
    "agent_task": "Verify pricing page state",
    "timestamp": "2026-03-04T14:32:00Z",
    "steps": result["screenshots"],
    "agent_conclusion": result["result"],
}

with open("agent_execution_audit.json", "w") as f:
    json.dump(execution_record, f, indent=2)
```
Now you can:
- Debug agent failures: "Oh, the agent didn't see the signup button because CSS hid it"
- Audit agent decisions: "Here's proof of what the agent saw when it made decision X"
- Build reliable workflows: "If the screenshot shows the error state, take this path instead"
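The last point can be sketched as a simple dispatch on whatever label the agent (or a human reviewer) assigns to a step's screenshot. The labels and branch names below are hypothetical, not part of any API:

```python
def route_on_visual_state(label):
    """Map the agent's visual verdict for a step to the next workflow branch."""
    routes = {
        "ok": "continue_pipeline",
        "error_state": "retry_with_backoff",
        "blocked_by_modal": "dismiss_modal_and_retry",
    }
    # Anything unrecognized goes to a human, with the screenshot as evidence
    return routes.get(label, "escalate_to_human")

print(route_on_visual_state("error_state"))  # retry_with_backoff
```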
Why This Matters
Claude agents are shipping in production right now: Slack workflows, customer support bots, automation platforms. But without visual proof, they're black boxes.
Competing solutions (self-hosted Puppeteer, Selenium) give you the screenshot tool — but not the audit trail. You build the infrastructure, patch the libraries, debug the timeouts, manage the Chrome instances.
PageBolt gives you both: one API endpoint, instant visual proof, permanent audit history. Your agent sees it. You see it. No mysteries.
Try PageBolt free — 100 requests/month, no credit card needed. →
Your AI agents will thank you.
Give your Claude agents visual memory
One API endpoint. Screenshots at every step. No more guessing what the LLM saw. Free tier: 100 requests/month.
Get API Key — Free