How to Give Your AI Agent Eyes: Screenshot and Visual Verification via API
AI agents are text-only. Give them visual awareness with screenshots and visual verification. Real patterns for LangChain, CrewAI, Claude API.
Your AI agent is blind.
You tell it: "Click the submit button and confirm the form was accepted."
The agent makes the API call. Gets back status 200. Continues to the next task.
But you have no idea what actually happened. Did the form submit? Is the page still showing an error? Did the agent hallucinate the success?
AI agents need eyes. They need to see what happened after every action.
Without visual verification, you're building on faith. With it, you have proof.
The Problem: Text-Only Agents Can't Verify Actions
Here's what typical agent workflows look like:
# Agent calls browser tool
response = tool.click_button(selector="#submit")
# Response: { "success": true, "status": 200 }
# Agent has NO IDEA:
# - Did the page actually change?
# - Is there an error message visible?
# - Did the form data actually save?
# - Is the agent looking at the right page?
# Agent just assumes success and moves on
agent.next_task()
Result: Agents make wrong decisions based on incomplete information. They hallucinate success. They miss errors that humans would catch instantly.
You need visual verification loops — after every action, the agent sees what happened.
The Solution: Three Visual Verification Patterns
Pattern 1: Post-Action Verification Screenshot
After your agent clicks a button, take a screenshot. Extract key information from the image. Decide what to do next.
from anthropic import Anthropic
import requests
import base64

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
PAGEBOLT_API_KEY = "YOUR_API_KEY"
PAGEBOLT_BASE_URL = "https://pagebolt.dev/api/v1"

def agent_interact_with_verification(url, action_description, initial_prompt):
    """
    AI agent interacts with a website and uses visual verification
    to understand what happened.
    """
    # Step 1: Take initial screenshot
    response = requests.post(
        f"{PAGEBOLT_BASE_URL}/screenshot",
        json={"url": url},
        headers={"x-api-key": PAGEBOLT_API_KEY},
        timeout=30
    )
    response.raise_for_status()  # fail here instead of encoding an error body
    initial_screenshot = base64.standard_b64encode(response.content).decode()

    # Step 2: Ask Claude what it sees
    message = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": initial_screenshot,
                        },
                    },
                    {
                        "type": "text",
                        "text": initial_prompt
                    }
                ],
            }
        ],
    )
    agent_decision = message.content[0].text
    print(f"Agent sees: {agent_decision}")

    # Step 3: Execute action (simulated here)
    # In production, you'd call your actual browser automation tool
    print(f"Executing: {action_description}")

    # Step 4: Verify the action with another screenshot
    response = requests.post(
        f"{PAGEBOLT_BASE_URL}/screenshot",
        json={"url": url},  # URL after action in real scenario
        headers={"x-api-key": PAGEBOLT_API_KEY},
        timeout=30
    )
    response.raise_for_status()
    verification_screenshot = base64.standard_b64encode(response.content).decode()

    # Step 5: Ask Claude: Did the action work?
    verification = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=512,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": verification_screenshot,
                        },
                    },
                    {
                        "type": "text",
                        "text": f"I just performed this action: {action_description}\n\nDid it succeed? What changed? Any errors?"
                    }
                ],
            }
        ],
    )
    result = verification.content[0].text
    print(f"Verification result: {result}")

    return {
        "initial_assessment": agent_decision,
        "action": action_description,
        "verification": result,
        # Naive keyword check: "unsuccessful" also contains "success",
        # so treat this flag as a rough signal, not proof
        "success": "success" in result.lower() or "worked" in result.lower()
    }

# Usage
result = agent_interact_with_verification(
    url="https://example.com/checkout",
    action_description="Clicked the 'Complete Purchase' button",
    initial_prompt="What do you see on this page? Is the checkout form ready to submit?"
)
print(f"\nFinal result: {result}")
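The keyword check on the verification text is brittle: a reply such as "the submission was unsuccessful" still contains "success". One option, sketched here as an assumption and not as part of the original pattern, is to ask Claude to end its reply with a small JSON object and parse that instead. The prompt suffix, the `parse_verdict` helper, and the `succeeded` / `reason` field names are all hypothetical.

```python
import json
import re

# Hypothetical prompt suffix; append it to the verification question.
VERDICT_INSTRUCTIONS = (
    "Finish your answer with a JSON object on its own line, for example: "
    '{"succeeded": true, "reason": "confirmation banner is visible"}'
)

def parse_verdict(reply_text):
    """Return (succeeded, reason) from the last JSON verdict in the reply.

    Falls back to (False, ...) when no valid verdict is found, so an
    unparseable reply is never treated as success.
    """
    # Candidate {...} spans without nested braces; the last valid one wins.
    candidates = re.findall(r"\{[^{}]*\}", reply_text)
    for raw in reversed(candidates):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(data.get("succeeded"), bool):
            return data["succeeded"], str(data.get("reason", ""))
    return False, "no verdict found in reply"
```

Defaulting to failure is a deliberate choice: when the agent cannot tell what happened, it is safer to re-check than to move on.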
Pattern 2: CSS Selector Discovery with /inspect
Don't guess selectors. Use /inspect to reliably find page elements before interacting with them.
import requests
import json

# Reuses client and PAGEBOLT_API_KEY from Pattern 1

def find_elements_reliably(url):
    """
    Use PageBolt /inspect endpoint to discover page structure
    and get reliable CSS selectors without guessing.
    """
    response = requests.post(
        "https://pagebolt.dev/api/v1/inspect",
        json={"url": url},
        headers={"x-api-key": PAGEBOLT_API_KEY},
        timeout=30
    )
    response.raise_for_status()
    page_structure = response.json()
    # page_structure contains:
    # {
    #   "buttons": [
    #     { "text": "Submit", "selector": "#submit-btn-123", "visible": true },
    #     { "text": "Cancel", "selector": ".btn-cancel", "visible": true }
    #   ],
    #   "forms": [...],
    #   "inputs": [...],
    #   "headings": [...]
    # }
    return page_structure

def agent_action_with_selector_discovery(url, action_goal):
    """
    Agent asks: "What elements exist on this page?"
    Uses /inspect to get all clickable elements with their selectors.
    Chooses the right one based on the goal.
    """
    # Get page structure
    page_map = find_elements_reliably(url)

    # Serialize the structure as JSON text for Claude
    page_description = json.dumps(page_map, indent=2)

    decision = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=256,
        messages=[
            {
                "role": "user",
                "content": (
                    f"Here's the page structure:\n{page_description}\n\n"
                    f"Goal: {action_goal}\n\n"
                    "Which element should I interact with? "
                    "Respond with ONLY the CSS selector."
                )
            }
        ]
    )
    selector = decision.content[0].text.strip()
    print(f"Agent chose selector: {selector}")

    # The selector comes from Claude's reply, so confirm it appears in
    # page_map before clicking; the model can still return one that
    # /inspect never listed
    return selector

# Usage
selector = agent_action_with_selector_discovery(
    url="https://example.com",
    action_goal="Click the button that says 'Add to Cart'"
)
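A selector returned by the model is only trustworthy if it is one of the selectors /inspect actually reported. The sketch below checks that. It assumes the response shape shown in the example comment (top-level keys mapping to lists of objects with a `selector` field); `known_selectors` and `validate_selector` are hypothetical helper names.

```python
def known_selectors(page_map):
    """Collect every selector listed in an /inspect-style response.

    Assumes top-level keys map to lists of dicts, each with an
    optional "selector" field; anything else is skipped.
    """
    selectors = set()
    for group in page_map.values():
        if not isinstance(group, list):
            continue
        for element in group:
            if isinstance(element, dict) and "selector" in element:
                selectors.add(element["selector"])
    return selectors

def validate_selector(page_map, selector):
    """Return the cleaned selector if /inspect reported it, else raise."""
    # Models sometimes wrap the answer in whitespace or backticks.
    cleaned = selector.strip().strip("`").strip()
    if cleaned not in known_selectors(page_map):
        raise ValueError(f"Model returned an unknown selector: {cleaned!r}")
    return cleaned
```

Calling `validate_selector(page_map, selector)` before the click turns a hallucinated selector into an explicit error instead of a silent miss.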
Pattern 3: Video Recording as Compliance Audit Trail
For regulated industries (finance, healthcare, legal), record agent sessions as proof of what happened.
import requests

# Reuses PAGEBOLT_API_KEY from Pattern 1

def record_agent_session_as_video(steps, narration):
    """
    Record a full agent session (multiple clicks, form fills, etc.)
    as a video. Perfect for compliance audits: prove to regulators
    exactly what your agent did.
    """
    # Record to video
    response = requests.post(
        "https://pagebolt.dev/api/v1/record",
        json={
            "steps": steps,
            "format": "mp4",
            "frame": {"enabled": True, "style": "macos"},
            "audioGuide": {
                "enabled": True,
                "script": narration
            }
        },
        headers={"x-api-key": PAGEBOLT_API_KEY},
        timeout=120  # the recording runs every step, so allow extra time
    )
    response.raise_for_status()
    video_url = response.json().get("video_url")
    return video_url

# Usage: build the step sequence, then record it
steps = [
    {"action": "navigate", "url": "https://example.com/account"},
    {"action": "wait", "ms": 2000},
    {"action": "click", "selector": "#view-history"},
    {"action": "wait", "ms": 1000},
    {"action": "screenshot", "name": "history-page"},
    {"action": "click", "selector": "#export-btn"},
    {"action": "wait", "ms": 2000},
    {"action": "screenshot", "name": "export-complete"}
]
narration = "Agent navigating to account history. {{1}} Clicking export button. {{2}} Export complete."

audit_video = record_agent_session_as_video(steps, narration)
print(f"Compliance video: {audit_video}")
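The step list in the example is written by hand, which gets repetitive once a session has more than a few clicks. A small builder can generate the same navigate / click / wait / screenshot sequence. This is a sketch: `build_steps` and its parameters are hypothetical, and the step dictionaries only reuse the keys that appear in the example, without confirming what else the API accepts.

```python
def build_steps(start_url, clicks, settle_ms=1000):
    """Build a step list: navigate once, then click, wait, screenshot.

    `clicks` is a list of (selector, screenshot_name) pairs, processed
    in order. `settle_ms` is the pause after navigation and each click.
    """
    steps = [
        {"action": "navigate", "url": start_url},
        {"action": "wait", "ms": settle_ms},
    ]
    for selector, screenshot_name in clicks:
        steps.append({"action": "click", "selector": selector})
        steps.append({"action": "wait", "ms": settle_ms})
        steps.append({"action": "screenshot", "name": screenshot_name})
    return steps

# Usage: two clicks, each followed by a named screenshot
steps = build_steps(
    "https://example.com/account",
    [("#view-history", "history-page"), ("#export-btn", "export-complete")],
)
```

Taking a screenshot after every click keeps the audit trail aligned with the actions, one image per step.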
Real-World: AI Agent Verification Loop
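Here's how the screenshot and /inspect calls combine into one verification loop. The sketch plans and verifies each step; executing the planned action is left to your browser automation tool: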
import base64
import json

import requests
from anthropic import Anthropic

class VerifiedAgent:
    def __init__(self, api_key):
        self.api_key = api_key
        self.pagebolt_api = "https://pagebolt.dev/api/v1"
        self.client = Anthropic()

    def verify_action(self, url, action):
        """Take screenshot, show to Claude, verify success."""
        response = requests.post(
            f"{self.pagebolt_api}/screenshot",
            json={"url": url},
            headers={"x-api-key": self.api_key},
            timeout=30
        )
        response.raise_for_status()
        screenshot = base64.standard_b64encode(response.content).decode()

        verification = self.client.messages.create(
            model="claude-opus-4-6",
            max_tokens=256,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": screenshot}},
                    {"type": "text", "text": f"Action: {action}. Did it succeed? Report status."}
                ]
            }]
        )
        return verification.content[0].text

    def discover_elements(self, url):
        """Use /inspect to find clickable elements."""
        response = requests.post(
            f"{self.pagebolt_api}/inspect",
            json={"url": url},
            headers={"x-api-key": self.api_key},
            timeout=30
        )
        response.raise_for_status()
        return response.json()

    def execute_with_verification(self, url, goal, max_steps=5):
        """Execute a goal with verification at each step."""
        current_url = url
        for step in range(1, max_steps + 1):
            # Discover what's on the page
            elements = self.discover_elements(current_url)

            # Ask Claude what to do next
            plan = self.client.messages.create(
                model="claude-opus-4-6",
                max_tokens=256,
                messages=[{
                    "role": "user",
                    "content": f"Goal: {goal}\n\nAvailable elements: {json.dumps(elements)}\n\nWhat's the next action?"
                }]
            )
            action = plan.content[0].text
            print(f"Step {step}: {action}")

            # Execute the planned action here with your browser automation
            # tool, and update current_url if it navigates. This sketch
            # skips execution and only plans and verifies.

            # Verify the action worked
            result = self.verify_action(current_url, action)
            print(f"Result: {result}")

            # Naive keyword check: "unsuccessful" and "incomplete" also match
            if "success" in result.lower() or "complete" in result.lower():
                print("Goal achieved!")
                return True
        return False

# Usage
agent = VerifiedAgent(api_key="YOUR_API_KEY")
success = agent.execute_with_verification(
    url="https://example.com",
    goal="Fill out the feedback form and submit it"
)
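Every verification step in the loop costs a vision call. A cheap pre-check, offered here as an assumption and not as a PageBolt feature, is to fingerprint the screenshot bytes before and after the action: if they are identical, nothing visible changed and the action almost certainly had no effect. The helper names are hypothetical.

```python
import hashlib

def screenshot_fingerprint(png_bytes):
    """Return a SHA-256 hex digest of the raw screenshot bytes."""
    return hashlib.sha256(png_bytes).hexdigest()

def page_visibly_changed(before_png, after_png):
    """True when the two screenshots differ at the byte level."""
    return screenshot_fingerprint(before_png) != screenshot_fingerprint(after_png)
```

The check is one-directional. Identical bytes suggest the page did not change, but differing bytes do not prove the action worked, since animations, timestamps, or ads can alter the image. Use it to skip the vision call in the "nothing changed" case and still ask Claude whenever the images differ.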
Why This Matters
| Scenario | Without Verification | With Verification |
|---|---|---|
| Agent clicks button | Assumes it worked | Takes screenshot, confirms |
| Form submission | Hopes page updated | Sees success message or error |
| Hallucination detection | Agent continues blindly | Agent recognizes mistake, retries |
| Audit trail | "The agent ran" (unproven) | Video proof of every action |
| Debugging failures | Guess what went wrong | Watch the video, see exact issue |
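The "recognizes mistake, retries" row can be expressed as a small bounded retry loop. This is a minimal sketch: `act` and `verify` are caller-supplied callables (hypothetical hooks for your browser tool and your screenshot check), and `run_with_retries` is not part of any library.

```python
import time

def run_with_retries(act, verify, max_attempts=3, delay_seconds=1.0):
    """Call act(), then verify(); repeat until verify() returns True.

    Returns the number of attempts used. Raises RuntimeError when no
    attempt was verified within max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        act()
        if verify():
            return attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # give the page time before retrying
    raise RuntimeError(f"Action not verified after {max_attempts} attempts")
```

Retrying is only safe for actions that can be repeated without side effects. For something like a purchase, it is better to verify again without re-running the action.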
Getting Started
- Sign up: pagebolt.dev — 100 free requests/month
- Get API key: Copy from dashboard
- Choose pattern: Post-action verification, /inspect, or video recording
- Integrate: Copy the code example above
- Deploy: Your agents now have visual awareness
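Once you have a key, one way to keep it out of source code is to read it from an environment variable. The variable name `PAGEBOLT_API_KEY` and the `load_api_key` helper are assumptions for illustration, not something the API requires.

```python
import os

def load_api_key(env_var="PAGEBOLT_API_KEY"):
    """Read the API key from the environment and fail early if missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The result can replace the `"YOUR_API_KEY"` placeholder in the examples, for instance `VerifiedAgent(api_key=load_api_key())`.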
Your AI agents will go from "hoping for the best" to "seeing what actually happened."