Why screenshot MCPs cost 170x less than Playwright MCP (and when that matters)
Compare accessibility tree MCPs vs screenshot MCPs: token costs, when each is appropriate, and the math behind the 170x difference.
You're building an AI agent. You need it to interact with web pages. Two MCP approaches:
- Accessibility tree MCPs (like Playwright MCP): Claude gets a structured text tree of the page's elements (roles, names, states) and can click buttons and fill forms
- Screenshot MCPs (like PageBolt MCP): Claude sees a visual screenshot and can reason about layout
Which is cheaper to run?
Screenshot MCPs cost roughly 130–170x less per page.
About $0.09–$0.12 vs $15.30 for the same 100-page task, depending on how many tokens each screenshot consumes. But there's a tradeoff: each approach wins in different scenarios.
The token cost difference: accessibility trees vs screenshots
Accessibility tree (Playwright MCP)
A typical e-commerce page has 500–1000 nodes in the accessibility tree:
```
{
  "nodes": [
    {"id": 1, "role": "button", "text": "Add to Cart", "selector": "button.add-to-cart"},
    {"id": 2, "role": "textbox", "name": "email", "value": ""},
    ...
    // 500+ nodes for a typical e-commerce page
  ]
}
```
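Claude needs to reason about this entire tree to click the right button. Based on community-reported data from r/Anthropic, a typical Playwright MCP session for 100 pages costs ~$15.30 in API costs, or about $0.15 per page. At the $3 per million input tokens implied by the screenshot figures in this article (400 tokens = $0.0012), that is roughly 51,000 tokens per page interaction (a single ~5,000-token snapshot would cost only about $0.015 at that rate).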
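To get a feel for how tree size turns into tokens, here is a rough sketch. It builds a synthetic tree in the same shape as the example above and estimates tokens from the serialized length. The nodes are made up, and the 4-characters-per-token ratio is a common rule of thumb, not a real tokenizer, so treat the output as an order-of-magnitude guess only.

```python
import json

# Assumption: ~4 characters per token. This is a rule of thumb, not a tokenizer.
CHARS_PER_TOKEN = 4


def build_synthetic_tree(node_count: int) -> dict:
    """Build a made-up tree shaped like the example above (not real page data)."""
    nodes = [
        {"id": i, "role": "button", "text": f"Item {i}", "selector": f"button.item-{i}"}
        for i in range(1, node_count + 1)
    ]
    return {"nodes": nodes}


def estimate_tokens(tree: dict) -> int:
    """Estimate token count from the length of the JSON-serialized tree."""
    return len(json.dumps(tree)) // CHARS_PER_TOKEN


if __name__ == "__main__":
    for count in (500, 1000):
        tokens = estimate_tokens(build_synthetic_tree(count))
        print(f"{count} nodes -> ~{tokens} tokens (rough estimate)")
```

Real trees carry more attributes per node than this synthetic one, and the tree is typically fetched again after each action, so per-page totals in a multi-step session can be much higher than a single snapshot.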
Screenshot MCP (PageBolt MCP)
When your agent uses a screenshot MCP, Claude sees the screenshot visually.
- Token cost per page: ~200 tokens (vision tokens for 6KB screenshot)
- Plus agent reasoning: ~200 tokens
- Total per page: ~400 tokens = $0.0012
- For 100 pages: $0.12
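The per-page arithmetic above can be sketched as a small helper. The $3 per million tokens rate is an assumption inferred from the figures in this list (400 tokens = $0.0012), not a quoted price, so substitute your model's actual rate.

```python
# Assumption: $3 per million input tokens, inferred from "400 tokens = $0.0012".
PRICE_PER_MILLION_TOKENS = 3.00


def cost_usd(tokens_per_page: int, pages: int = 1) -> float:
    """Return the API cost in USD for a number of pages at a fixed token count."""
    return tokens_per_page * pages * PRICE_PER_MILLION_TOKENS / 1_000_000


if __name__ == "__main__":
    vision_tokens = 200     # screenshot, from the list above
    reasoning_tokens = 200  # agent reasoning, from the list above
    per_page = vision_tokens + reasoning_tokens
    print(f"per page:  ${cost_usd(per_page):.4f}")
    print(f"100 pages: ${cost_usd(per_page, 100):.2f}")
```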
The math: 170x cost difference
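| Metric | Playwright MCP | Screenshot MCP | Ratio |
|---|---|---|---|
| Tokens per page | ~51,000 | ~400 | ~127x |
| Cost per page | $0.153 | $0.0012 | ~127x |
| Cost per 100 pages | $15.30 | $0.12 | ~127x |
| Cost per 1000 pages | $153 | $1.20 | ~127x |

Costs assume $3 per million input tokens, the rate implied by 400 tokens = $0.0012; at that rate, $0.153 per page corresponds to about 51,000 tokens. With the ~400-token screenshot estimate the ratio is about 127x. The 170x headline figure corresponds to roughly $0.09 per 100 pages, or about 300 tokens per page.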
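As a sanity check on the table, the sketch below works backwards from the reported dollar figures. It assumes the same $3 per million input tokens rate implied by the screenshot numbers; under that assumption, $15.30 per 100 pages works out to about 51,000 tokens per page, and the ratio is 127.5x against $0.12 or 170x against $0.09.

```python
# Assumption: $3 per million input tokens (implied by 400 tokens = $0.0012).
PRICE_PER_MILLION_TOKENS = 3.00


def implied_tokens_per_page(total_cost_usd: float, pages: int) -> float:
    """Back out the average tokens per page from a total API bill."""
    return total_cost_usd / pages / PRICE_PER_MILLION_TOKENS * 1_000_000


def cost_ratio(expensive_usd: float, cheap_usd: float) -> float:
    """How many times more the expensive option costs."""
    return expensive_usd / cheap_usd


if __name__ == "__main__":
    print(f"implied tokens/page: {implied_tokens_per_page(15.30, 100):,.0f}")
    print(f"ratio vs $0.12: {cost_ratio(15.30, 0.12):.1f}x")
    print(f"ratio vs $0.09: {cost_ratio(15.30, 0.09):.1f}x")
```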
Why the difference?
Accessibility trees are comprehensive but verbose: full DOM structure, ARIA attributes, form field values, focus state, parent-child relationships. Useful information — but text-heavy. Thousands of tokens.
Screenshots are visual and compact: single image (6–10KB), ~130–200 vision tokens. Claude can "see" everything at once with much lower overhead.
When to use each approach
Use Playwright MCP (accessibility trees) if:
- ✅ Complex form filling — Agent needs to find and fill 10+ fields precisely
- ✅ Interactive workflows — Multi-step sequences (click → fill → click → validate)
- ✅ Accessibility testing — Checking ARIA labels, semantic HTML
- ✅ Real-time state tracking — Need to validate form states, errors
- ✅ Low-frequency, high-value tasks: ~$0.15 per page (about $15 per 100 pages) doesn't matter if it saves 2 hours of manual work
Use screenshot MCP (visual) if:
- ✅ Capture and monitoring — Regular screenshots for visual regression testing
- ✅ Read-only analysis — Agent just needs to "see" and reason about layout
- ✅ Batch operations — 100+ pages of screenshots (cost is critical)
- ✅ Documentation/reporting — Generate visual reports
Hybrid approach: Use both
Agent workflow:
1. Take screenshot to see page layout ($0.001)
2. If interaction needed:
   - Switch to Playwright MCP
   - Get accessibility tree ($0.15)
   - Click button, fill form
3. Take screenshot to verify result ($0.001)
Cost: ~$0.15 for complex interaction (mostly the tree)
Benefit: Best of both worlds
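The workflow above can be turned into a simple cost model. This is a sketch under the article's own per-step figures; the function name and the idea of counting "interactive pages" are illustrative assumptions, not part of either MCP.

```python
SCREENSHOT_COST = 0.001  # USD per screenshot, from the workflow above
TREE_COST = 0.15         # USD per accessibility-tree fetch, from the workflow above


def hybrid_cost(pages: int, interactive_pages: int) -> float:
    """Estimate the cost of a hybrid run.

    Every page gets one screenshot. Pages that need interaction also pay for
    one tree fetch plus one verification screenshot.
    """
    if not 0 <= interactive_pages <= pages:
        raise ValueError("interactive_pages must be between 0 and pages")
    look = pages * SCREENSHOT_COST
    interact = interactive_pages * (TREE_COST + SCREENSHOT_COST)
    return look + interact


if __name__ == "__main__":
    print(f"100 pages, 10 interactive: ${hybrid_cost(100, 10):.2f}")
    print(f"100 pages, all interactive: ${hybrid_cost(100, 100):.2f}")
```

With these figures, 100 pages where only 10 need interaction come to about $1.61, compared with about $15 if every page went through the tree.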
Real-world example: Batch screenshot monitoring
Your team needs daily screenshots of 1000 competitor pricing pages.
With Playwright MCP: 1000 pages × $0.15 = $150/day = $4,500/month
With screenshot MCP: 1000 pages × $0.001 = $1/day = $30/month
Savings: $4,470/month. For this use case, screenshot is 150x cheaper and more appropriate.
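The same monthly arithmetic, as a sketch you can rerun with your own page counts. The per-page costs are the rounded figures used in this example, and a 30-day month is assumed.

```python
def monthly_cost(pages_per_day: int, cost_per_page: float, days: int = 30) -> float:
    """Return the monthly cost in USD for a fixed daily page volume."""
    return pages_per_day * cost_per_page * days


if __name__ == "__main__":
    tree = monthly_cost(1000, 0.15)         # Playwright MCP, rounded per-page cost
    screenshot = monthly_cost(1000, 0.001)  # screenshot MCP, rounded per-page cost
    print(f"tree:       ${tree:,.0f}/month")
    print(f"screenshot: ${screenshot:,.0f}/month")
    print(f"savings:    ${tree - screenshot:,.0f}/month")
```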
The honest take
Playwright MCP is expensive but valuable if you need real interaction and token overhead doesn't matter for your use case.
Screenshot MCP is cheap and efficient if you need visual information, cost matters (batch operations), or you don't need to click/fill.
Don't pick based on cost alone. Pick based on what your agent actually needs to do.
Installing PageBolt MCP
```bash
npm install -g pagebolt-mcp
```
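Configure in claude_desktop_config.json (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json; on Windows: %APPDATA%\Claude\claude_desktop_config.json):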
```json
{
  "mcpServers": {
    "pagebolt": {
      "command": "pagebolt-mcp",
      "env": {
        "PAGEBOLT_API_KEY": "your-key-here"
      }
    }
  }
}
```
Now your agent can call take_screenshot, generate_pdf, record_video, inspect_page, and run_sequence natively from Claude Desktop, Cursor, or Windsurf. Free tier: 100 requests/month.
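If your config file already lists other MCP servers, the pagebolt entry needs to be merged in rather than pasted over the whole file. A minimal sketch of that merge on an in-memory dict follows. The helper name `add_mcp_server` and the `other` entry are hypothetical; reading and writing the actual file is left out.

```python
import json


def add_mcp_server(config: dict, name: str, command: str, env: dict) -> dict:
    """Return a copy of config with one server added under "mcpServers".

    Existing servers are kept; an entry with the same name is replaced.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {"command": command, "env": dict(env)}
    updated["mcpServers"] = servers
    return updated


if __name__ == "__main__":
    # Hypothetical existing config with one unrelated server.
    existing = {"mcpServers": {"other": {"command": "other-mcp"}}}
    merged = add_mcp_server(
        existing,
        name="pagebolt",
        command="pagebolt-mcp",
        env={"PAGEBOLT_API_KEY": "your-key-here"},
    )
    print(json.dumps(merged, indent=2))
```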
Conclusion
Token economics matter. A 170x cost difference is real. But it's not a reason to dismiss Playwright MCP or over-rely on screenshots.
- Complex interaction? Playwright MCP
- Visual capture and analysis? Screenshot MCP
- Both? Combine them strategically
Try the 170x cheaper option
Free tier: 100 requests/month. No credit card. See if the cost advantage fits your agent's workflow.