Headless browser API: Self-hosted vs managed, when each makes sense
Compare self-hosted headless browsers (Puppeteer, Playwright, Selenium) vs hosted APIs. Cost, complexity, and when to choose each.
You need to automate browser tasks — screenshots, PDFs, form fills, testing. You have two paths:
- Self-hosted — Run Puppeteer/Playwright on your servers
- Hosted API — Call a managed headless browser service
Each has tradeoffs. Most teams pick wrong and regret it.
The self-hosted trap
Self-hosting a headless browser sounds simple: npm install puppeteer, write a script, deploy. And the first script really is short:
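```javascript
// This looks easy... (ES module, so top-level await works)
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
const screenshot = await page.screenshot();
await browser.close(); // forget this and every run leaks a Chrome process
```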
But production is messy.
Hidden costs of self-hosting
Infrastructure
- Each browser instance needs 300–500MB RAM
- 10 concurrent requests = 3–5GB RAM minimum
- Add headroom for spikes = you need an 8GB+ instance
- EC2 instance: $50–150/month just for browser capacity
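The sizing arithmetic above fits in a small helper. This is a rough sketch: the 300–500MB per browser is the article's estimate, it assumes one browser per concurrent request, and the 1.5x headroom factor is an illustrative assumption. Measure your own pages before you size anything.

```javascript
// Rough RAM sizing for a self-hosted browser pool.
// Assumptions (illustrative, not measured): one browser per concurrent request,
// a fixed MB-per-browser figure, and a headroom multiplier for spikes.
// Uses 1 GB = 1000 MB to match the rough figures above.

function minRamGb(concurrency, mbPerBrowser) {
  // Baseline: every concurrent request holds one browser in memory
  return (concurrency * mbPerBrowser) / 1000;
}

function provisionedRamGb(concurrency, mbPerBrowser, headroom = 1.5) {
  // Add headroom for spikes, then round up to a whole GB
  return Math.ceil(minRamGb(concurrency, mbPerBrowser) * headroom);
}

console.log(minRamGb(10, 300), minRamGb(10, 500)); // 3 5
console.log(provisionedRamGb(10, 500));            // 8
```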
Orchestration
- Browser pools fail silently
- Connection timeouts need retry logic
- Memory leaks require process recycling
- You're now managing lifecycle, health checks, auto-restart
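Here is a sketch in plain Node.js of the retry logic that list calls for. It is minimal, not hardened. `withRetry` and `withTimeout` are hypothetical helper names, and the defaults (3 retries, 30s per attempt, 500ms/1s/2s backoff) are assumptions you would tune.

```javascript
// Retry an async operation with a per-attempt timeout and exponential backoff.
// `operation` is any function returning a Promise, e.g. a hypothetical wrapper
// around page.goto() + page.screenshot().

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function withRetry(operation, { retries = 3, timeoutMs = 30000, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await withTimeout(operation(attempt), timeoutMs);
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        // With the defaults: wait 500ms, 1s, 2s before the next attempt
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError; // every attempt failed
}
```

A timed-out Promise does not cancel the work behind it. In a real pool you would also close the stuck page, or it keeps holding memory.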
Scaling
- Vertical scaling hits ceiling (instance size limit)
- Horizontal scaling adds complexity (load balancing, session affinity)
- 100 concurrent users = multiple servers, Kubernetes cluster management
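Whichever direction you scale, each server needs a hard cap on concurrent browsers. Without one, a spike becomes an out-of-memory crash instead of a queue. Below is a minimal in-process limiter written without any library. `createLimiter` is a hypothetical name, and a multi-server setup would still need a shared queue on top of it.

```javascript
// A tiny concurrency limiter: at most `limit` tasks run at once and the rest
// wait in a FIFO queue, so extra load queues up instead of launching more browsers.

function createLimiter(limit) {
  let active = 0;
  const queue = [];

  const next = () => {
    if (active >= limit || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task)             // also turns a synchronous throw into a rejection
      .then(resolve, reject)
      .finally(() => {
        active--;
        next();               // start the next queued task, if any
      });
  };

  // Returns a Promise that settles with the task's result once it has run
  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}
```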
Maintenance
- Chrome versions change → tests break
- Security patches → deployments
- Dependency updates → regression testing
- 5+ hours/month firefighting
Real cost (often hidden):
- Infrastructure: $50–300/month
- DevOps time: 5–10 hours/month (~$1,000–2,000 at a loaded rate of roughly $200/hour)
- Opportunity cost: time spent firefighting instead of building features
- Total: roughly $1,050–2,300/month in direct costs, and more once opportunity cost is counted
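The infrastructure and DevOps lines can be added directly. This sketch assumes the ~$200/hour rate implied by "5–10 hours (~$1,000–2,000)". It leaves out opportunity cost because that has no fixed dollar value. `selfHostedMonthlyCost` is a hypothetical helper name.

```javascript
// Effective monthly cost of self-hosting: infrastructure plus DevOps time.
// hourlyRateUsd defaults to the ~$200/hour implied above; use your own loaded rate.

function selfHostedMonthlyCost({ infraUsd, devopsHours, hourlyRateUsd = 200 }) {
  return infraUsd + devopsHours * hourlyRateUsd;
}

const low = selfHostedMonthlyCost({ infraUsd: 50, devopsHours: 5 });
const high = selfHostedMonthlyCost({ infraUsd: 300, devopsHours: 10 });
console.log(`$${low}–$${high} per month`); // $1050–$2300 per month
```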
When self-hosting makes sense
Self-hosting is worth it if:
- ✅ You're running 1,000+ screenshots/day (economies of scale)
- ✅ You have a dedicated DevOps engineer anyway
- ✅ You need the lowest possible latency (no extra network hop to an external API, though page rendering itself still takes seconds)
- ✅ You have strict data residency requirements (e.g., EU data must never leave the EU)
- ✅ Your use case is internal-only (no user-facing latency pressure)
For most teams: not worth it.
The hosted API approach
Hosted headless browser APIs invert the tradeoff:
```bash
curl -X POST https://pagebolt.dev/api/v1/screenshot \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' \
  --output screenshot.png
# Response: PNG in 2–3 seconds
# No infrastructure. No servers. Done.
```
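The same call from Node.js 18+ (which ships a global fetch) might look like this. The endpoint, header, and JSON body mirror the curl example. Any other options the API accepts are not covered here. `captureViaApi` and the injectable `fetchImpl` parameter are hypothetical, added so the function can be tested without network access.

```javascript
// Call the hosted screenshot endpoint; returns the image bytes as a Buffer.
// Endpoint, header and body mirror the curl example; nothing else is assumed.

const API_URL = 'https://pagebolt.dev/api/v1/screenshot';

async function captureViaApi(url, { apiKey, fetchImpl = globalThis.fetch } = {}) {
  if (!apiKey) throw new Error('Missing API key');
  const response = await fetchImpl(API_URL, {
    method: 'POST',
    headers: {
      'x-api-key': apiKey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ url }),
  });
  if (!response.ok) {
    // Surface the status so callers can decide whether to retry
    throw new Error(`Screenshot failed: HTTP ${response.status}`);
  }
  // Body is the image itself (PNG, per the example above)
  return Buffer.from(await response.arrayBuffer());
}
```

In practice you would call it with `{ apiKey: process.env.PAGEBOLT_API_KEY }` and write the returned Buffer to disk.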
Advantages of hosted
Zero infrastructure — No servers to manage. No scaling to worry about. No DevOps work.
Fast — API latency: 2–3 seconds. Consistent performance, no cold start penalty.
Reliable — 99.9% uptime SLA. Automatic failover. Managed by specialists.
Scalable — 1 request or 10,000/day — same API. No performance degradation. Auto-scaling built-in.
Cost-predictable — Per-request pricing. No surprise infrastructure bills. Scale down anytime.
Self-hosted vs hosted: Direct comparison
| Factor | Self-hosted | Hosted API |
|---|---|---|
| Setup | 2–3 hours | 10 minutes |
| Infra cost/month | $50–300 | $0 |
| DevOps time/month | 5–10 hours | 0 hours |
| Latency | 5–10s (cold start) | 2–3s |
| Scaling | Vertical (capped) | Unlimited |
| Uptime | 99% (if lucky) | 99.9% SLA |
| On-call stress | High | None |
| Per-screenshot cost | $0.05–0.20 (infra) | $0.01–0.03 (API) |
| Best for | Internal tools, high volume | User-facing, unpredictable load |
Real-world example: E-commerce screenshots
Scenario: Capture product page screenshots for every listing (500 new products/day).
Self-hosted approach
```javascript
const puppeteer = require('puppeteer');

// Launch a fixed pool of browsers up front
const POOL_SIZE = 5;
const browsers = [];

async function initPool() {
  for (let i = 0; i < POOL_SIZE; i++) {
    browsers.push(await puppeteer.launch({
      // Flags commonly used in Docker; --no-sandbox weakens isolation
      args: ['--no-sandbox', '--disable-dev-shm-usage']
    }));
  }
}

let currentBrowser = 0;

async function captureScreenshot(url) {
  // Round-robin across the pool
  const browser = browsers[currentBrowser++ % POOL_SIZE];
  let page;
  try {
    page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
    // Puppeteer's option is `type`, not `format`
    return await page.screenshot({ type: 'jpeg', quality: 90 });
  } catch (error) {
    console.error(`Failed to capture ${url}:`, error);
    return null;
  } finally {
    // Close the page even on failure so tabs don't pile up and leak memory
    if (page) await page.close().catch(() => {});
  }
}

// Usage: await initPool() once at startup, then captureScreenshot(url) per product.

// Run on 5x EC2 t3.large (~$0.08/hour each, on-demand)
// 24/7 ≈ 730 hours/month × 5 instances ≈ $300/month infrastructure
// Plus: monitoring, alerting, debugging, scale planning
```
Cost: $300+ infrastructure + 5–10 hours DevOps = ~$1,500/month effective cost.
Hosted API approach
```bash
# Daily cron job: capture 500 screenshots
mkdir -p screenshots
# Assumes the products endpoint returns whitespace-separated IDs
for product_id in $(curl -s https://api.example.com/products/new); do
  # -f: exit non-zero on HTTP errors instead of saving the error body as a .png
  curl -sf -X POST https://pagebolt.dev/api/v1/screenshot \
    -H "x-api-key: $PAGEBOLT_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"https://store.example.com/product/$product_id\"}" \
    --output "screenshots/$product_id.png" \
    || echo "Failed to capture $product_id" >&2
done
# Cost: 500/day × 30 days = 15,000 requests/month → Growth plan = $79/month
# DevOps: 0 hours
# Infra: $0
# Total: $79/month (no hidden costs)
```
Cost: $79/month (Growth plan), $0 infrastructure, $0 DevOps = $79/month total.
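Dividing each monthly total by the 15,000 requests in this scenario shows the per-screenshot gap. The two monthly figures are the ones used in this example (~$1,500 effective self-hosted cost, $79 Growth plan). Substitute your own. `perRequestCost` is a hypothetical helper name.

```javascript
// Per-request cost for the e-commerce scenario (500 screenshots/day, 30 days).
function perRequestCost(monthlyUsd, requestsPerMonth) {
  return monthlyUsd / requestsPerMonth;
}

const requests = 500 * 30; // 15,000 requests/month
const selfHosted = perRequestCost(1500, requests);
const hosted = perRequestCost(79, requests);

console.log(selfHosted.toFixed(4));           // 0.1000
console.log(hosted.toFixed(4));               // 0.0053
console.log(Math.round(selfHosted / hosted)); // 19
```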
Decision tree: Self-hosted or hosted?
Ask these questions:
- Volume: 1,000+ requests/day?
  - Yes → Self-hosted might pay off (if you have DevOps)
  - No → Hosted API is cheaper
- Predictability: Do you know your peak load?
  - No → Hosted API (no scaling surprises)
  - Yes → Could go either way
- Data sensitivity: Must data stay in-region?
  - Yes → Self-hosted (or check API provider's data residency)
  - No → Hosted API
- DevOps capacity: Do you have someone dedicated?
  - No → Hosted API (essential)
  - Yes → Self-hosted becomes viable
- Time-to-market: Do you need this running today?
  - Yes → Hosted API (10 minutes vs 2–3 hours)
  - No → Could go either way
If three or more of your answers point to "Hosted API," go hosted.
For most teams (especially startups, small teams, unpredictable load): hosted wins.
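The five questions can be tallied in a few lines. This sketch encodes each answer as a boolean for the question as worded. It counts the answers that lead to "Hosted API," which means a "Yes" on time-to-market and a "No" on the other four. It then applies a three-or-more threshold, which is a rule of thumb rather than a hard cut-off. `recommend` and its field names are hypothetical.

```javascript
// Tally the five decision-tree questions and apply a "3 or more" threshold.

function recommend({ highVolume, knownPeakLoad, dataMustStayInRegion, dedicatedDevOps, needItToday }) {
  const pointsToHosted = [
    !highVolume,           // under 1,000 requests/day
    !knownPeakLoad,        // unpredictable load
    !dataMustStayInRegion, // no residency constraint
    !dedicatedDevOps,      // nobody to run the infrastructure
    needItToday,           // time-to-market pressure
  ].filter(Boolean).length;

  return {
    pointsToHosted,
    recommendation: pointsToHosted >= 3 ? 'hosted' : 'self-hosted may be viable',
  };
}
```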
Hybrid approach: Best of both?
Some teams try hybrid:
- Internal dashboards/tools: self-hosted Puppeteer (full control, no external network hop)
- User-facing features: hosted API (reliability, scaling, no DevOps)
This works if you have 2+ distinct use cases with very different requirements. For most teams, it adds more complexity than it's worth.
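In code, a hybrid setup comes down to a thin router in front of two backends. In this sketch, `createHybridCapturer`, `selfHosted`, `hosted`, and the `audience` option are all hypothetical. The two backends would be functions you supply, for example a Puppeteer pool and an API client.

```javascript
// Route internal jobs to a self-hosted capturer and everything else to the
// hosted API. The router only picks a backend; the backends do the work.

function createHybridCapturer({ selfHosted, hosted }) {
  return function capture(url, { audience = 'user' } = {}) {
    // Anything not explicitly internal goes to the hosted API, the safer default
    const backend = audience === 'internal' ? selfHosted : hosted;
    return backend(url);
  };
}
```

Defaulting to the hosted path means a caller who forgets to tag a job never lands user traffic on internal infrastructure.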
Getting started with hosted
- Sign up at pagebolt.dev (free: 100 requests/month)
- Get API key from your dashboard (1 minute)
- Make your first API call (5 minutes)
- Evaluate: Does it solve your use case?
- Migrate if it does — or keep self-hosting if the numbers work out
The real question
Self-hosting isn't about "better control" or "not trusting external APIs." It's about: Can you afford 5+ hours/month of DevOps overhead plus on-call stress?
If yes: self-hosted is viable. If no: a hosted API solves the problem immediately.
Most teams say no.
Skip the infrastructure — start in 10 minutes
Free tier: 100 screenshots/month. No credit card. No servers to manage.