Why Puppeteer keeps timing out in production (and what to do instead)
Common causes of Puppeteer timeouts in production. Memory leaks, cold starts, SPA rendering. When to use a REST API instead.
Your Puppeteer screenshot works locally. Takes 2 seconds. You deploy to production.
Suddenly: timeouts. Every 3rd request fails. Your error logs are full of:
TimeoutError: Waiting for navigation to "https://example.com" failed: Timeout 30000ms exceeded
or
Error: Browser.newPage(): target page crashed
You're not alone. This is the #1 problem with running Puppeteer in production.
Why Puppeteer times out
1. Memory exhaustion
Each Puppeteer instance holds a browser process (~150MB base + page overhead). Under load, memory fills up fast.
// This looks fine...
for (let i = 0; i < 1000; i++) {
  const page = await browser.newPage();
  await page.goto(url);
  const screenshot = await page.screenshot();
  // FORGOT TO CLOSE THE PAGE
  // await page.close(); // ← This line is missing
}
Forget to close pages and memory bloats: the browser slows down, and the next page times out.
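The fix is mechanical: close the page in a `finally` block so it is released even when navigation throws. A minimal sketch (the `withPage` helper name is ours, not a Puppeteer API):

```javascript
// Hypothetical helper: runs `fn` with a fresh page, always closing it after.
async function withPage(browser, fn) {
  const page = await browser.newPage();
  try {
    return await fn(page);
  } finally {
    await page.close(); // runs on success AND on error/timeout
  }
}

// Usage:
// const shot = await withPage(browser, async (page) => {
//   await page.goto(url);
//   return page.screenshot();
// });
```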
2. Cold start penalty
Spawning a new browser process takes 5–15 seconds on first call:
// First request to your server
const browser = await puppeteer.launch(); // ← 8 seconds gone
const page = await browser.newPage();
await page.goto(url, { timeout: 30000 }); // ← only 22 seconds of budget left
const screenshot = await page.screenshot();
Your timeout is 30 seconds total. You've burned 8 seconds just starting the browser. Network hiccup? Timeout.
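The usual mitigation is to pay the launch cost once, at startup, and reuse the browser across requests. A sketch under that assumption; `makeBrowserGetter` is a hypothetical helper, with the launcher injected so the pattern is easy to test:

```javascript
// Hypothetical singleton factory: first call launches, later calls reuse.
function makeBrowserGetter(launch) {
  let browserPromise = null;
  return function getBrowser() {
    if (!browserPromise) {
      // Store the promise, not the browser, so concurrent first
      // requests share one launch instead of racing.
      browserPromise = launch();
    }
    return browserPromise;
  };
}

// Usage (assumes puppeteer is installed):
// const getBrowser = makeBrowserGetter(() =>
//   puppeteer.launch({ args: ['--no-sandbox'] })
// );
// const browser = await getBrowser(); // fast after the first call
```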
3. Single-page app rendering lag
Modern SPAs don't render on initial HTML. They load, fetch data, render.
await page.goto(url, {
  waitUntil: 'networkidle2' // ← Waits for network quiet
});
If the SPA has a bug and keeps fetching data, networkidle2 waits forever (or until timeout). One bad third-party API call → entire screenshot times out.
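A more robust pattern is to stop waiting on the network entirely and wait for the element that proves the app rendered, with a bounded timeout per step. A sketch; the `#app-ready` selector and the per-step budgets are illustrative, not defaults:

```javascript
// Hypothetical wrapper: bounded per-step waits instead of networkidle2.
async function screenshotWhenReady(page, url, readySelector) {
  // Don't wait for network quiet; a chatty SPA can keep fetching forever.
  await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 15000 });
  // Wait for the element that signals "rendered", not for silence.
  await page.waitForSelector(readySelector, { timeout: 10000 });
  return page.screenshot({ type: 'png' });
}
```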
4. Resource exhaustion under concurrency
Request 1: Browser process #1 (150MB)
Request 2: Browser process #2 (150MB)
Request 3: Browser process #3 (150MB)
...
Request 10: Out of memory, killed
Now requests 1–9 fail because the OS killed the browser.
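The usual defense is a hard cap on concurrency: queue incoming jobs and never run more than N at once. Libraries like p-limit do this; here is a minimal hand-rolled sketch:

```javascript
// Minimal concurrency gate: at most `max` jobs run at once; extras queue.
function createLimiter(max) {
  let active = 0;
  const queue = [];
  const pump = () => {
    while (active < max && queue.length > 0) {
      active++;
      const { job, resolve, reject } = queue.shift();
      job()
        .then(resolve, reject)
        .finally(() => { active--; pump(); });
    }
  };
  return (job) =>
    new Promise((resolve, reject) => {
      queue.push({ job, resolve, reject });
      pump();
    });
}

// Usage:
// const limit = createLimiter(5); // never more than 5 concurrent browsers
// const shot = await limit(() => takeScreenshot(url));
```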
5. DNS and network issues
If DNS is slow (even 1 second extra) or the page takes 20 seconds to load, timeout. Your 30-second window evaporates faster than expected.
Why self-hosting Puppeteer is fragile at scale
You think the problem is your code. It's not. It's the architecture.
Puppeteer at scale requires:
- Browser pool management (pre-spawn, recycle, health checks)
- Memory monitoring (kill old processes before OOM)
- Timeout handling (retry logic, fallbacks)
- Load balancing (distribute across multiple instances)
- Logging/debugging (understand why timeouts happen)
One EC2 instance can handle ~5–10 concurrent Puppeteer requests. Scale to 100 concurrent? You need 10–20 instances. Now you're managing Kubernetes, session affinity, auto-scaling, health checks, and cost ($500+/month). And it's still fragile.
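To give a flavor of what "recycle" in that list means in practice, here is a minimal recycle-after-N-pages sketch. The names are ours and the launcher is injected (e.g. `puppeteer.launch`); a real pool also needs pre-spawning and health checks:

```javascript
// Hypothetical recycler: restart the browser every `maxPages` pages
// so memory growth stays bounded.
function makePageSource(launch, maxPages = 50) {
  let browser = null;
  let pagesServed = 0;
  return async function getPage() {
    if (!browser || pagesServed >= maxPages) {
      if (browser) await browser.close(); // recycle the old process
      browser = await launch();
      pagesServed = 0;
    }
    pagesServed++;
    return browser.newPage();
  };
}
```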
Solution: REST API (2–3 second latency, zero complexity)
// e.g. inside an Express route handler
const response = await fetch('https://pagebolt.dev/api/v1/screenshot', {
  method: 'POST',
  headers: {
    'x-api-key': process.env.PAGEBOLT_API_KEY,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://example.com',
    format: 'png'
  })
});

const screenshot = await response.arrayBuffer();
res.set('Content-Type', 'image/png');
res.send(Buffer.from(screenshot)); // Express expects a Buffer, not an ArrayBuffer
That's it. No memory management. No timeout handling. No infrastructure.
Real comparison: Puppeteer vs API
Puppeteer (production headache)
const puppeteer = require('puppeteer');

let browser;

async function init() {
  browser = await puppeteer.launch({
    args: ['--no-sandbox', '--disable-dev-shm-usage']
  });
}

async function takeScreenshot(url) {
  const page = await browser.newPage();
  try {
    await page.setViewport({ width: 1280, height: 720 });
    await Promise.race([
      page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 }),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error('Custom timeout')), 25000)
      )
    ]);
    return await page.screenshot({ type: 'png' }); // Puppeteer uses `type`, not `format`
  } catch (error) {
    console.error('Screenshot failed:', error);
    throw error;
  } finally {
    await page.close(); // otherwise a timeout leaks the page
  }
}

// Monitor memory, restart if needed
setInterval(async () => {
  const memory = process.memoryUsage();
  if (memory.heapUsed > 1e9) { // 1GB
    console.log('Memory high, restarting browser...');
    await browser.close();
    browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  }
}, 60000);
Dozens of lines of boilerplate, manual memory management, and crash recovery logic. And it's still fragile.
REST API (a dozen lines, total)
async function takeScreenshot(url) {
  const response = await fetch('https://pagebolt.dev/api/v1/screenshot', {
    method: 'POST',
    headers: {
      'x-api-key': process.env.PAGEBOLT_API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url, format: 'png' })
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return await response.arrayBuffer();
}
No memory management. No crash recovery. No timeout logic. The managed service handles it.
Cost reality
| Item | Puppeteer | REST API |
|---|---|---|
| Infrastructure | $50–200/month | $0 |
| Scaling | 10 instances = $500/month | Auto-scales, same cost |
| DevOps time | 5–10 hours/month | 0 hours |
| Timeout debugging | 10+ hours/month | Never |
| Total effective cost | $1,500–2,000/month | $50–100/month |
For most teams: REST API wins by 10–20x.
When to keep Puppeteer
Keep self-hosted Puppeteer if:
- ✅ Processing 10,000+ screenshots/day (economies of scale)
- ✅ Data residency requirement (EU data can't leave EU)
- ✅ Dedicated DevOps team already maintaining it
For everyone else: use an API.
Getting started
- Sign up at pagebolt.dev (free: 100 requests/month)
- Replace Puppeteer code with fetch() calls (5 minutes)
- Deploy and never think about Puppeteer timeouts again
Stop debugging Puppeteer crashes at 2 AM. Let a managed service handle it.
Stop managing Puppeteer in production
Free tier: 100 screenshots/month. No credit card. Replace your timeout-prone setup in 10 minutes.