Web Scraping at Scale

Modern sites render with JavaScript, lazy-load content, and push back on automation. A real browser handles all of it, and BrowserCat gives you fleets of them behind one endpoint, so you can scrape at scale without running infrastructure.

The advantage of a router shows up clearly here: different scraping jobs need different things, and you don’t have to pick a vendor for each.

The default: fast browsers on the edge

Most scraping needs a quick, clean headless Chromium and a lot of them in parallel. By default, every session is served from Cloudflare’s global edge, close to your request, with predictable spin-up. You just connect with Playwright or Puppeteer and go wide:

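import {chromium} from 'playwright';

// Connect to a remote browser through the BrowserCat endpoint.
const browser = await chromium.connectOverCDP(
  'wss://api.browsercat.com/connect',
  {headers: {'Api-Key': process.env.BROWSERCAT_API_KEY}},
);

try {
  const page = await browser.newPage();
  await page.goto('https://example.com/listings');
  // Collect the text of every element matching .item.
  const items = await page.$$eval('.item', els => els.map(e => e.textContent));
  console.log(items);
} finally {
  // Always end the session, even if the page fails to load.
  await browser.close();
}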

Run many of these concurrently, one browser per job, and let the platform handle scale.
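One way to "go wide" without opening an unbounded number of sessions is a small worker pool. The sketch below is a minimal illustration, not part of BrowserCat or Playwright: `runPool` is a hypothetical helper name, and the right concurrency limit for your plan is an assumption you would need to check.

```javascript
// Hypothetical helper: run `worker` over `jobs` with at most `limit` in flight.
// Results come back in input order as {ok: true, value} or {ok: false, error},
// so one failed job does not abort the rest of the batch.
async function runPool(jobs, limit, worker) {
  const results = new Array(jobs.length);
  let next = 0; // index of the next job to claim

  // Each lane claims jobs one at a time until none are left.
  async function lane() {
    while (next < jobs.length) {
      const i = next++;
      try {
        results[i] = {ok: true, value: await worker(jobs[i], i)};
      } catch (error) {
        results[i] = {ok: false, error};
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, jobs.length));
  await Promise.all(Array.from({length: laneCount}, () => lane()));
  return results;
}
```

In use, `jobs` would be a list of URLs and `worker` an async function that connects, scrapes one page, and closes its browser, as in the example above.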

When you need a proxy: bring your own

Geo-targeted pages, residential IPs, or sites that block datacenter traffic call for a proxy. Attach your own proxy to a session and the router automatically sends it to a backend that supports it (today, Steel), with no code change beyond the proxy config.

See proxy configuration for the exact options. Because you bring the proxy, you keep control of your IPs and proxy spend. Managed residential proxies, plus heavier stealth and CAPTCHA handling, are on the way; see the roadmap.

Tips for scraping well

  • Pick the right region. Serve sessions close to the target site, or to match a locale. See region.
  • Be a good citizen. Respect robots.txt and rate limits; scrape only what you’re allowed to.
  • Fail fast and retry. Treat each page as fallible: set timeouts, catch errors, and retry transient failures rather than holding a browser open.
  • Close sessions promptly. You’re billed for browser time, so end a session as soon as the job is done.
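
The "fail fast and retry" tip can be sketched as a timeout wrapper plus a retry loop with exponential backoff. This is a minimal sketch under stated assumptions: `withTimeout` and `retry` are hypothetical helper names, the default values are placeholders, and the loop retries every error, so you would likely add your own check for which failures count as transient.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms`.
// Note: this only stops waiting; it does not cancel the underlying work,
// so the task itself should still close its browser in a finally block.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical helper: call `task` up to `attempts` times, waiting
// baseDelayMs, 2x, 4x, ... between tries. Rethrows the last error.
async function retry(task, {attempts = 3, timeoutMs = 30000, baseDelayMs = 500} = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await withTimeout(task(attempt), timeoutMs);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Opening a fresh session inside `task` on each attempt keeps a stuck browser from being held open, which also fits the billing tip above.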

Why route instead of pick

Scraping needs change job to job, and over time. A single provider locks you into one set of capabilities and one pricing page. With the router, the same /connect endpoint covers the fast common case and the proxy-heavy edge case, and gains new capabilities as backends come online, with no vendor lock-in.
