Humans Only

Stop scraping your website: practical defences for product owners and developers

Published on 2026-02-19

A practical, risk-based playbook to prevent web scraping on key pages and APIs—without turning your product into a fortress.


Scraping is rarely “just traffic”

If you’re trying to stop scraping your website, you’re usually dealing with one of two problems: cost (bandwidth/compute/API spend) or value leakage (pricing, inventory, content, lead lists, product catalogues).

OWASP calls this family of issues automated threats to web applications—abuse of normal functionality at scale, rather than one “hack” you patch and forget (OWASP Automated Threats). That framing matters, because the fix is a system: detect, decide, respond.

What “website scraping” looks like in the real world

Scrapers aren’t always dramatic. Many are boring, consistent, and financially motivated.

Common patterns include:

  1. Fast, repetitive requests across listing pages (category/search/results)
  2. Systematic traversal of predictable URLs (e.g. /products/123, /jobs/456)
  3. Direct API calls that bypass your UI entirely
  4. Headless browsers pulling HTML and then parsing it

Cloudflare’s definition is a good baseline: web scraping is using software to extract valuable information from a site, often at a pace no human could match (Cloudflare: What is data scraping?).

The approach that actually works: Detect → Decide → Respond

Trying to block scrapers with one trick (User-Agent rules, robots.txt, a single WAF rule) usually buys you a quiet weekend, not a solution.

A scalable anti-scraping setup uses a simple loop:

  1. Detect signals (velocity, behaviour, browser integrity, network patterns).
  2. Decide risk per request or session.
  3. Respond with a measured outcome (allow, slow, step-up, block).

This is also how modern bot management products describe the job: stop unwanted automation while letting useful automation through (Cloudflare: What is bot management?).
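As a concrete illustration, the loop above can be sketched as a small scoring function. Everything here is an assumption for illustration: the signal names, thresholds, and weights are placeholders you would tune against your own traffic, not a product API.

```python
from dataclasses import dataclass

# A sketch of the detect → decide → respond loop. Signal names,
# thresholds, and weights are illustrative placeholders.

@dataclass
class Signals:
    req_per_min: int        # velocity
    distinct_pages: int     # breadth of traversal in the session
    has_real_browser: bool  # e.g. passed a client integrity check

def decide(s: Signals) -> str:
    """Map detected signals to a measured response."""
    score = 0
    if s.req_per_min > 120:
        score += 2
    elif s.req_per_min > 40:
        score += 1
    if s.distinct_pages > 200:
        score += 2
    if not s.has_real_browser:
        score += 1
    if score >= 4:
        return "block"
    if score >= 2:
        return "step_up"  # e.g. a verification challenge
    if score >= 1:
        return "slow"     # tighter limits or added latency
    return "allow"
```

The useful property is that "respond" is graduated: most sessions never see anything, and only high-risk ones pay a cost.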

Where to focus first (hint: not your homepage)

If your goal is to stop scraping of your website, protect the endpoints scrapers monetise:

  1. Search and listing pages: /search, /category/*, /collections/*
  2. Product detail pages (or job listings, property pages, directories)
  3. Pricing and availability endpoints (often scraped for competitive intel)
  4. Feeds and exports (XML/JSON endpoints that are “too convenient”)
  5. Your APIs—especially unauthenticated ones

Pick one “high-value, high-volume” route, get it under control, then expand.

Practical controls to prevent web scraping (without rebuilding your stack)

1) Rate limit per route, not “site-wide”

Scraping is often a throughput game. If you can cap throughput on the routes being scraped, you can make the economics ugly fast.

Cloudflare’s WAF docs show practical rate limiting best practices including route-based matching and graduated penalties (Cloudflare rate limiting best practices). AWS WAF provides similar rate-based approaches you can apply at the edge (AWS WAF rate-based rules).

What to ship:

  1. Separate limits for /search versus /product/*
  2. Different thresholds for logged-in vs logged-out users
  3. Rolling windows for bursts (10s–60s) and drip scraping (10–60 mins)

2) Put “step-up” gates in front of high-value data

A lot of teams jump straight to blocking. Better: reserve blocking for obvious automation, and use step-up verification when risk rises.

Examples that work well:

  1. Allow normal browsing, but require verification when someone hits 200 product pages in 2 minutes
  2. Allow search, but step-up when a single client enumerates the entire catalogue alphabetically
  3. Allow API reads, but step-up when tokenless calls start behaving like a crawler
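The first example above (verification once a session hits ~200 product pages in 2 minutes) can be sketched as a per-session gate. The thresholds and the `step_up` action name are assumptions for illustration:

```python
from collections import deque

# Illustrative step-up gate: allow normal browsing, but require
# verification once a session views many product pages in a short
# window. Thresholds are assumptions; tune to your own traffic.
PAGE_THRESHOLD = 200
WINDOW_SECONDS = 120

class SessionGate:
    def __init__(self) -> None:
        self.views: deque[float] = deque()
        self.verified = False

    def on_product_view(self, now: float) -> str:
        if self.verified:
            return "allow"        # already passed step-up this session
        self.views.append(now)
        while self.views and now - self.views[0] > WINDOW_SECONDS:
            self.views.popleft()  # keep only views inside the window
        if len(self.views) >= PAGE_THRESHOLD:
            return "step_up"      # show a verification challenge
        return "allow"
```

A real user browsing a few dozen pages never sees the gate; a crawler enumerating the catalogue hits it quickly.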

3) Make the data less scrape-friendly (selectively)

You don’t need to play whack-a-mole with your whole frontend. Aim for “high-value fields”.

Practical options:

  1. Move certain fields behind an authenticated view (e.g. bulk pricing, supplier info)
  2. Reduce “free enumeration” (pagination limits, tighter filters, capped sorting options)
  3. Use caching for popular pages so scraping doesn’t translate directly into database pain

Cloudflare notes that scrapers often rely on predictable structure to parse content (Cloudflare: content scraping). You don’t need to constantly shuffle HTML, but it’s worth avoiding “perfectly structured, perfectly complete, perfectly anonymous” data access.
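Reducing free enumeration (option 2 above) can be as simple as clamping pagination parameters for anonymous callers. The limits below are illustrative assumptions:

```python
# Sketch of capping "free enumeration": clamp page depth and page
# size for anonymous callers. All limits here are illustrative.
MAX_ANON_PAGE = 20       # anonymous users can only page this deep
MAX_ANON_PAGE_SIZE = 24
MAX_AUTH_PAGE = 200      # logged-in users get a larger budget
MAX_AUTH_PAGE_SIZE = 100

def clamp_pagination(page: int, size: int, authenticated: bool) -> tuple[int, int]:
    """Return (page, size) clamped to the caller's enumeration budget."""
    max_page = MAX_AUTH_PAGE if authenticated else MAX_ANON_PAGE
    max_size = MAX_AUTH_PAGE_SIZE if authenticated else MAX_ANON_PAGE_SIZE
    return (min(max(page, 1), max_page), min(max(size, 1), max_size))
```

This quietly caps how much of the catalogue a tokenless client can pull in one pass, without touching the experience of normal browsing.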

4) Don’t pretend User-Agent is security

Scrapers can lie. Treat User-Agent as a hint for analytics and allowlisting known “good bots”, not a lock on the door.
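Allowlisting a "good bot" safely means verifying it, not trusting the header. Google, for example, documents verifying Googlebot with a reverse DNS lookup followed by a forward lookup that confirms the hostname maps back to the same IP. A sketch:

```python
import socket

# Verify a claimed Googlebot instead of trusting the User-Agent:
# reverse (PTR) lookup on the client IP, then a forward lookup to
# confirm the hostname really resolves back to that IP.

def is_verified_googlebot(ip: str) -> bool:
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(host)[2]  # forward confirm
    except OSError:
        return False
    return ip in forward_ips
```

The same reverse-then-forward pattern applies to other crawlers that publish their hostname suffixes; anything that only matches on the User-Agent string is trivially spoofed.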

5) Instrument outcomes, not just “blocked requests”

To stop scraping of your website's content reliably, you need to measure what changed.

Track:

  1. Requests per route (before/after)
  2. Cache hit rate and origin load
  3. Step-up rate and pass rate
  4. Time-to-first-byte and page performance
  5. Business impact: copied listings, price undercutting, inventory hoarding, content reuse

A concrete example: protecting pricing and availability

Imagine an ecommerce site:

  1. Scrapers hit /products/* and /api/price?sku=... all day.
  2. Competitors mirror your pricing within minutes.
  3. Your origin spend spikes at the same time.

A practical rollout to prevent web scraping:

  1. Add route-level rate limits to /api/price and /products/*.
  2. Allow low-volume browsing.
  3. Step-up verify when the session crosses a velocity threshold (burst + repetition).
  4. Block or heavily throttle repeat offenders.
  5. Measure: time-to-match (competitors), origin CPU, and % of requests served from cache.
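The rollout above can live in one declarative policy table, so product owners and developers review the same thing. The route prefixes, limits, and escalation rule below are illustrative assumptions:

```python
# The rollout above as one declarative policy table (a sketch; route
# prefixes, limits, and the escalation rule are illustrative).
POLICY = {
    # route prefix: requests allowed per rolling window (seconds)
    "/api/price": {"limit": 60, "window_s": 60},
    "/products/": {"limit": 300, "window_s": 600},
}

def breach_action(path: str, hits_in_window: int, prior_breaches: int) -> str:
    """Decide the response once a route's rolling counter is known.

    hits_in_window is assumed to come from a counter over the route's
    window_s; prior_breaches counts earlier limit breaches by the same
    client, so repeat offenders can be escalated.
    """
    for prefix, rule in POLICY.items():
        if path.startswith(prefix):
            if hits_in_window <= rule["limit"]:
                return "allow"
            # Step-up first; block clients that keep breaching.
            return "block" if prior_breaches >= 3 else "step_up"
    return "allow"  # routes outside the policy are untouched
```

Keeping the policy declarative makes the trade-offs visible: raising a limit or softening an action is a one-line review, not a code archaeology exercise.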

Optional: a “one sprint” anti-scraping checklist

  1. Add per-route rate limits for the top 3 scraped routes
  2. Add step-up verification for high-velocity browsing on those routes
  3. Separate policies for logged-out vs logged-in traffic
  4. Protect your API endpoints the same way as your UI
  5. Add dashboards for route volume, decisions, and origin load

Where Humans Only fits

Humans Only helps you stop scraping of your website with fast, privacy-first verification that keeps real users moving.

You get a clean Allow / Step-up / Block model, real-time analytics, and an easy drop-in integration—so product owners can see the trade-offs and developers can ship and tune it without guesswork.

Bottom line

If you want to stop scraping of your website's content, don't fight the scraper's code. Fight the scraper's economics: protect high-value routes, rate limit with intent, step up when risk rises, and measure outcomes that matter.

When you’re ready to turn that into a clean, user-friendly system: Stop Bots, Welcome Humans.
