Why Price Scraping Is Harder Than It Looks
Scraping a website once is easy. Running a scraper that works every day, handles site changes, and doesn't get blocked — that's the real challenge.
This post covers the practical side: tools, anti-bot measures, and how to build something you can actually run in production.
Step 1: Understand What You're Scraping
Before writing a single line of code, open DevTools (F12) → Network tab → reload the page.
Look for:
- XHR/Fetch requests — is the price loaded via an API call? If so, hit the API directly. Much easier than parsing HTML.
- Static HTML — price is in the page source when you View Source
- JavaScript-rendered — price only appears after JS runs (not in View Source)
# Quick test: if this shows the price, it's static HTML
curl -s "https://example.com/product" | grep -i "price"
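If curl shows the price, a simple HTTP client is enough. If it doesn't, check the Network tab for an API call before reaching for a headless browser. curl can also come back empty simply because the site blocked the request.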
The Right Tool for Each Case
| Site type | Best tool |
|---|---|
| Static HTML | Python requests + BeautifulSoup |
| REST API (JSON) | requests only |
| JavaScript-rendered | Playwright or Puppeteer |
| Heavy anti-bot | Playwright + stealth plugin + proxies |
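When the Network tab shows the price arriving as JSON, it can be read straight out of the response without touching HTML. A minimal sketch, assuming a made-up payload shape: the `product` / `offers` / `price` path, the sample data, and the `price_from_payload` helper are all hypothetical, and real endpoints differ per site. With requests, the payload would come from `requests.get(api_url, headers=headers, timeout=10).json()`.

```python
import json
from decimal import Decimal, InvalidOperation

def price_from_payload(payload, path):
    """Walk a nested JSON object along `path`; return the value as a Decimal, or None."""
    node = payload
    for key in path:
        if isinstance(node, dict) and key in node:
            node = node[key]
        elif isinstance(node, list) and isinstance(key, int) and 0 <= key < len(node):
            node = node[key]
        else:
            return None  # the API shape changed or the path is wrong
    try:
        return Decimal(str(node))
    except InvalidOperation:
        return None  # the value at the end of the path is not a number

# Hypothetical response body, for illustration only
sample = json.loads(
    '{"product": {"id": 123, "offers": [{"price": "19.99", "currency": "USD"}]}}'
)
print(price_from_payload(sample, ["product", "offers", 0, "price"]))  # prints 19.99
```

Returning None on a missing key, rather than raising, lets the caller treat "the API changed" the same way as "the selector stopped matching" in the HTML scrapers.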
Basic Static Scraper (Python)
import requests
from bs4 import BeautifulSoup

def get_price(url, selector):
    """Return the text of the first element matching a CSS selector, or None."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    }
    r = requests.get(url, headers=headers, timeout=10)
    r.raise_for_status()  # fail loudly on 403/404/5xx instead of parsing an error page
    soup = BeautifulSoup(r.text, "html.parser")
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else None

price = get_price("https://example.com/product-123", ".price")
print(price)
Always set a User-Agent. Without one, requests identifies itself as python-requests/<version>, which many sites block outright.
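Note that `get_price` returns raw text such as "$1,299.99", while any later percentage comparison needs a number. A minimal sketch of the conversion, assuming prices use "," as a thousands separator and "." as the decimal point (sites using the opposite convention would need different handling). `parse_price` is a hypothetical helper, not part of any library.

```python
import re
from decimal import Decimal

# Matches the first number in the text, e.g. "1,299.99" in "Now $1,299.99 USD"
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

def parse_price(text):
    """Convert scraped price text to a Decimal, or return None if no number is found.

    Assumes "," is a thousands separator and "." is the decimal point.
    """
    if not text:
        return None
    match = _NUMBER.search(text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))

print(parse_price("$1,299.99"))     # prints 1299.99
print(parse_price("Out of stock"))  # prints None
```

Decimal avoids float rounding surprises when comparing prices. Convert with `float()` only at the point where a value is stored in a REAL column.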
JavaScript-Rendered Sites: Use Playwright
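from playwright.sync_api import sync_playwright

def get_price_js(url, selector):
    """Render the page in headless Chromium and return the selector's text, or None."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until there have been no network connections for 500 ms
            page.goto(url, wait_until="networkidle")
            el = page.query_selector(selector)
            return el.inner_text() if el else None
        finally:
            browser.close()  # runs even if navigation fails or times out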
Playwright runs a real browser — JavaScript executes, lazy-loaded content appears, dynamic prices load.
Why Sites Block Scrapers (And How to Handle It)
Sites detect scrapers through several signals:
1. Request rate — 100 requests in 2 seconds is obviously not a human. Add delays:
import time, random
time.sleep(random.uniform(1.5, 4.0)) # random delay between requests
2. Missing browser headers — real browsers send Accept, Accept-Language, Referer, etc. Copy a full header set from DevTools.
3. Fingerprinting: sites check Canvas, WebGL, fonts, screen resolution. Stealth plugins for Playwright patch the most obvious automation tells (such as navigator.webdriver), but they won't defeat every fingerprinting check.
4. IP reputation — data centre IPs get flagged. Residential proxies solve this but cost money ($10–50/month).
5. CAPTCHAs — if you hit these regularly, you need a CAPTCHA solving service (2captcha, AntiCaptcha) or a smarter approach.
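Points 1 and 2 can be combined in a small fetch loop. A minimal sketch: the header values are examples only and should be replaced with a set copied from your own browser's DevTools, and `fetch_politely` with its injectable `fetch` and `sleep` parameters is a hypothetical helper, shaped that way so the pacing can be exercised without touching the network.

```python
import random
import time

# Example values only; copy the real set from your own browser's DevTools
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-GB,en;q=0.9",
    "Referer": "https://example.com/",
}

def fetch_politely(urls, fetch, min_delay=1.5, max_delay=4.0, sleep=time.sleep):
    """Call fetch(url, headers) for each URL, pausing a random interval between calls."""
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            # No pause before the first request, a random one before every later request
            sleep(random.uniform(min_delay, max_delay))
        results[url] = fetch(url, BROWSER_HEADERS)
    return results
```

With requests, `fetch` could be as simple as `lambda url, headers: requests.get(url, headers=headers, timeout=10).text`.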
Running It on a Schedule
For daily price monitoring, use a cron job on a VPS:
# Run every day at 8am
0 8 * * * /usr/bin/python3 /home/user/scraper/prices.py >> /home/user/scraper/prices.log 2>&1
Or use the third-party schedule library (pip install schedule) if you want it running continuously:
import time
import schedule

def job():
    # scrape_all_products, save_to_db and check_alerts stand in for your own functions
    prices = scrape_all_products()
    save_to_db(prices)
    check_alerts(prices)

schedule.every().day.at("08:00").do(job)

while True:
    schedule.run_pending()
    time.sleep(60)
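One caveat with a long-running loop: schedule does not catch exceptions raised inside a job, so a single failed scrape would end the `while True` loop. A minimal sketch of a wrapper that logs the failure and carries on. The `catch_errors` name and the logger name are hypothetical choices, not part of schedule.

```python
import functools
import logging

logger = logging.getLogger("price_scraper")

def catch_errors(func):
    """Log any exception raised by func and return None instead of propagating it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception records the message together with the full traceback
            logger.exception("Job %s failed", func.__name__)
            return None
    return wrapper

@catch_errors
def flaky_job():
    raise RuntimeError("selector not found")  # stand-in for a scraper breaking

flaky_job()  # the error is logged; the caller keeps running
```

The scheduled job would then be registered as `schedule.every().day.at("08:00").do(catch_errors(job))`.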
Storing Results and Sending Alerts
For price monitoring, you want to track changes over time:
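import sqlite3
from datetime import datetime

def save_price(product_id, price, source):
    """Append one observation; price should be numeric, not the raw scraped string."""
    conn = sqlite3.connect("prices.db")
    # Create the table on first run (column types here are an assumption)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices "
        "(product_id TEXT, price REAL, source TEXT, checked_at TEXT)"
    )
    conn.execute(
        "INSERT INTO prices (product_id, price, source, checked_at) VALUES (?, ?, ?, ?)",
        (product_id, price, source, datetime.now().isoformat())
    )
    conn.commit()
    conn.close()

def check_price_drop(product_id, threshold_pct=5):
    """Alert if the latest price is at least threshold_pct below the previous one."""
    conn = sqlite3.connect("prices.db")
    # ISO timestamps sort chronologically, so the first two rows are the newest
    rows = conn.execute(
        "SELECT price FROM prices WHERE product_id = ? ORDER BY checked_at DESC LIMIT 2",
        (product_id,)
    ).fetchall()
    conn.close()
    if len(rows) == 2:
        current, previous = rows[0][0], rows[1][0]
        if previous > 0:  # avoid dividing by zero on a bad record
            drop = (previous - current) / previous * 100
            if drop >= threshold_pct:
                # send_telegram_alert must be provided separately (e.g. a Telegram Bot API call)
                send_telegram_alert(f"Price dropped {drop:.1f}% on product {product_id}")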
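`send_telegram_alert` is called above but needs an implementation. A minimal sketch using the Telegram Bot API's sendMessage method and only the standard library. Reading the bot token and chat ID from environment variables named `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` is an assumption, and `build_telegram_request` is a hypothetical helper split out so the request can be inspected without sending anything.

```python
import json
import os
import urllib.parse
import urllib.request

def build_telegram_request(token, chat_id, text):
    """Build the POST request for the Bot API sendMessage method without sending it."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")

def send_telegram_alert(text):
    """Send text to the configured chat; return True if Telegram reports success."""
    token = os.environ["TELEGRAM_BOT_TOKEN"]  # assumed variable names
    chat_id = os.environ["TELEGRAM_CHAT_ID"]
    request = build_telegram_request(token, chat_id, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        # The Bot API replies with JSON containing an "ok" flag
        return json.load(response).get("ok", False)
```

Keeping the token out of the source file also keeps it out of version control and out of the cron log.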
What I Build for Clients
Most price monitoring projects I work on have the same structure:
- Python scraper (requests or Playwright depending on the site)
- SQLite or PostgreSQL to store history
- Telegram bot for alerts when a price drops or a product goes out of stock
- VPS cron job to run it daily
The whole thing usually takes 1–3 days to build and costs $100–250 depending on how many sites and how sophisticated the anti-bot handling needs to be.
If you need price monitoring for your business, tell me which sites you want to track and how often, and I'll give you a quote.