Bot Detection
Rich Statistics uses a two-layer, multi-signal scoring system to filter out bots before their pageviews are written to the database. It does this without reading any IP address and without using cookies or any other persistent identifier.
How it works
Every tracked pageview goes through two independent layers that each contribute a numeric score. The scores are summed and capped at 10. If the total meets or exceeds the configured Bot Score Threshold (default: 3), the request is silently discarded.
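The decision rule above can be sketched in a few lines of JavaScript. This is an illustration only: the function and constant names are hypothetical and are not taken from the plugin source. Only the cap of 10, the default threshold of 3, and the "meets or exceeds" comparison come from the description.

```javascript
// Hypothetical names; only the numbers and the >= comparison come from the docs.
const MAX_SCORE = 10;        // combined score is capped at 10
const DEFAULT_THRESHOLD = 3; // default Bot Score Threshold

// Sum the two layer scores and apply the cap.
function combineScores(clientScore, serverScore) {
  return Math.min(clientScore + serverScore, MAX_SCORE);
}

// A request is discarded when the capped total meets or exceeds the threshold.
function isBot(clientScore, serverScore, threshold = DEFAULT_THRESHOLD) {
  return combineScores(clientScore, serverScore) >= threshold;
}
```

For example, a client score of 2 plus a server score of 1 reaches the default threshold and would be discarded, while the same request would pass with the threshold raised to 5.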
Layer 1 — JavaScript signals (client-side)
The tracker script runs checks in the browser and combines results into a compact integer bitmask that is sent with the pageview payload. The PHP ingest endpoint never sees raw browser values — only the bitmask integer.
| Signal | What it checks | Bot score contribution | Reasoning |
|---|---|---|---|
| WEBDRIVER | navigator.webdriver === true | +4 | Near-certain headless browser |
| NO_HUMAN_EVENT | No mouse, touch, or keyboard event before send | +3 | Real users almost always interact |
| ZERO_SCREEN | screen.width or screen.height === 0 | +3 | Impossible on a real display |
| CHROME_MISSING_OBJ | Claims Chrome UA but window.chrome absent | +3 | Common scraper tell |
| NO_LANGUAGES | navigator.languages empty or missing | +2 | Headless defaults |
| INSTANT_LOAD | Navigation timing: page loaded in < 50 ms | +2 | Not physically possible for a real render |
| NO_CANVAS | HTMLCanvasElement missing | +2 | Stripped by some minimal headless setups |
| HIDDEN_ON_ARRIVAL | document.hidden === true immediately | +2 | Headless tabs are often hidden |
| NO_PLUGINS | navigator.plugins.length === 0 | +1 | Weak alone; strong combined with others |
| NO_TOUCH_API | No touch/pointer API AND mobile UA claim | +1 | Mobile UA without touch events |
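As a rough sketch of how the signals in the table could be packed into a bitmask and turned back into a score, consider the following. The weights are taken from the table; the bit positions and the function names encodeSignals and scoreBitmask are assumptions made for illustration, and the browser checks themselves are left out.

```javascript
// Weights come from the table above. Bit positions are ASSUMED for illustration;
// the plugin's real bit layout is not documented here.
const SIGNALS = [
  { name: "WEBDRIVER",          bit: 1 << 0, weight: 4 },
  { name: "NO_HUMAN_EVENT",     bit: 1 << 1, weight: 3 },
  { name: "ZERO_SCREEN",        bit: 1 << 2, weight: 3 },
  { name: "CHROME_MISSING_OBJ", bit: 1 << 3, weight: 3 },
  { name: "NO_LANGUAGES",       bit: 1 << 4, weight: 2 },
  { name: "INSTANT_LOAD",       bit: 1 << 5, weight: 2 },
  { name: "NO_CANVAS",          bit: 1 << 6, weight: 2 },
  { name: "HIDDEN_ON_ARRIVAL",  bit: 1 << 7, weight: 2 },
  { name: "NO_PLUGINS",         bit: 1 << 8, weight: 1 },
  { name: "NO_TOUCH_API",       bit: 1 << 9, weight: 1 },
];

// Client side (hypothetical): fold the names of triggered signals into one integer.
function encodeSignals(triggeredNames) {
  const triggered = new Set(triggeredNames);
  return SIGNALS.reduce(
    (mask, signal) => (triggered.has(signal.name) ? mask | signal.bit : mask),
    0
  );
}

// Receiving side (hypothetical): add up the weight of every bit that is set.
// The cap at 10 is not applied here; it applies to the combined total.
function scoreBitmask(mask) {
  return SIGNALS.reduce(
    (score, signal) => ((mask & signal.bit) !== 0 ? score + signal.weight : score),
    0
  );
}
```

With this assumed layout, a visitor that trips WEBDRIVER and NO_PLUGINS would send a single integer, and the receiving side would recover a Layer 1 score of 5 without ever seeing the underlying browser values.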
Layer 2 — PHP signals (server-side)
The server reads only the User-Agent string and two other HTTP request headers, Accept-Language and Accept. REMOTE_ADDR (the IP address) is never read or passed to the scorer.
| Signal | What it checks | Bot score contribution |
|---|---|---|
| Honest-bot UA | UA contains a known crawler string (Googlebot, Bingbot, curl, etc.) | = 10 (immediate reject) |
| Suspicious UA | UA contains headlesschrome, phantomjs, selenium, scrapy, etc. | +4 per match |
| Short UA | UA is fewer than 10 characters | +3 |
| No Accept-Language | HTTP_ACCEPT_LANGUAGE header is absent or empty | +2 |
| No Accept | HTTP_ACCEPT header is absent or empty | +1 |
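The server-side rules in the table could look roughly like this. The real scorer is PHP; this is a JavaScript port for illustration only. The pattern lists contain just the examples named in the table (the plugin's actual lists are presumably longer), case-insensitive matching is an assumption, and scoreServerSignals and its parameter shape are hypothetical.

```javascript
// Illustrative subsets taken from the table, not the plugin's full lists.
const HONEST_BOT_PATTERNS = ["googlebot", "bingbot", "curl"];
const SUSPICIOUS_PATTERNS = ["headlesschrome", "phantomjs", "selenium", "scrapy"];

// userAgent: the User-Agent string.
// headers:   { acceptLanguage, accept } (hypothetical shape for the two allowlisted headers).
function scoreServerSignals(userAgent, headers) {
  const ua = (userAgent || "").toLowerCase(); // ASSUMPTION: matching is case-insensitive

  // Self-identified crawlers are rejected outright with the maximum score.
  if (HONEST_BOT_PATTERNS.some((pattern) => ua.includes(pattern))) {
    return 10;
  }

  let score = 0;
  for (const pattern of SUSPICIOUS_PATTERNS) {
    if (ua.includes(pattern)) score += 4; // +4 per match
  }
  if (ua.length < 10) score += 3;            // short UA
  if (!headers.acceptLanguage) score += 2;   // Accept-Language absent or empty
  if (!headers.accept) score += 1;           // Accept absent or empty
  return score;
}
```

The function does not cap its own result, on the reading that the cap of 10 applies to the combined total of both layers.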
Privacy guarantee
Running grep -rn "REMOTE_ADDR" includes/ against the plugin source returns zero matches. The PHP scorer's function signature documents that callers must pass only an allowlist of two headers, not the full $_SERVER superglobal.
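The allowlist idea can be illustrated with a small sketch, written in JavaScript rather than the plugin's PHP. The helper name pickScorerInput is hypothetical, and passing the User-Agent alongside the two allowlisted headers is an assumption based on the signal table. The key names follow PHP's $_SERVER convention.

```javascript
// The only two header keys the scorer is allowed to receive.
const ALLOWED_HEADER_KEYS = ["HTTP_ACCEPT_LANGUAGE", "HTTP_ACCEPT"];

// Hypothetical helper: copy out only what the scorer may see from a
// $_SERVER-like map. Everything else, including REMOTE_ADDR, is left behind.
function pickScorerInput(server) {
  const headers = {};
  for (const key of ALLOWED_HEADER_KEYS) {
    headers[key] = typeof server[key] === "string" ? server[key] : "";
  }
  const userAgent =
    typeof server.HTTP_USER_AGENT === "string" ? server.HTTP_USER_AGENT : "";
  return { userAgent, headers };
}
```

Because the scorer only ever receives the returned object, it has no way to reach the IP address even by accident.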
Tuning the threshold
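Navigate to Analytics → Data Settings → Bot Score Threshold. Lower values are more aggressive (may flag some edge-case legitimate traffic); higher values are more permissive. At the default of 3, any single +3 signal (such as NO_HUMAN_EVENT) or a combination such as NO_LANGUAGES plus NO_PLUGINS is enough to discard a request. The default is a good starting point for most sites.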
If you notice a specific traffic source being incorrectly filtered, open a GitHub issue with the User-Agent string and we'll review the signal weights.