Bot Detection

Rich Statistics uses a two-layer, multi-signal scoring system to filter out bot traffic before a pageview is written to the database. It does this without reading any IP address and without using cookies or any other persistent identifier.

How it works

Every tracked pageview goes through two independent layers that each contribute a numeric score. The scores are summed and capped at 10. If the total meets or exceeds the configured Bot Score Threshold (default: 3), the request is silently discarded.
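The decision rule above can be sketched in a few lines. This is a minimal illustration, not the plugin's code: the function and constant names are hypothetical, and only the rule itself (sum, cap at 10, compare against the threshold) comes from the description above.

```javascript
// Hypothetical names; only the arithmetic is taken from the documentation.
const MAX_SCORE = 10;        // scores are capped at 10
const DEFAULT_THRESHOLD = 3; // documented default Bot Score Threshold

// Sum the two layer scores and apply the cap.
function totalBotScore(clientScore, serverScore) {
  return Math.min(clientScore + serverScore, MAX_SCORE);
}

// A request is discarded when the total meets or exceeds the threshold.
function shouldDiscard(clientScore, serverScore, threshold = DEFAULT_THRESHOLD) {
  return totalBotScore(clientScore, serverScore) >= threshold;
}
```

For example, a request scoring 2 in Layer 1 and 1 in Layer 2 reaches the default threshold of 3 and is discarded, while the same request passes if the threshold is raised to 5.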

Layer 1 — JavaScript signals (client-side)

The tracker script runs checks in the browser and combines results into a compact integer bitmask that is sent with the pageview payload. The PHP ingest endpoint never sees raw browser values — only the bitmask integer.

| Signal | What it checks | Bot score contribution | Reasoning |
| --- | --- | --- | --- |
| WEBDRIVER | navigator.webdriver === true | +4 | Near-certain headless browser |
| NO_HUMAN_EVENT | No mouse, touch, or keyboard event before send | +3 | Real users almost always interact |
| ZERO_SCREEN | screen.width or screen.height === 0 | +3 | Impossible on a real display |
| CHROME_MISSING_OBJ | Claims Chrome UA but window.chrome absent | +3 | Common scraper tell |
| NO_LANGUAGES | navigator.languages empty or missing | +2 | Headless defaults |
| INSTANT_LOAD | Navigation timing: page loaded in < 50 ms | +2 | Not physically possible for a real render |
| NO_CANVAS | HTMLCanvasElement missing | +2 | Stripped by some minimal headless setups |
| HIDDEN_ON_ARRIVAL | document.hidden === true immediately | +2 | Headless tabs are often hidden |
| NO_PLUGINS | navigator.plugins.length === 0 | +1 | Weak alone; strong combined with others |
| NO_TOUCH_API | No touch/pointer API AND mobile UA claim | +1 | Mobile UA without touch events |
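To make the bitmask idea concrete, here is a rough sketch of how ten boolean signals could be packed into one integer and later turned back into a score. The weights are the ones listed in the table; the bit positions (table order, starting at bit 0) and the function names are assumptions for illustration, since the plugin's actual bit layout is not documented here.

```javascript
// Weights come from the table above. Bit positions are ASSUMED to follow
// table order (WEBDRIVER = bit 0, ... NO_TOUCH_API = bit 9).
const LAYER1_SIGNALS = [
  { name: "WEBDRIVER", weight: 4 },
  { name: "NO_HUMAN_EVENT", weight: 3 },
  { name: "ZERO_SCREEN", weight: 3 },
  { name: "CHROME_MISSING_OBJ", weight: 3 },
  { name: "NO_LANGUAGES", weight: 2 },
  { name: "INSTANT_LOAD", weight: 2 },
  { name: "NO_CANVAS", weight: 2 },
  { name: "HIDDEN_ON_ARRIVAL", weight: 2 },
  { name: "NO_PLUGINS", weight: 1 },
  { name: "NO_TOUCH_API", weight: 1 },
];

// Hypothetical client-side step: set one bit per signal that fired.
function encodeSignals(firedNames) {
  let mask = 0;
  LAYER1_SIGNALS.forEach((signal, bit) => {
    if (firedNames.includes(signal.name)) mask |= 1 << bit;
  });
  return mask;
}

// Hypothetical scoring step: add the weight of every bit that is set.
function scoreFromMask(mask) {
  let score = 0;
  LAYER1_SIGNALS.forEach((signal, bit) => {
    if (mask & (1 << bit)) score += signal.weight;
  });
  return score;
}
```

Under this assumed layout, a browser that trips WEBDRIVER and NO_PLUGINS would send the integer 257 (bits 0 and 8), which decodes to a Layer 1 score of 5. The raw values behind each check never need to leave the browser.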

Layer 2 — PHP signals (server-side)

The server reads only two HTTP request headers (Accept-Language and Accept) and the User-Agent string. REMOTE_ADDR (the IP address) is never read or passed to the scorer.

| Signal | What it checks | Bot score contribution |
| --- | --- | --- |
| Honest-bot UA | UA contains a known crawler string (Googlebot, Bingbot, curl, etc.) | = 10 (immediate reject) |
| Suspicious UA | UA contains headlesschrome, phantomjs, selenium, scrapy, etc. | +4 per match |
| Short UA | UA is fewer than 10 characters | +3 |
| No Accept-Language | HTTP_ACCEPT_LANGUAGE header is absent or empty | +2 |
| No Accept | HTTP_ACCEPT header is absent or empty | +1 |
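The server-side rules can be modeled as a small pure function. The plugin's scorer is written in PHP; the sketch below uses JavaScript only to illustrate the arithmetic. The function name, the input shape, the case-insensitive matching, and the pattern lists (limited to the examples named in the table) are all assumptions.

```javascript
// Only the examples named in the table; the plugin's full lists are not shown here.
const HONEST_BOT_PATTERNS = ["googlebot", "bingbot", "curl"];
const SUSPICIOUS_PATTERNS = ["headlesschrome", "phantomjs", "selenium", "scrapy"];

// Hypothetical model of the Layer 2 scorer. It receives only the
// User-Agent string and two header values, never an IP address.
function serverScore({ userAgent = "", acceptLanguage = "", accept = "" } = {}) {
  const ua = userAgent.toLowerCase(); // case-insensitive matching is an assumption

  // Self-declared crawlers are rejected outright with the maximum score.
  if (HONEST_BOT_PATTERNS.some((pattern) => ua.includes(pattern))) return 10;

  let score = 0;
  for (const pattern of SUSPICIOUS_PATTERNS) {
    if (ua.includes(pattern)) score += 4; // +4 per match
  }
  if (userAgent.length < 10) score += 3;        // short UA
  if (acceptLanguage.trim() === "") score += 2; // no Accept-Language
  if (accept.trim() === "") score += 1;         // no Accept
  return Math.min(score, 10);
}
```

With these rules, a request whose User-Agent contains HeadlessChrome and that sends no Accept-Language header scores 4 + 2 = 6, which is already above the default threshold before any Layer 1 signal is counted.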

Privacy guarantee

The plugin never reads the visitor's IP address. You can verify this yourself: grep -rn "REMOTE_ADDR" includes/ returns zero matches. The PHP scorer function signature documents that callers must pass only an allowlist of two headers, not the full $_SERVER superglobal.

Tuning the threshold

Navigate to Analytics → Data Settings → Bot Score Threshold. Lower values filter more aggressively and may discard some legitimate edge-case traffic; higher values are more permissive. The default of 3 is a good starting point for most sites.
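One way to reason about a threshold is to ask which Layer 1 signals are enough to discard a request on their own. The sketch below uses the weights from the Layer 1 table and assumes Layer 2 contributes nothing; the function name is hypothetical.

```javascript
// Layer 1 weights as listed in the signal table.
const LAYER1_WEIGHTS = {
  WEBDRIVER: 4,
  NO_HUMAN_EVENT: 3,
  ZERO_SCREEN: 3,
  CHROME_MISSING_OBJ: 3,
  NO_LANGUAGES: 2,
  INSTANT_LOAD: 2,
  NO_CANVAS: 2,
  HIDDEN_ON_ARRIVAL: 2,
  NO_PLUGINS: 1,
  NO_TOUCH_API: 1,
};

// Hypothetical helper: signals whose weight alone meets the threshold.
function decisiveSignals(threshold) {
  return Object.entries(LAYER1_WEIGHTS)
    .filter(([, weight]) => weight >= threshold)
    .map(([name]) => name);
}
```

At the default threshold of 3, four signals are decisive on their own (WEBDRIVER, NO_HUMAN_EVENT, ZERO_SCREEN, CHROME_MISSING_OBJ). At 4 only WEBDRIVER is, and at 5 no single Layer 1 signal is enough, so a discard requires at least two signals to agree.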

If you notice a specific traffic source being incorrectly filtered, open a GitHub issue with the User-Agent string and we'll review the signal weights.