
Drupal REST/JSON:API enumeration attempts — how to detect them


1. Problem

Your Drupal site is up. The status page is green. Pageviews on the dashboard look slightly elevated but nothing alarming. Then someone notices the database is hot, the PHP-FPM pool is saturated at off-peak hours, and the CDN is reporting a steady stream of cache misses against /jsonapi/node/article and /jsonapi/user/user.

You open the access logs and the pattern is unmistakable: one IP, or a small rotating pool of IPs, walking the same endpoint with ?page[offset]=0, ?page[offset]=50, ?page[offset]=100, all the way out to offset 50,000. Every response is an HTTP 200. Every response is fully serialized JSON:API output with relationships, included resources, and field data. The bot is not breaking anything. It's reading everything.

This is a Drupal JSON:API enumeration attack, and it's the failure mode most Drupal operators don't realize is happening until the data is already gone. Searches for "drupal rest api scraping detection" surface this exact problem because it doesn't trigger 500s, doesn't trip the WAF, and leaves no trace in the Drupal admin UI. The only place it shows up is in the access log and in the api.access signal stream.

2. Impact

REST and JSON:API enumeration is not a denial-of-service attack. It is a data exfiltration attack disguised as legitimate API consumption. The attacker walks every collection endpoint your site exposes — /jsonapi/node/{bundle}, /jsonapi/user/user, /jsonapi/taxonomy_term/{vocab}, /rest/views/{view_id} — paginating through the entire dataset and saving the JSON locally.

What they walk away with depends on your field-level permissions, but in a default Drupal install with JSON:API enabled it usually includes:

  • Every published node's body, summary, author, taxonomy, and metadata
  • User account listings with usernames, profile pictures, last login timestamps, and any custom fields not explicitly hidden
  • File entity URIs that point to publicly readable but un-indexed assets (/sites/default/files/private-but-not-really.pdf)
  • Taxonomy structure that reveals internal categorization, draft topic plans, and editorial vocabulary

Beyond the data theft, a sustained enumeration bot inflates infrastructure cost — JSON:API requests are uncached by default, hit the database, hydrate full entity objects, and serialize relationships. A bot pulling 50 records/second for an hour is the equivalent of a flash sale's worth of database load with zero revenue. Worse, that traffic pushes legitimate traffic out of the OPcache and entity cache, degrading performance for real users while the bot is the only "customer" being served.

And once the data is scraped, it shows up on aggregator sites, in LLM training sets, and in spear-phishing campaigns targeting your users by name and role.

3. Why It’s Hard to Spot

Drupal does not consider enumeration a failure. Every request is, technically, allowed. The permission check passes. The route resolves. The response is well-formed JSON. From Drupal's perspective, the bot is a well-behaved API client.

This is why standard monitoring misses it:

  1. No 5xx errors. Uptime checks, APM tools, and hosting dashboards only flag failures. Enumeration produces 200s.
  2. No slow queries. JSON:API queries on indexed entity tables are fast individually. It's the volume that hurts, not any single query.
  3. CDN bypass. JSON:API responses are typically marked Cache-Control: no-cache, must-revalidate because they include personalized fields, so the request goes straight through Cloudflare or CloudFront to origin.
  4. No WAF rules. Default WAF rulesets (OWASP CRS, Cloudflare Managed) flag injection patterns and known exploit signatures. They do not flag "this IP requested 1,000 sequential pages of legitimate JSON."
  5. Drupal admin UI shows nothing. The Reports page (/admin/reports/dblog) shows watchdog entries but doesn't aggregate by IP or endpoint. Even if it did, most JSON:API requests don't write to watchdog at all.

The result is an attack that completes silently, often over a weekend, and leaves no evidence in any place a Drupal admin habitually looks. By Monday the data is on a forum.

4. Cause

Drupal's REST and JSON:API modules expose entity data through a REST-conformant URL scheme. JSON:API in particular is opinionated and predictable: every content entity bundle gets a collection endpoint at /jsonapi/{entity_type}/{bundle}, every individual resource is at /jsonapi/{entity_type}/{bundle}/{uuid}, and pagination is standardized via page[offset] and page[limit] query parameters.

That predictability is the feature being abused. Once an attacker discovers JSON:API is enabled (a single GET to /jsonapi returns a self-describing index of every available resource type), they can script a walk of the entire site in under 50 lines of Python. There is no login, no token, no rate limit by default, and the response is machine-readable.
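The shape of such a walk can be shown without touching a real site. This is a hypothetical dry run that only prints the request sequence a scraper would issue (the target URL is a placeholder; a real scraper pipes each URL to curl and walks far past offset 200):

```shell
# Dry run: print the paginated request sequence, nothing is fetched.
SITE="https://example.com"          # hypothetical target
for offset in $(seq 0 50 200); do   # real walks continue to offset 50,000+
  echo "GET ${SITE}/jsonapi/node/article?page[offset]=${offset}&page[limit]=50"
done
```

Five lines of shell already reproduce the access-log fingerprint described above: one path, one parameter, monotonically increasing.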

The Logystera Drupal module emits an api.access signal for every request that hits a REST or JSON:API route. The payload includes:

  • route_name — e.g. jsonapi.node--article.collection, rest.entity.user.GET
  • endpoint — the resolved path with route parameters
  • method — GET / POST / PATCH / DELETE
  • client_ip — the originating IP after reverse-proxy normalization
  • query_params — including page[offset], page[limit], filter[], include, fields[]
  • response_code — 200, 403, 404
  • bytes_sent — useful for spotting wide-include scrapers
  • authenticated — whether the request carried a valid session or token

Enumeration shows up as a tight cluster of api.access events from one IP (or a small ASN range), all on the same route_name, with page[offset] walking monotonically upward, and response_code consistently 200. That signature is unique. Legitimate frontends paginate the first two or three pages; bots paginate everything.

The supporting signals matter too. Before settling into the steady walk, attackers usually probe with http.request events showing 4xx responses on /jsonapi/node/INVALID, /jsonapi/admin/*, and other guesses. Once they find a 200, they switch into enumeration mode. Failed login attempts (auth.login_failed) on /user/login or /user/login?_format=json from the same IP are a strong correlated signal — attackers often try authenticated enumeration first because it returns more fields. And Drupal's own watchdog log will show "access denied" entries when the bot probes restricted resources, which surface as watchdog.access_denied events.
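Correlating those streams by IP is mechanical. A toy sketch on synthetic data (in practice the two input lists would be IPs exported from the auth.login_failed and api.access streams; the file paths here are illustrative):

```shell
# IPs seen failing logins
cat <<'EOF' > /tmp/login_failed_ips.txt
203.0.113.42
198.51.100.7
EOF
# IPs seen hammering JSON:API collections
cat <<'EOF' > /tmp/jsonapi_heavy_ips.txt
192.0.2.10
203.0.113.42
EOF
# comm -12 prints only lines present in both sorted lists:
# an IP in both streams is a strong attacker candidate.
comm -12 <(sort /tmp/login_failed_ips.txt) <(sort /tmp/jsonapi_heavy_ips.txt)
```

Any IP that appears in both lists has moved from credential probing to enumeration and deserves an immediate block.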

5. Solution

5.1 Diagnose (logs first)

If you suspect enumeration is happening — or want to confirm a theoretical worry — the diagnosis path is log-driven. There are three sources to look at, in order.

1. Web server access log. This is the ground truth. On a typical Drupal stack the path is /var/log/nginx/access.log or /var/log/apache2/access.log.

To find IPs walking JSON:API endpoints:

grep '/jsonapi/' /var/log/nginx/access.log \
  | awk '{print $1}' | sort | uniq -c | sort -rn | head -20

Any IP with more than a few hundred JSON:API requests in a single log file is suspicious. Legitimate frontend usage rarely exceeds 50 requests per session. Each match here corresponds to an api.access signal in Logystera.

To see which endpoints a suspect IP hit:

grep '203.0.113.42' /var/log/nginx/access.log \
  | grep '/jsonapi/' | awk '{print $7}' | sort | uniq -c | sort -rn

If you see /jsonapi/node/article?page[offset]=0, =50, =100, =150 in monotonic order, that's enumeration. That pattern produces a sequence of api.access signals with the same route_name and a walking page[offset] query parameter — the diagnostic fingerprint.
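To pull that fingerprint out of raw log lines, extract just the offset values in request order. A demo on three synthetic lines (the real input would be the access log pre-filtered to the suspect IP; note some clients URL-encode the brackets, so match both forms):

```shell
# Synthetic access-log excerpt for the suspect IP.
cat <<'EOF' > /tmp/suspect.log
203.0.113.42 - - [10/May/2026:03:12:01 +0000] "GET /jsonapi/node/article?page[offset]=0 HTTP/1.1" 200 48213
203.0.113.42 - - [10/May/2026:03:12:03 +0000] "GET /jsonapi/node/article?page[offset]=50 HTTP/1.1" 200 47990
203.0.113.42 - - [10/May/2026:03:12:05 +0000] "GET /jsonapi/node/article?page%5Boffset%5D=100 HTTP/1.1" 200 48102
EOF
# Match literal and percent-encoded brackets, then keep only the numbers.
grep -oE 'page(\[|%5B)offset(\]|%5D)=[0-9]+' /tmp/suspect.log | grep -oE '[0-9]+$'
```

A strictly increasing column of offsets is the enumeration fingerprint; a human paginating a frontend produces a short, jumpy sequence instead.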

2. Drupal watchdog (dblog). Probing usually leaves access-denied entries. Query the database directly:

drush sql:query "SELECT hostname, COUNT(*) FROM watchdog \
  WHERE type='access denied' AND timestamp > UNIX_TIMESTAMP() - 86400 \
  GROUP BY hostname ORDER BY COUNT(*) DESC LIMIT 20;"

Each row here corresponds to a watchdog.access_denied signal. A single IP producing dozens of access-denied entries in 24 hours is reconnaissance, not user error.

3. PHP-FPM slow log and access log. Useful for confirming the bot is doing real work, not getting served from cache:

grep -E 'POST|GET' /var/log/php-fpm/access.log \
  | grep '/jsonapi/' | awk '{print $1, $9}' | sort | uniq -c | head

If JSON:API requests show in PHP-FPM logs with high frequency from one IP, the CDN is being bypassed and origin is doing the work. That's the cost side of the attack.

The Logystera-specific shortcut: filter api.access signals by client_ip, group by route_name, and look for any IP with more than ~500 events on a single collection route in an hour. That single query collapses the three log sources above into one view.
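The same group-by can be approximated directly on the access log. A toy version on four synthetic lines (format simplified to IP METHOD PATH STATUS; in production the input is the full log and the threshold is something like 500 events per route per hour):

```shell
THRESHOLD=2   # demo value; use ~500/hour against real logs
cat <<'EOF' > /tmp/demo.log
203.0.113.42 GET /jsonapi/node/article?page[offset]=0 200
203.0.113.42 GET /jsonapi/node/article?page[offset]=50 200
203.0.113.42 GET /jsonapi/node/article?page[offset]=100 200
198.51.100.7 GET /jsonapi/node/article?page[offset]=0 200
EOF
# Group by IP + path (query string stripped), flag heavy hitters.
awk -v t="$THRESHOLD" '
  { split($3, p, "?"); n[$1 " " p[1]]++ }
  END { for (k in n) if (n[k] > t) print k, n[k] }
' /tmp/demo.log
```

This prints only the IP/route pairs over threshold, which is exactly the aggregate the api.access query produces.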

5.2 Root Causes

(Root causes and fixes are intertwined for this failure mode; they are addressed together, fix by fix, in 5.3 below.)

5.3 Fix

Enumeration has no single root cause — it's a combination of API exposure, missing rate limits, and overly permissive field access. Address each layer.

1. Disable JSON:API endpoints you don't actually use. This is the highest-leverage fix. If your frontend doesn't consume JSON:API, turn the module off entirely (drush pm:uninstall jsonapi). If you only use it for a specific bundle, install the contributed jsonapi_resources module and explicitly allowlist routes. Reduced surface area means the bot gets 404s instead of 200s — those will surface as http.request 4xx events, not api.access 200s.

2. Require authentication for JSON:API. Enable the jsonapi_extras module and set "Disable anonymous access" or use the Restrict access setting per resource. Anonymous enumeration becomes impossible. Attackers must now produce valid credentials, which surfaces as auth.login_failed clusters from the same IP — a much louder, easier-to-alert signal than steady 200s.

3. Rate-limit collection endpoints at the edge. At Cloudflare, nginx, or via the advban and flood_control modules, cap requests to /jsonapi/* at, e.g., 30 requests per minute per IP. Legitimate frontend usage stays under this; bots blow through it instantly and start producing 429 responses. In Logystera that surfaces as http.request 429 spikes from a single IP — a clear signature.

4. Restrict field-level access. Use JSON:API Extras to disable fields that shouldn't be exposed: mail, pass, init, custom PII fields. Even if enumeration succeeds, the payload value drops sharply. Audit with drush config:get jsonapi_extras.jsonapi_resource_config.user--user to see what's exposed.

5. Block known scraper IPs / ASNs. Cloudflare's "Bot Fight Mode" and ASN-level blocks are coarse but effective for cloud-hosted scrapers (DigitalOcean, Hetzner, OVH ranges are common). This won't stop residential-proxy attackers, but it filters 80% of low-effort enumeration.

6. Disable the /jsonapi index endpoint. The self-describing index gives attackers a free map of your API. Block it explicitly: location = /jsonapi { return 404; } in nginx. Now the attacker has to guess every bundle name. Guesses produce http.request 404s, which are easy to alert on at volume.
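Items 3 and 6 can be sketched as a single nginx fragment. Treat this as illustrative rather than drop-in: the zone name, burst value, and the try_files line are assumptions to adapt to your vhost, and the limit_req_zone directive must live in the http {} context:

```nginx
# http {} context: 30 requests/minute per client IP, 10 MB state zone.
limit_req_zone $binary_remote_addr zone=jsonapi:10m rate=30r/m;

server {
    # ... existing Drupal server config ...

    # Item 6: kill the self-describing index outright.
    location = /jsonapi {
        return 404;
    }

    # Item 3: rate-limit everything under /jsonapi/; excess gets 429.
    location /jsonapi/ {
        limit_req zone=jsonapi burst=10 nodelay;
        limit_req_status 429;
        try_files $uri /index.php?$query_string;
    }
}
```

After a reload, over-limit scrapers surface as http.request 429 events instead of api.access 200s, which is the louder signal you want.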

The most likely root cause in any given incident is #1 plus #2: JSON:API is on by default, anonymous, and nobody disabled it. Fix those and the rest is hardening.

5.4 Verify

Enumeration is verified-resolved only by signal absence. Watch the api.access stream for the suspect IP (or pattern):

  • The cluster of api.access events with monotonically increasing page[offset] on /jsonapi/{entity_type}/{bundle}.collection routes should stop within minutes of the fix being deployed.
  • If you added rate limiting, expect a brief burst of http.request 429 events as in-flight scrapers hit the new limit, then silence.
  • If you required authentication, expect a brief spike in auth.login_failed from the same IPs as bots try cached credentials, then silence.
  • The watchdog.access_denied signal should drop back to baseline (a handful per day from misconfigured user clients, not hundreds per hour).

A concrete check after 30 minutes:

grep '/jsonapi/' /var/log/nginx/access.log \
  | grep "$(date +%d/%b/%Y:%H)" \
  | awk '{print $1}' | sort | uniq -c | sort -rn | head -5

If no IP has more than ~20 requests in the last hour, you're back to baseline. The healthy state is: no IP appears more than a handful of times in api.access against a single collection route, no walking page[offset] sequences, and bytes_sent per IP stays in the kilobytes, not megabytes. Give it 24 hours of clean logs before declaring closed — sophisticated attackers retry from new IPs after a delay.
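That baseline test can be scripted so it returns a pass/fail instead of a list to eyeball. A demo on synthetic data (the real input is the last hour of the access log, filtered as in the command above):

```shell
# Synthetic "last hour" excerpt; real input comes from the grep pipeline above.
cat <<'EOF' > /tmp/lasthour.log
203.0.113.42 GET /jsonapi/node/article 200
192.0.2.10 GET /jsonapi/node/page 200
192.0.2.10 GET /jsonapi/node/page 200
EOF
# Highest per-IP request count in the window.
max=$(awk '{n[$1]++} END {m=0; for (i in n) if (n[i]>m) m=n[i]; print m}' /tmp/lasthour.log)
if [ "$max" -le 20 ]; then
  echo "OK: back to baseline (max $max req/IP)"
else
  echo "ALERT: $max req/IP still walking JSON:API"
fi
```

Run it from cron against the real log and you have a crude standing check until proper signal-based alerting is in place.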

6. How to Catch This Early

Fixing REST/JSON:API enumeration is straightforward once you know it's happening. The hard part is knowing at all: the attack surfaces only in the api.access signal stream, never as an error.

The attack is invisible to uptime monitors, invisible to APM, invisible to the WAF, and invisible to the Drupal admin UI. By the time someone notices via a billing alert or a leaked-data report, the enumeration has been complete for weeks. Fixing JSON:API permissions after the fact does nothing — the data is gone.

The only reliable detection surface is the access log, and access logs are not something humans read. They're rotated, compressed, and forgotten. What you need is a stream that watches every API request, groups by IP and route, and fires when the shape of traffic changes from "user" to "scraper."

That's exactly what the api.access signal is for. Logystera ingests every Drupal REST/JSON:API request, derives per-IP per-endpoint volume metrics, and alerts when an IP walks a collection endpoint at machine speed — typically within a few minutes of the enumeration starting, long before the dataset is fully exfiltrated. The supporting signals (http.request 4xx probes, auth.login_failed clusters, watchdog.access_denied spikes) provide the corroborating context that distinguishes a misconfigured client from a real attacker.

This kind of failure surfaces as api.access patterns in which no individual request looks suspicious; only the aggregate does. That aggregate is what Logystera computes by default.

7. Related Silent Failures

If api.access enumeration is on your radar, these neighbors usually are too:

  • Drupal user enumeration via /user/login: auth.login_failed clusters with rotating usernames from a single IP. Typically a precursor to credential-stuffing.
  • Search API endpoint scraping: api.access spikes against /search/node or custom Search API endpoints, often used to bypass JSON:API rate limits.
  • /sites/default/files direct asset enumeration: http.request 200s walking predictable file paths, exposing private-by-obscurity uploads.
  • Views REST export abuse: api.access against /rest/views/{view_id} returning thousands of rows in one request. Different shape, same data-exfil intent.
  • Drupal core update probing: http.request 200s on /CHANGELOG.txt and /core/CHANGELOG.txt from scanners fingerprinting your version before exploitation.

Each of these is a quiet failure. None of them throw a 500. All of them are visible in api.access and http.request if you're watching.

See what's actually happening in your Drupal system

Connect your site. Logystera starts monitoring within minutes.

Monitoring for WordPress and Drupal sites. Install a plugin or module to catch silent failures — cron stalls, failed emails, login attacks, PHP errors — before users report them.
Copyright © 2026 Logystera. All rights reserved.