Guide
Drupal slow queries — drilling from total count to the calling module
1. Problem
You already know your Drupal site has slow queries. Pages take 4–8 seconds, /admin/content hangs, cron runs leak past 20 minutes, and SHOW PROCESSLIST shows the same SELECT ... FROM node__field_* statements stacking up. New Relic, Blackfire, or your own slow-query log all confirm it: the database is the bottleneck. This surfaces as a drupal_slow_query_total rate climbing past your baseline.
What none of them tell you is which contrib module is issuing the query. The slow-query log shows the SQL. SHOW PROCESSLIST shows the SQL and the user. APM shows the request URI. None tell you the statement came from webform_submission_storage::loadByProperties() inside a hook_cron, from search_api_index rebuilding a corpus, or from commerce_order_storage doing a per-cart entity load. Drupal's database abstraction obscures the caller by design — every query goes through \Drupal\Core\Database\Connection::query(), so by the time it hits MySQL, the module identity is gone.
This is the "drupal database slow which module causing it" problem. The volume is visible, the duration is visible — the caller is the one piece your tooling won't surface, and it's the only piece that turns "slow database" into a one-line patch in composer.json.
The Logystera Drupal agent captures the calling class on every slow statement and emits three correlated signals: drupal_slow_query_total (count), drupal_slow_queries_by_caller (broken down by class), and drupal_slow_query_duration_ms (histogram per caller). Drilling those three takes you from "queries are slow" to "uninstall this module" in under five minutes.
2. Impact
A Drupal site running with un-attributed slow queries pays in three places at once.
User-visible latency. A typical Drupal page fires 30–80 queries per anonymous request, 200+ for an authenticated /admin page. A single 800 ms query buried in an entity load — fired by a contrib module the site builder forgot was installed — pushes TTFB over 3 seconds on every page hitting that code path. For a Commerce checkout, that's a multi-second penalty per click in the conversion funnel. For a paywalled D10 publisher, it's the difference between a 1.5 s and a 5 s above-the-fold render.
Cron leakage. hook_cron runs are unbounded by default. If webform_submission_storage::loadByProperties() takes 3 seconds per call and webform cron iterates 5,000 stale submissions, you've burned 4 hours — and cron.php runs sequentially, so search_api_index, update_manager, and your custom cron hooks are blocked behind it. The visible failure is drupal_cron_duration_ms exceeding 1,200,000 (20 min lock timeout) and Drupal force-aborting. The root cause: one slow query, attributed to one module, silently degrading for weeks.
Connection pool starvation. Slow queries hold their MySQL slot until they finish. A caller firing 50 slow statements per page at peak pushes Threads_connected toward max_connections; the next request gets Too many connections — a db.connection_failed cascade triggered not by infrastructure failure but by one module monopolizing the pool.
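Whether the pool is actually near the ceiling is checkable with standard MySQL status counters:

SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL VARIABLES LIKE 'max_connections';
-- High-water mark since the server last started:
SHOW GLOBAL STATUS LIKE 'Max_used_connections';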
The cost of not knowing the caller: site builders disable modules at random until the site speeds up. A multi-hour session that should have been a five-minute drilldown.
3. Why It’s Hard to Spot
Drupal's database layer abstracts away the caller. Every entity query, every Views render, every \Drupal::database()->select() ends up inside the same Connection::query() method. From MySQL's perspective, every query is issued by the same drupal@localhost user — the module identity is lost two stack frames into the abstraction.
Standard tooling fails at attribution:
- MySQL slow-query log records SQL, duration, user — but not which PHP code produced the SQL. Identical SELECT clauses from webform, commerce, or a custom module are indistinguishable.
- APM tools (New Relic, Blackfire, Tideways) correlate slow SQL with a request URI; cron and Drush have no URI, and /node/123 doesn't tell you which of 40 modules on that page issued the slow statement.
- SHOW PROCESSLIST is poll-based: a 200 ms query running 100 times/second is invisible but ruinous.
- dblog logs errors, not slow successes, and never records query origins.
The result is a silent failure: the log is full of evidence, and no standard tooling tells you which module owns the query. Site builders disable modules one at a time waiting for the noise to drop — a multi-hour ritual for a five-minute fact.
4. Cause
When a Drupal query exceeds the threshold (Logystera's agent defaults to 500 ms, configurable), the agent's SlowQueryListener — registered against the database event dispatcher — captures the duration, the query text (truncated and parameter-stripped to avoid PII leaks), and the calling class. The caller is resolved by walking the PHP backtrace: skip every frame inside \Drupal\Core\Database\*, \Drupal\Core\Entity\Sql\SqlContentEntityStorage, and the Symfony query stack until the walk lands on the first userland frame. That frame's class is the caller.
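A minimal sketch of that frame walk in plain PHP 8 — the helper name and skip list are illustrative, not Logystera's actual source:

// Illustrative only: resolve the first userland class from a backtrace.
// The prefix list mirrors the frames described above; it is an assumption,
// not Logystera's real skip table.
function resolve_caller(array $backtrace): string {
  $skip_prefixes = [
    'Drupal\Core\Database\\',
    'Drupal\Core\Entity\Sql\SqlContentEntityStorage',
    'Symfony\\',
  ];
  foreach ($backtrace as $frame) {
    $class = $frame['class'] ?? '';
    if ($class === '') {
      continue; // plain function frame inside the driver; keep walking
    }
    foreach ($skip_prefixes as $prefix) {
      if (str_starts_with($class, $prefix)) {
        continue 2; // still inside the abstraction; keep walking
      }
    }
    return $class; // first userland frame: this becomes the caller label
  }
  return 'unknown';
}

// Capture without arguments to keep the trace cheap, then resolve.
$caller = resolve_caller(debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS));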
The agent emits three signals from this single capture:
- drupal_slow_query_total — counter, incremented on every query above threshold. Labels: entity, environment, route_group.
- drupal_slow_queries_by_caller — counter with a caller label populated from the backtrace (e.g. webform_submission_storage, search_api_index, commerce_order_storage, views_plugin_query_sql). Labels: entity, caller, environment.
- drupal_slow_query_duration_ms — histogram of per-call duration, also labeled by caller. Separates "100 quick-ish slow queries" from "5 catastrophic ones" — same _total count, different fixes.
The drill is a three-step funnel: drupal_slow_query_total confirms the problem and trend, drupal_slow_queries_by_caller attributes the volume to a specific class, and drupal_slow_query_duration_ms (filtered to that caller) tells you whether it's many medium-slow queries (caching opportunity) or a few catastrophic ones (algorithmic bug). The caller label is load-bearing — it turns drupal_slow_query_total from a vanity metric into an attribution chain.
5. Solution
5.1 Diagnose (logs first)
A deliberate three-step drill: confirm volume, attribute to a caller, characterize per-call cost. Skip a step and you'll fix the wrong module.
1. Confirm the volume — drupal_slow_query_total.
# Logystera agent writes structured slow-query records to a dedicated channel.
tail -n 1000 /var/log/php-fpm/drupal.log | grep '"event":"slow_query"' | wc -l
# MySQL's slow-query log catches the same events (without the caller):
grep -c "Query_time" /var/log/mysql/mysql-slow.log
Baseline: 0–20 slow queries/hour. >100/hour starts the funnel. This is drupal_slow_query_total in the dashboard.
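Each slow_query record is a single JSON line. The event, caller, duration_ms, and timestamp fields are the ones the jq filters in the next two steps select on; the query field and all values are illustrative:

{"event":"slow_query","timestamp":"2025-03-14T14:03:41Z","caller":"webform_submission_storage","duration_ms":2740,"query":"SELECT ... FROM webform_submission WHERE ..."}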
2. Attribute the volume — drupal_slow_queries_by_caller.
The load-bearing step. Aggregate by caller:
tail -n 5000 /var/log/php-fpm/drupal.log \
| grep '"event":"slow_query"' \
| jq -r '.caller' \
| sort | uniq -c | sort -rn | head -10
Typical incident output:
847 webform_submission_storage
612 search_api_index
311 commerce_order_storage
89 views_plugin_query_sql
42 path_alias_repository
webform_submission_storage owning 847 of the last 5,000 is your answer. This is drupal_slow_queries_by_caller in Logystera, broken down by caller.
3. Characterize per-call cost — drupal_slow_query_duration_ms.
Volume alone is misleading: 800 calls at 600 ms is a different fix than 80 calls at 8000 ms. Pull the duration histogram for the dominant caller:
tail -n 5000 /var/log/php-fpm/drupal.log \
| grep '"event":"slow_query"' \
| jq -r 'select(.caller=="webform_submission_storage") | .duration_ms' \
| awk '{ if ($1<1000) b1++; else if ($1<5000) b2++; else if ($1<10000) b3++; else b4++ }
END { print "<1s:",b1," 1-5s:",b2," 5-10s:",b3," 10s+:",b4 }'
Fat 1–5s bucket → caching/indexing fix. Fat 10s+ bucket → algorithmic fix (the query itself is wrong).
4. Time-correlate with the most recent change window.
Slow-query attribution clusters around real-world events: a contrib update that changed an entity query, a content migration that bloated a table, a cron schedule that began running a previously-disabled hook. Find the change.
# What changed in /modules/contrib in the last 7 days?
find /var/www/drupal/web/modules/contrib -name "*.info.yml" -mtime -7 -print
# When did slow-query volume cross threshold?
grep '"event":"slow_query"' /var/log/php-fpm/drupal.log | head -n 1 | jq -r '.timestamp'
# Cross-reference with composer.lock mtime and recent deploy:
stat -c '%y %n' /var/www/drupal/composer.lock
journalctl --since "24 hours ago" -u php8.3-fpm | grep -iE "slow|deploy|cache rebuild"
If the first webform_submission_storage event lines up with the mtime on composer.lock, the upgrade introduced the regression. That correlation — "drupal_slow_query_total spiked at 14:03 UTC, attributed via drupal_slow_queries_by_caller to webform_submission_storage, immediately after the webform 6.2.4 → 6.2.5 update" — is the story the three signals tell.
5.2 Root Causes
Each cause maps to a specific caller pattern and signal. Prioritized by frequency.
- Contrib module shipped a regressed entity query — most common. A version bump on webform, search_api, or commerce introduces a query missing an index hint or drops the range() from an entity query. Surfaces as a spike in drupal_slow_query_total, attributed in drupal_slow_queries_by_caller to a single class, with drupal_slow_query_duration_ms concentrated at 1–5 s.
- Cron iteration over a grown table — fine on 1,000 rows, catastrophic on 500,000. webform_cron and search_api_index are repeat offenders. Surfaces as drupal_slow_queries_by_caller{caller="..."} rising with table size, paired with drupal_cron_duration_ms past 600,000 ms (10 min).
- Misbehaving Views block on a high-traffic page — a Views query with an unindexed node__field_* join, fired on every block render. Surfaces as drupal_slow_queries_by_caller{caller="views_plugin_query_sql"} correlated with drupal_http_requests_total — when traffic doubles, slow-query rate doubles.
- Custom module bypassing entity cache — a _my_module_get_nodes() doing \Drupal::database()->select('node_field_data') directly (sketched after this list). Surfaces as a caller label with the custom class, duration histogram fat in the 500 ms–1 s bucket.
- path_alias resolution at scale — PathAliasRepository::lookupByAlias() becomes pathological once path_alias exceeds ~200k rows without the right indexes. Surfaces as drupal_slow_queries_by_caller{caller="path_alias_repository"} on every request.
- Search API indexing during traffic peaks — search_api_index running its tracker queries during business hours, contending with live traffic. Surfaces as drupal_slow_queries_by_caller{caller="search_api_index"} and drupal_cron_duration_ms exceeding the run window.
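The custom-module anti-pattern from that list, sketched with a hypothetical helper name: a direct table query that bypasses cache_entity, so every call is a fresh round-trip to MySQL attributed to the custom class.

// Hypothetical offender: bypasses entity storage and its cache entirely.
function _my_module_get_nodes(): array {
  return \Drupal::database()->select('node_field_data', 'n')
    ->fields('n', ['nid', 'title'])
    ->condition('n.status', 1)
    ->execute()
    ->fetchAllAssoc('nid');
}

The corrected entity-API version appears under Caller D in §5.3.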
5.3 Fix
Match the fix to the caller the drilldown identified, not to the SQL the slow-query log shows.
Caller A — contrib regression (e.g. webform_submission_storage): check the issue queue for the version delta, pin to the previous release while the upstream patch lands.
composer require 'drupal/webform:6.2.4' --update-with-dependencies
drush updatedb -y && drush cr
Caller B — cron iteration on a grown table: add a per-run cap and explicit ordering, or move the work to a queue worker that respects cron's lock.
$sids = \Drupal::entityQuery('webform_submission')
  ->accessCheck(FALSE) // required in D10; cron runs without a user context
  ->condition('completed', strtotime('-30 days'), '<')
  ->sort('completed', 'ASC')
  ->range(0, 500) // hard cap per cron tick
  ->execute();
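The queue-worker alternative mentioned for this caller, as a minimal sketch; the module name, plugin ID, and per-item delete are illustrative, not a drop-in:

namespace Drupal\my_module\Plugin\QueueWorker;

use Drupal\Core\Queue\QueueWorkerBase;

/**
 * Purges one stale webform submission per queue item.
 *
 * @QueueWorker(
 *   id = "my_module_submission_purge",
 *   title = @Translation("Stale submission purge"),
 *   cron = {"time" = 30}
 * )
 */
class SubmissionPurgeWorker extends QueueWorkerBase {

  public function processItem($data) {
    // $data is one submission ID queued by hook_cron. The 30-second cron
    // budget in the annotation bounds each run, replacing one unbounded loop.
    $storage = \Drupal::entityTypeManager()->getStorage('webform_submission');
    if ($submission = $storage->load($data)) {
      $submission->delete();
    }
  }

}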
Caller C — misbehaving Views: find the view, then add the missing index or enable caching on the view's display so the block stops re-querying on every render. EXPLAIN the SQL directly.
EXPLAIN SELECT n.nid FROM node_field_data n
LEFT JOIN node__field_category c ON n.nid = c.entity_id
WHERE c.field_category_target_id = 42 ORDER BY n.created DESC LIMIT 10;
-- "Using filesort; Using temporary" → add the missing index:
ALTER TABLE node__field_category ADD INDEX idx_target_entity (field_category_target_id, entity_id);
Caller D — custom module bypassing entity cache: rewrite to use the entity API so the call is attributed to EntityTypeManager and hits cache_entity.
// Scope the query: an unbounded load would recreate the original problem.
$nids = \Drupal::entityQuery('node')->accessCheck(FALSE)->range(0, 50)->execute();
$nodes = \Drupal::entityTypeManager()->getStorage('node')->loadMultiple($nids);
Caller E — path_alias_repository: confirm core's expected indexes (idx_path_alias_alias, idx_path_alias_path) exist, then prune disabled aliases (DELETE FROM path_alias WHERE status = 0;).
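Before pruning, a quick check that the indexes exist and how much dead weight the DELETE would remove (standard MySQL):

SHOW INDEX FROM path_alias;
-- How many disabled aliases the prune will delete:
SELECT COUNT(*) FROM path_alias WHERE status = 0;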
Caller F — search_api_index overlapping traffic: move indexing off the cron path. Use drush search-api:index from a dedicated worker, or set cron_limit: 0 and run via systemd timer at 03:00 UTC.
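A sketch of the systemd timer variant; the unit names, paths, service user, and drush binary location are assumptions about your host layout:

# /etc/systemd/system/drupal-search-index.service
[Unit]
Description=Search API indexing off the Drupal cron path

[Service]
Type=oneshot
User=www-data
ExecStart=/usr/local/bin/drush --root=/var/www/drupal/web search-api:index

# /etc/systemd/system/drupal-search-index.timer
[Unit]
Description=Nightly Search API indexing

[Timer]
OnCalendar=*-*-* 03:00:00 UTC
Persistent=true

[Install]
WantedBy=timers.target

Enable with systemctl enable --now drupal-search-index.timer.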
5.4 Verify
You're verifying that three correlated signals return to baseline simultaneously: drupal_slow_query_total volume drops, the patched caller (here, drupal_slow_queries_by_caller{caller="webform_submission_storage"}) stops appearing, and drupal_slow_query_duration_ms returns to its tight, low-duration shape.
# Run for 15 minutes after deploy.
watch -n 30 'tail -n 5000 /var/log/php-fpm/drupal.log \
| grep "\"event\":\"slow_query\"" | wc -l'
tail -n 5000 /var/log/php-fpm/drupal.log \
| grep '"event":"slow_query"' \
| jq -r '.caller' | sort | uniq -c | sort -rn | head -5
Healthy baseline:
- drupal_slow_query_total: 0–20 events per hour under normal traffic. >100/hour for 15 minutes is anomalous.
- drupal_slow_queries_by_caller: distribution should be flat — no single caller owning more than ~30% of volume. A spike in one caller label is the failure mode.
- drupal_slow_query_duration_ms: P50 around 600 ms (just over threshold), P95 under 1500 ms. P95 above 5000 ms means a caller is firing catastrophic queries.
The baseline matters: a site with healthy drupal_slow_query_total (~10/hour) but a heavily skewed drupal_slow_queries_by_caller (one module owning 90% of those 10) is still the failure mode. If the patched caller drops to zero but drupal_slow_query_total only falls 30%, you fixed one module out of several. Re-run step 2 on the residual volume.
6. How to Catch This Early
Fixing it is straightforward once you know the cause. The hard part is knowing it happened at all.
This issue surfaces as a sustained rise in drupal_slow_query_total.
Everything you just did manually — count slow events, group by caller, characterize per-call duration, time-correlate with the contrib update — Logystera does automatically. The Drupal agent's SlowQueryListener walks the PHP backtrace on every slow statement, attaches the calling class as a caller label, and emits the three correlated signals to the gateway in real time. The dashboard is the drill made visual.
[Figure: Logystera dashboard — drupal_slow_query_total rate, last 24h. Spike at 14:03 UTC, immediately after the webform 6.2.5 contrib update; caller panel resolving to webform_submission_storage.]
Click the spike on the volume panel and the entity view drills into drupal_slow_queries_by_caller, ranking callers for the same window. Click the dominant caller and the third panel pivots to drupal_slow_query_duration_ms filtered to that caller, showing whether it's "many medium-slow" (caching fix) or "a few catastrophic" (algorithmic fix). Full drilldown in about 15 seconds.
The rule that fires is id 514 — Drupal slow query attribution, severity warning, threshold >50 events with a single caller exceeding 60% of volume in 5 minutes. The 60% concentration test is the load-bearing condition: it distinguishes "busy site with normal slow-query noise" from "one module owning the slow path." A flat distribution won't fire; a single caller dominating will.
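Logystera's rule definition language isn't shown in this guide; as a rough Prometheus-style sketch (illustrative notation only, not the actual rule DSL), the two conditions combine like this:

# Illustrative only. Fires when >50 slow queries land in 5 minutes
# AND one caller owns >60% of them.
    sum(increase(drupal_slow_query_total[5m])) > 50
and
    max(sum by (caller) (increase(drupal_slow_queries_by_caller[5m])))
      / sum(increase(drupal_slow_query_total[5m])) > 0.6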
[Figure: Logystera alert — Drupal slow query attribution. Warning fires when one caller exceeds 60% of drupal_slow_query_total volume for 5 minutes; payload includes the caller class and P95 duration from drupal_slow_query_duration_ms.]
The alert payload includes timestamp, affected entity, dominant caller label, count over the window, and P95 duration from the histogram. That's enough to decide which root cause in §5.2 applies and which fix in §5.3 to run — from the alert body alone.
The fix is simple once you know which module is responsible. The hard part is knowing which module is responsible at all. Logystera turns this from a multi-hour module-disabling ritual into a 60-second alert that names the contrib module by class.
7. Related Silent Failures
- drupal_cron_duration_ms exceeding the lock window — same caller chain, cron leakage instead of per-page latency. A slow caller extends drupal_cron_duration_ms past the 1,200,000 ms (20 min) lock and aborts subsequent cron hooks.
- drupal_db_connection_pool_exhausted — slow queries holding MySQL slots starve max_connections. Often precedes db.connection_failed during traffic spikes; the caller label identifies the module monopolizing the pool.
- drupal_http_requests_total 5xx correlated with slow-query spikes — when slow queries push request duration past PHP-FPM's request_terminate_timeout, the request is killed mid-query and surfaces as a 504. The caller drilldown ties the 5xx back to a module.
- drupal_entity_cache_miss_rate — high miss rates on cache_entity amplify slow-query volume. A caller that should hit cache but isn't shows up in drupal_slow_queries_by_caller even when the underlying SQL is individually fast.
- drupal_views_render_duration_ms — Views-specific render latency. When drupal_slow_queries_by_caller{caller="views_plugin_query_sql"} dominates, this companion signal identifies which display.
See what's actually happening in your Drupal system
Connect your site. Logystera starts monitoring within minutes.