
WordPress database timeout — when no single query is slow but the request times out


1. Problem

A WordPress request just timed out. nginx returned 504 Gateway Timeout, the user refreshed, the page eventually loaded, and you're staring at a slow-query log that contains nothing useful.

You did the obvious thing: tail -f /var/log/mysql/slow.log with long_query_time = 1. Empty. You bumped it to 0.5. Still empty. A few log_queries_not_using_indexes rows show up but they're all under 200ms. MySQL isn't slow. EXPLAIN plans look fine. SHOW PROCESSLIST during the timeout shows nothing locked, just fast queries completing in 30–80ms each.

This is the difference between a WordPress database timeout and a slow query. No single query is slow. The request fires two hundred queries, each one fast, and the cumulative DB time crosses PHP's max_execution_time or the upstream proxy's timeout. The slow-query log shows nothing because there's nothing to show. The user gets a 504. You get a ticket that says "the site is sometimes slow."

This pattern surfaces as a perf.slow_queries signal with payload.total_time_ms > 5000. The slow query log is the wrong place to look. The aggregate is.

2. Impact

A WordPress site that times out on database aggregate (not on any single query) fails in a uniquely user-hostile pattern: the first page load times out, the second succeeds (object cache warmed), the third times out again under cache miss. Users see intermittent 504s, support sees no reproducible error, and your APM shows healthy P50 latency with a fat P95 tail nobody can explain.

For a WooCommerce store this is exactly the failure that kills checkout: the cart page fires wp_get_nav_menu_items, WC()->cart->get_cart(), wc_get_payment_gateways(), plus a misbehaving plugin's init hook that calls get_option() for 60 settings keys without using the alloptions cache. Each query is 8–40ms. The total is 6 seconds. The user abandons. You lose the order and you don't see anything in the slow-query log to explain why.

For a publisher running paginated archives, REST endpoints are worse: a custom plugin that paginates by re-querying the index for each post turns one logical request into 100+ round-trips. Memberships, comment counts, ACF subfields amplify it. A single slow archive page becomes a 504 cascade during traffic spikes — and the slow-query log stays empty the entire time.

Every "wordpress request total db time over 5 seconds" failure looks identical to the user (slow page) but produces zero diagnostic evidence in the standard tooling. You can't fix what you can't see.

3. Why It’s Hard to Spot

The MySQL slow-query log is built around the wrong unit. It logs per-query duration above a threshold. Every per-query duration in this scenario is below the threshold. The log is doing exactly what it's configured to do — and tells you nothing.

Standard APM tools repeat the mistake. New Relic's "slow SQL", Datadog's database query analytics, Query Monitor's "slow queries" tab — all rank queries by individual duration. A request firing 200 fast queries shows up as 200 healthy data points and one mysterious 504. There's no panel that surfaces "this request spent 5,200ms aggregate across 217 queries." That metric isn't computed by default anywhere.

WordPress itself hides the cumulative cost. $wpdb->num_queries exists but no log line is emitted at request end. Query Monitor renders it in the admin bar — only on requests that complete; a timed-out request never renders the bar. PHP's max_execution_time eventually kills the request, but the fatal says Maximum execution time of 30 seconds exceeded — not where the time went.

Uptime monitors compound the problem. They issue GET / every minute. The cached homepage responds in 200ms. The actual /cart request firing 200 uncached queries times out. Your uptime dashboard stays green while customers churn. The signal exists — it's just nowhere a human is looking.

4. Cause

A WordPress request opens a single MySQLi connection at bootstrap and reuses it for the lifecycle: wp-load.php reads options, WP_Query builds the main loop, plugins fire init, wp_loaded, template_redirect. Every $wpdb->get_results(), $wpdb->get_var(), and update_option() is a round-trip on that connection.

The Logystera WordPress plugin instruments $wpdb via the query filter and the shutdown action. On every query it accumulates query_count++ and adds the duration to a running total. On shutdown it emits a perf.slow_queries signal with payload.total_time_ms, payload.query_count, and the top-N slowest individual queries. The wp_db_timeout rule fires when event_type = perf.slow_queries AND payload.total_time_ms > 5000 — independent of per-query duration. The separate slow_queries_find_plugin rule covers the per-query case (payload.max_query_time_ms > 1000); wp_db_timeout covers the aggregate.

A request can have zero queries above 100ms and still trigger perf.slow_queries with total_time_ms = 6800 if it fires 100 of them. That's what wp_db_timeout is built for.
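
A minimal sketch of the same accumulate-and-emit pattern as a standalone mu-plugin — hypothetical names throughout, not the agent's actual internals. The query filter fires once per statement, so it counts queries even without SAVEQUERIES; the summed durations still require SAVEQUERIES:

// mu-plugins/aggregate-sketch.php — illustration only
add_filter( 'query', function ( $sql ) {
    // Count every statement $wpdb sends over the shared connection.
    $GLOBALS['agg_query_count'] = ( $GLOBALS['agg_query_count'] ?? 0 ) + 1;
    return $sql; // pass the SQL through unmodified
} );

add_action( 'shutdown', function () {
    global $wpdb;
    // Under SAVEQUERIES, each $wpdb->queries row is [ sql, seconds, caller ].
    $total_ms = ( defined( 'SAVEQUERIES' ) && SAVEQUERIES )
        ? array_sum( array_column( $wpdb->queries, 1 ) ) * 1000
        : 0;
    // One structured line per request — the aggregate the slow log never sees.
    error_log( json_encode( [
        'event_type'    => 'perf.slow_queries',
        'total_time_ms' => round( $total_ms, 1 ),
        'query_count'   => $GLOBALS['agg_query_count'] ?? $wpdb->num_queries,
    ] ) );
}, PHP_INT_MAX );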

5. Solution

5.1 Diagnose (logs first)

Confirm the request was an aggregate-time failure (not a per-query failure), find the request that did it, identify the code path that fired the storm, and time-correlate with the deploy window.

1. PHP-FPM error log — find the timeout fatals.

tail -n 1000 /var/log/php-fpm/error.log | grep -iE "Maximum execution time|gateway timeout|Allowed memory"

A Maximum execution time of 30 seconds exceeded line, with a stack frame inside wp-includes/class-wpdb.php or wp-includes/wp-db.php, is the smoking gun. PHP died inside $wpdb->query(), which means cumulative DB time blew the budget. This is what produces perf.slow_queries with the timeout marker in the Logystera agent.

2. Slow-query log — confirm it's empty (this is the diagnostic, not a problem).

# Lower the threshold aggressively and re-check during a reproduce
mysql -e "SET GLOBAL long_query_time = 0.1; SET GLOBAL slow_query_log = 'ON';"
tail -f /var/log/mysql/slow.log

If your reproduced timeout produces zero rows in the slow log — that is the diagnostic. It rules out per-query slow queries (the slow_queries_find_plugin case) and confirms the aggregate-time pattern.

3. Enable SAVEQUERIES and dump $wpdb->queries from a reproduce.

// wp-config.php — temporarily, not in production
define( 'SAVEQUERIES', true );

// mu-plugins/aggregate-debug.php
add_action( 'shutdown', function () {
    global $wpdb;
    if ( ! defined( 'SAVEQUERIES' ) || ! SAVEQUERIES ) return;
    // Each $wpdb->queries row is [ sql, duration_seconds, caller ] under SAVEQUERIES.
    $total_ms = array_sum( array_column( $wpdb->queries, 1 ) ) * 1000;
    if ( $total_ms < 1000 ) return;
    $by_caller = [];
    foreach ( $wpdb->queries as $q ) {
        $caller = $q[2] ?? '?';
        $by_caller[ $caller ] = ($by_caller[ $caller ] ?? 0) + 1;
    }
    arsort( $by_caller );
    error_log( sprintf( '[wp-aggregate] uri=%s count=%d total_ms=%.1f top=%s',
        $_SERVER['REQUEST_URI'] ?? '-', $wpdb->num_queries, $total_ms,
        json_encode( array_slice( $by_caller, 0, 3, true ) ) ) );
}, 99999 );

One log line per slow request, with the top 3 callers — the same data the Logystera agent emits as perf.slow_queries:

grep -E "wp-aggregate.*total_ms=([5-9][0-9]{3}|[0-9]{5,})" /var/log/php-fpm/error.log
# → requests with > 5s cumulative DB time → produces perf.slow_queries

A healthy request fires 25–60 queries. A pathological one fires 200+ and one or two callers dominate. A top field like {"WP_Term_Query->get_terms, get_post_meta": 187} tells the story: a loop calling get_post_meta 187 times in a foreach ( $posts as $p ) without update_meta_cache().

4. Time-correlate with the change window.

Aggregate-time failures almost always start at a deploy boundary or a plugin update. Find when the first 5+ second request landed and what shipped in the preceding hour:

# When did total_ms first cross the threshold?
grep "wp-aggregate" /var/log/php-fpm/error.log | head -n 5

# What was deployed or updated in that window?
ls -la /var/www/wordpress/wp-content/plugins/ | sort -k 6,8
grep -iE "Plugin (activated|updated)" /var/www/wordpress/wp-content/uploads/wp-fail2ban-audit.log 2>/dev/null

That correlation is the difference between "the site is slow" and "since 14:03 UTC, immediately after the WooCommerce 8.7 → 8.8 update, every cart page fires perf.slow_queries with total_time_ms > 5000 and query_count > 180."

5.2 Root Causes

Each cause maps to a specific aggregate pattern, a signal, and where it shows up in logs.

  • N+1 query in a custom loop (most common) — a theme or plugin loops over posts/terms/users and calls get_post_meta, get_term_meta, or get_user_meta per iteration without warming the cache (see the sketch after this list). Produces perf.slow_queries with payload.query_count 5–20× baseline and total_time_ms > 5000. In $wpdb->queries the same caller appears N times.
  • Plugin reading every option separately, missing alloptions — a plugin calls get_option('foo_setting_1'), ... 50–100 times in init. If keys aren't autoloaded, each one is a separate query. Produces perf.slow_queries with high query_count clustered at bootstrap. Slow-query log: nothing (each fetch < 5ms).
  • REST API paginating one row at a time — a custom endpoint or misconfigured WP_Query with posts_per_page = 1 hit in a loop, so each iteration runs a full query cycle of its own. Produces perf.slow_queries on the outer request with query_count proportional to result count. Surfaces in wp_db_query_count as a long-tail spike.
  • alloptions blob too large — autoloaded rows accumulate (transients, _site_transient_*) until alloptions is 5–20 MB. Every request unserializes the blob and fires one expensive query. Produces perf.slow_queries with query_count near baseline but total_time_ms dominated by one query — the boundary case with the per-query rule.
  • External cache (Redis/Memcached) down or slow — every get_option, get_transient, wp_cache_get falls back to DB. A 40-query request becomes a 200-query request. Produces perf.slow_queries correlated with cache.miss_rate going from < 5% to > 60%.
  • Missing primary key or index on a custom table — a plugin's table (wp_woocommerce_log, wp_yoast_indexable) lost its primary key during a migration. Each query may still be < 500ms (slow-query log misses it) but high volume amplifies. Produces perf.slow_queries with one caller dominating wp_db_query_time_ms.
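
For concreteness, here is the cause-A anti-pattern as a hypothetical sketch, not taken from any specific plugin. Fetching IDs with a raw $wpdb query bypasses WP_Query's meta-cache priming, so every get_post_meta() in the loop is its own round-trip:

global $wpdb;
// 1 query to fetch the IDs...
$ids = $wpdb->get_col(
    "SELECT ID FROM {$wpdb->posts}
     WHERE post_type = 'product' AND post_status = 'publish'
     LIMIT 200"
);
foreach ( $ids as $id ) {
    // ...then 200 more: each get_post_meta() on a cold cache is one SELECT.
    // 200 × ~10ms ≈ 2s of aggregate DB time, zero rows in the slow-query log.
    $price = get_post_meta( $id, '_price', true );
}

The fix under cause A in 5.3 — one update_meta_cache() call before the loop — collapses those 200 SELECTs into one.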

5.3 Fix

Match the fix to the aggregate pattern, not to the symptom.

Cause A — N+1 loop: prime the meta cache before the loop:

$post_ids = wp_list_pluck( $posts, 'ID' );
update_meta_cache( 'post', $post_ids );
// get_post_meta() inside the loop is now cache-only, not a query.

For terms, use update_termmeta_cache. For WP_Query, set 'update_post_meta_cache' => true (default, but plugins disable it).
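
The term-side equivalent, assuming $terms is an array of WP_Term objects from a prior query:

$term_ids = wp_list_pluck( $terms, 'term_id' );
update_termmeta_cache( $term_ids );
// get_term_meta() inside the loop is now cache-only, matching the post case above.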

Cause B — option-storm: mark settings as autoload, then bust alloptions:

UPDATE wp_options SET autoload = 'yes' WHERE option_name LIKE 'foo_setting_%';
wp cache delete alloptions options

The 60 separate queries collapse into one cached read.
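
If you control the plugin code, fix it at write time instead of patching the database: update_option() takes autoload as its third argument (a sketch, reusing the hypothetical foo_setting_* keys from above):

// Autoload the setting into the alloptions blob on first write.
update_option( 'foo_setting_1', $value, true );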

Cause C — REST pagination: raise per_page on the consumer, or rewrite the endpoint to fetch the page in one query with IN (...). If you don't control the consumer, add a server-side rate limit.
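
A sketch of that rewrite, assuming the endpoint already holds the full ID list ($all_ids, $page, and $per_page are hypothetical): one WP_Query via post__in replaces N single-row queries:

$page_ids = array_slice( $all_ids, ( $page - 1 ) * $per_page, $per_page );
$posts = get_posts( [
    'post__in'       => $page_ids,
    'posts_per_page' => count( $page_ids ),
    'orderby'        => 'post__in', // preserve the pagination order
] );
// One SELECT for the rows (plus one meta and one term cache-priming query)
// instead of one round-trip per post.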

Cause D — alloptions bloat: audit autoloaded options:

SELECT option_name, LENGTH(option_value) AS bytes
FROM wp_options WHERE autoload = 'yes' ORDER BY bytes DESC LIMIT 20;

Anything over 1 MB that isn't a core option is suspect. Disable autoload on plugin-junk transients and stale _site_transient_* rows.
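
A hedged cleanup sketch for the stale _site_transient_* rows the audit surfaces — raw $wpdb, because wp_options autoload has no dedicated WP API, and the 1 MB cutoff is this guide's threshold, not a WordPress constant:

global $wpdb;
$wpdb->query( $wpdb->prepare(
    "UPDATE {$wpdb->options} SET autoload = 'no'
     WHERE option_name LIKE %s AND LENGTH(option_value) > %d",
    $wpdb->esc_like( '_site_transient_' ) . '%',
    1048576 // 1 MB
) );
// Bust the cached blob so the next request rebuilds it without the junk.
wp_cache_delete( 'alloptions', 'options' );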

Cause E — object cache down: restore Redis/Memcached. Until it's back, the site is in a permanent N+1 state. wp redis status and redis-cli ping first.

Cause F — missing index: run SHOW INDEX FROM wp_<table>; against the plugin's table, then ALTER TABLE ... ADD PRIMARY KEY (id), ADD INDEX idx_object_id (object_id); in a maintenance window.

5.4 Verify

Two things must hold simultaneously: perf.slow_queries with total_time_ms > 5000 stops appearing, and wp_db_query_count per request returns to normal range.

# Should be empty for at least 15 minutes under normal traffic:
grep -E "wp-aggregate.*total_ms=([5-9][0-9]{3}|[0-9]{5,})" /var/log/php-fpm/error.log | tail -n 5

# Response time should be stable across three consecutive samples
# (per-request query counts come from the wp-aggregate log above):
for i in 1 2 3; do
  curl -s -o /dev/null -w "total=%{time_total}s\n" https://example.com/cart/
  sleep 5
done

Healthy state in Logystera's entity view: zero perf.slow_queries with total_time_ms > 5000 for 30 minutes, wp_db_query_count P95 between 30 and 80 per request (a WooCommerce cart page is 50–80, a static blog post is 25–40), and wp_db_query_time_ms P95 under 800ms. If total_time_ms settles but wp_db_query_count is still spiking past 200, you fixed the time (faster DB, more cache) without fixing the cause (the loop is still firing N queries).

Baseline: a healthy WP site emits roughly 0–2 perf.slow_queries events per hour with total_time_ms > 5000 under normal traffic. A handful is acceptable (admin-ajax bulk operations, scheduled imports). Sustained 5+/hour means the underlying N+1 or option-storm is still in place. Don't mistake a 50% reduction for "fixed" — the goal is the floor.

6. How to Catch This Early

Fixing it is straightforward once you know the cause. The hard part is knowing it happened at all.

This issue surfaces as perf.slow_queries.

Everything you just did manually — drop SAVEQUERIES into wp-config.php, write a shutdown hook to sum query times, dump the top callers from $wpdb->queries, time-correlate with the deploy window — Logystera does automatically on every request, without you adding code. The WP plugin instruments $wpdb via the query filter and emits perf.slow_queries on shutdown with payload.total_time_ms, payload.query_count, and the top-N slow callers. The wp_db_timeout rule fires on payload.total_time_ms > 5000 — independent of the per-query slow-query rule — so this exact failure mode (no single query slow, request times out) is detected even when the slow-query log is empty.

[Figure: Logystera dashboard — perf.slow_queries rate, last 24h. Spike at 14:03 UTC, immediately after the WooCommerce 8.7 → 8.8 plugin update.]

The rule is id 318 — WordPress request DB time over budget, severity warning, threshold 3 events with payload.total_time_ms > 5000 in 5 minutes. The 3-in-5 smoothing avoids paging on a one-off 6-second admin-ajax bulk operation; sustained aggregate timeouts in user-facing requests still fire it.

[Figure: Logystera alert — WordPress aggregate database timeout. The warning fires within 5 minutes of a sustained aggregate-time pattern and includes query_count, total_time_ms, and the top calling code path.]

The alert payload includes the URI, query_count, total_time_ms, and the top three callers from $wpdb->queries — exactly the data the manual hook produces. From the alert body alone, you can tell whether you have an N+1 loop (one caller dominates the count), an option-storm (callers cluster at bootstrap), or a cache outage. That's the difference between "the site is sometimes slow" and "since 14:03, every /cart request fires 217 queries totalling 6.4s, dominated by WC_Cart::get_cart_item_data calling get_post_meta 180 times."

The hard part, as ever, is knowing it happened at all — especially when the slow-query log, the standard tool, is the wrong instrument by design. Logystera turns this from an unreproducible support ticket into a 5-minute notification carrying the calling code path that proves it.

7. Related Silent Failures

  • perf.slow_queries with payload.max_query_time_ms > 1000 — the per-query case. Same signal, different rule. One slow query holding a connection. Surfaces in the slow-query log.
  • cache.miss_rate spike — when Redis/Memcached stalls, every option and meta fetch falls back to DB. Produces perf.slow_queries aggregate as a secondary effect.
  • wp.cron queue lag — long-running cron jobs hold connections and amplify aggregate-time pressure on user-facing requests. Surfaces alongside perf.slow_queries during cron windows.
  • http.request 504 — the upstream symptom. Every aggregate-time timeout produces a paired 504, but the access log doesn't tell you why. perf.slow_queries does.
  • wp.alloptions_bloat — autoloaded options exceeding a size threshold. Same family of "death by a thousand cuts" failure.

See what's actually happening in your WordPress system

Connect your site. Logystera starts monitoring within minutes.
