Guide
Drupal cache.flush running every minute — detection and root cause
1. Problem
Your Drupal site feels heavy. Pages that should be served from cache in 50 ms are taking 1.5 to 4 seconds. Editors complain that the admin is sluggish. Anonymous visitors get inconsistent performance — sometimes fast, often slow. You open the Performance page at /admin/config/development/performance, click "Clear all caches" once, and the next page load is fast. Five minutes later, it is slow again.
You check the hosting dashboard. CPU graphs are spiky but not maxed out. Uptime checks are green. New Relic, if you have it, shows elevated transaction times but no errors. Nothing is "down." You search "drupal cache rebuilding constantly" and "drupal cache flush every minute" and find a wall of forum threads from 2014 about Views and Panels.
The reality is simpler and uglier: something on your site is calling Cache::invalidateAll() or drupal_flush_all_caches() on a recurring trigger. The cache is being wiped before it ever warms up. Every request is a cold request. Your server is not slow — it is being forced to rebuild Drupal's container, plugin definitions, render caches, and Twig templates on a loop.
2. Impact
A Drupal site running without effective caching costs you in four places at once.
Latency. A warm Drupal page hits dynamic page cache and renders in tens of milliseconds. A cold rebuild hits the container builder, discovery services, plugin managers, and a full Twig compile pass. TTFB jumps from ~50 ms to 1–4 seconds. Google's CrUX report flags it. Core Web Vitals degrade. Organic traffic drops within two to three weeks.
Cost. Cold rebuilds are CPU- and database-heavy. On a managed host that bills by CPU minute or that auto-scales, a flush-loop can double or triple your monthly bill. On a fixed VM, it eats the headroom you needed for traffic spikes.
Editorial workflow. Authors saving nodes wait 5–10 seconds while the container warms. CKEditor previews stutter. Bulk operations time out.
Cascading failure. A constantly rebuilt container puts pressure on the database (cache_* tables thrash) and on opcache (paths invalidate). When traffic spikes hit a site already running cold, you get 502s from PHP-FPM running out of workers — not because of a bug in the code, but because every request now costs ten times more than it should.
The worst part: this kind of failure does not produce errors. The site stays "up." It just gets slower and more expensive every day until someone notices.
3. Why It’s Hard to Spot
Drupal does not warn you when something flushes the cache. It does not log full-site invalidations to watchdog by default. The Performance page does not show a flush history. Dblog (/admin/reports/dblog) only records what modules explicitly write — a contributed module calling drupal_flush_all_caches() silently leaves no trace.
Standard monitoring misses this for a specific reason: the site is not broken. Uptime checks return 200. Synthetic monitoring sees pages that load — slowly, but they load. APM tools show elevated transaction time but no error rate increase. The host's CPU graph looks busy but not pegged. The symptom is "the site got slower over the last week," which is the hardest pattern to triage from a graph.
Search and you'll find advice that points at the wrong layers — Varnish, Cloudflare, MySQL slow query log, opcache hit ratio. Those are downstream. The actual source is a code path inside Drupal calling the flush API on a hook or cron run, and there is no out-of-the-box panel that surfaces it.
This is exactly the silent-failure shape: real impact, no exception, no alert, no obvious place to look.
4. Cause
Drupal exposes cache invalidation through Drupal\Core\Cache\CacheTagsInvalidatorInterface and the cache.bin services. A full-site flush goes through drupal_flush_all_caches() (procedural) or Cache::invalidateAll() plus a container rebuild. The Logystera Drupal agent emits a cache.flush signal every time one of these paths fires, with a payload describing scope (all, bin:render, bin:dynamic_page_cache, tag:node:42) and the originating request or CLI invocation.
Under healthy conditions, cache.flush all should be a rare event: a deployment, a config import, a manual admin action. Tag-scoped invalidations (cache.flush bin:render tags:[node_list]) happen constantly and that is normal — they target specific cache entries, not the whole site.
The failure mode is a frequency anomaly on broad-scope flushes. A cache.flush signal with scope=all or scope=bin:render (no tags) appearing more than once or twice per hour means something is calling the wipe path on a hook, on cron, on every request, or in response to an external event.
What you observe downstream:
- perf.hook_timing spikes. Cold container builds and hook_modules_installed-equivalent rebuild paths take 800–3000 ms. Each cold rebuild produces a perf.hook_timing signal with elevated duration_ms on kernel.request or boot.
- http.request latency jumps for the next 1–10 requests after each flush. The dynamic page cache is empty, the render cache is empty, the entity static cache is empty. Every block, view, and field has to be rebuilt from scratch.
- watchdog entries from the system channel like "Cron run completed", or module-specific entries, that correlate in time with each flush.
Three signals, one story: a broad invalidation is running on a tight schedule.
5. Solution
5.1 Diagnose (logs first)
Start by confirming the frequency anomaly, then trace the trigger. Every step here is tied to the cache.flush signal.
Step 1 — Confirm the loop exists
If the Logystera Drupal agent is running, the cache.flush signal is already being emitted. In Logystera, filter by event_type=cache.flush and entity=. A healthy site shows a handful of all-scope flushes per day. A looping site shows 10–60 per hour.
If you do not yet have the agent installed, you can confirm the same pattern from server logs. Drupal's core Syslog module, when enabled, forwards watchdog entries to the system log; PHP error logs also catch flush-induced warnings.
# Apache or Nginx access log: rebuild storms produce a burst of slow requests.
# Assumes the request time (in ms) is the last field of your log format;
# cut -c1-18 buckets the default combined-log timestamp per minute.
grep -E ' (200|301) [0-9]+ "[^"]*" "[^"]*" [0-9]{4,}$' /var/log/nginx/access.log \
| awk '{print $4}' | cut -c1-18 | sort | uniq -c | sort -rn | head
# PHP-FPM slow log: cold container rebuilds appear as repeated long traces
# through DrupalKernel.php and ContainerBuilder.php. Slow-log stack frames
# show "method() /path/File.php:line", so match on the file name and take
# the timestamp from each entry's "[pool ...]" header line.
awk '/\[pool /{ts=$1" "$2} /DrupalKernel\.php|ContainerBuilder\.php/{print ts}' \
  /var/log/php-fpm/slow.log | sort | uniq -c | sort -rn
Each repeated cold-boot trace within minutes of the last corresponds to a cache.flush signal. The clustering pattern (every 60 s, every 5 min, exactly on the minute) is the first hint about the trigger.
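To confirm the cadence numerically, compute the gaps between consecutive flush or rebuild timestamps; a run of identical gaps points at a scheduler rather than organic traffic. A minimal sketch — the here-doc stands in for epoch-second timestamps extracted from your own logs:

```shell
# Print the gap (in seconds) between consecutive events. A run of identical
# gaps (here: 60) means the trigger is on a fixed schedule.
awk 'NR > 1 { print $1 - prev } { prev = $1 }' <<'EOF'
1700000000
1700000060
1700000120
1700000180
EOF
```

Three lines of 60 mean a strict one-minute cadence. Replace the here-doc with timestamps pulled from your real logs (for example via GNU `date -d ... +%s`).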
Step 2 — Find what is calling flush
Use Drush to inspect the cron and queue state, then grep contrib code for the offending call.
# List enabled modules — review anything installed or updated recently
drush pm:list --status=enabled --fields=name,version,package --format=table
# Check cron history — bad cron handlers are a common cause
drush watchdog:tail --severity=Notice --type=cron
# Inspect the last cron run timestamp; flushes aligned to cron means a hook_cron is the culprit
drush state:get system.cron_last
# Search the codebase for anything that triggers full or broad invalidation
grep -rn --include='*.php' --include='*.module' \
-e 'drupal_flush_all_caches' \
-e 'Cache::invalidateAll' \
-e '\\Drupal::service(.cache_tags.invalidator.)->invalidateAll' \
-e 'CacheBackendInterface.*->deleteAll' \
/var/www/html/web/modules/contrib /var/www/html/web/modules/custom
Every match here is a potential source of cache.flush signals with scope=all. Map each match to its hook (hook_cron, hook_node_presave, hook_user_login) — that tells you why it fires on a schedule.
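A quick way to map a match to its enclosing hook is to look backward from the match for the nearest function definition. A sketch against a throwaway sample file — the module name and contents are hypothetical; point the greps at your real codebase:

```shell
# Stand-in module file containing an offending flush call (hypothetical).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
function mymodule_cron() {
  // leftover from debugging
  drupal_flush_all_caches();
}
EOF
# Show the match with context, then extract the nearest enclosing function name.
grep -n -B 5 'drupal_flush_all_caches' "$tmp" \
  | grep -oE 'function [a-z0-9_]+' | tail -1
rm -f "$tmp"
```

This prints `function mymodule_cron` — the `_cron` suffix tells you immediately why the flush fires on a schedule.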
Step 3 — Correlate with perf.hook_timing and http.request
In Logystera, line up the cache.flush timestamps against perf.hook_timing (filter on hook=kernel.request with duration_ms > 800) and http.request (filter on duration_ms > 1500). If the cold-rebuild bursts trail every flush by 0–10 seconds, you've confirmed the causal chain. If they don't, the flush is happening but something else is also slow.
# Same correlation from raw logs if the agent is not installed yet
# Drupal writes "Cron run completed" to dblog; cross-reference with PHP-FPM slow log
drush watchdog:show --type=cron --count=200 --extended | grep "completed"
Step 4 — Check external triggers
Some flushes come from outside Drupal: deploy scripts, CDN purge webhooks, a Jenkins job, a developer who put drush cr in a watch loop. Inspect cron jobs on the host:
# System cron and per-user cron
sudo crontab -l
crontab -l
ls -la /etc/cron.d/ /etc/cron.hourly/ /etc/cron.daily/
# Check for runaway drush
ps -ef | grep -E 'drush|php.*core/scripts' | grep -v grep
Any cron entry that runs drush cr, drush cache:rebuild, or drush cache-clear all on a short interval will produce one cache.flush signal per execution.
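For reference, a sane host-level cron shape looks like the first entry below; the second is the kind of entry that produces one broad flush per minute. The user and paths are assumptions — adjust to your layout:

```
# /etc/cron.d/drupal — illustrative entries only.
# Good: run Drupal cron every 15 minutes (hook_cron implementations only).
*/15 * * * * www-data /var/www/html/vendor/bin/drush --root=/var/www/html/web cron
# Bad: a full cache rebuild every minute — this alone creates the flush loop.
# * * * * * www-data /var/www/html/vendor/bin/drush --root=/var/www/html/web cr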
5.2 Root Causes
(see root causes inline in 5.3 Fix)
5.3 Fix
The fix depends on which trigger is producing the flushes. Address them in order of likelihood.
Cause A — Misbehaving hook_cron in a contrib or custom module
A module is calling drupal_flush_all_caches() or Cache::invalidateAll() from hook_cron. Every cron run (default: every 3 hours, but often hammered to every minute by external pings) fires a full flush.
Signal evidence: cache.flush scope=all aligned exactly with system.cron_last advances; watchdog channel cron "completed" entries trail each flush by 1–5 seconds.
Fix:
- Identify the module from the grep in Step 2.
- If the call is legitimate but over-broad, replace it with tag-scoped invalidation: Cache::invalidateTags(['node:42']) instead of Cache::invalidateAll().
- If the call is a leftover from debugging, remove it.
- If the module is contrib, check its issue queue — this is a common bug pattern (search for "flush all caches in hook_cron").
Cause B — External cron pinger hitting /cron/ too often
A monitoring service or external scheduler is hitting Drupal's cron URL every 60 seconds and a hook_cron implementation flushes caches. This is the same as Cause A but the trigger is external, not internal scheduling.
Signal evidence: cache.flush events at exact 60-second intervals; http.request entries for /cron/ matching the same cadence.
Fix:
- Throttle the external pinger to every 15–60 minutes (Drupal cron does not need 1-minute granularity for most sites).
- Better: disable the external pinger and use the Ultimate Cron module or a system cron entry running drush cron on the host.
Cause C — Deploy script or CI hook running drush cr mid-deploy and post-deploy
A deploy pipeline calls drush cache:rebuild more than once, or a post-deploy smoke test triggers it, or a CDN purge webhook calls back into Drupal and clears caches.
Signal evidence: cache.flush clusters at deploy times, then again 1–5 minutes later from the smoke test.
Fix: Audit the deploy script. One drush cr per deploy, after drush updb and drush cim, is the correct shape. Remove any duplicates.
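As a sketch, the deploy tail can be wrapped in one function so the rebuild cannot run twice; the drush invocations and flags are assumptions — match them to your pipeline:

```shell
# One function, one rebuild: updates, then config import, then a single cr.
deploy_drupal() {
  drush updb -y   # 1. database updates first
  drush cim -y    # 2. then config import
  drush cr        # 3. exactly one cache rebuild, at the end
}
```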
Cause D — Null cache backend left in settings.php
$settings['cache']['bins']['render'] = 'cache.backend.null'; left in place from a development environment. The cache is never written. Symptom is identical to a flush loop and you'll see perf.hook_timing permanently elevated with no preceding cache.flush.
Fix: Remove the null backend override. Confirm $settings['cache']['default'] is the default backend.
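A quick scan catches the override before it ships. The sketch below greps a stand-in file; in practice you would point it at web/sites/*/settings*.php:

```shell
# Stand-in settings file carrying the offending dev override (hypothetical).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
$settings['cache']['bins']['render'] = 'cache.backend.null';
EOF
# A nonzero count means a null backend is still configured.
grep -c 'cache.backend.null' "$tmp"
rm -f "$tmp"
```

A CI step that fails on a nonzero count keeps the override from reaching production.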
Cause E — Memcache or Redis connectivity flapping
Cache backend is configured but unreliable. Each flap looks like a flush from the cache-hit-rate perspective.
Signal evidence: cache.flush scope=bin:* without corresponding scope=all; watchdog entries from redis or memcache channels with reconnect warnings.
Fix: Stabilize the backend. Check network ACLs, Redis maxmemory eviction, Memcache eviction rate.
5.4 Verify
The fix is verified when broad-scope cache.flush signals return to a low, deploy-correlated baseline and downstream latency normalizes.
Healthy signal pattern:
- cache.flush with scope=all or scope=bin:render (no tags): under 5 events per 24 hours on a stable site, all aligned to known deploy or admin actions.
- cache.flush with scope=tag:*: any rate is normal. Tag-scoped invalidation is the entire point of Drupal's cache system.
- perf.hook_timing for kernel.request: p95 under 200 ms on warm cache, no recurring spikes above 800 ms outside deploy windows.
- http.request TTFB: p95 under 300 ms for anonymous traffic on cached routes.
Verification timeframe: Watch the site for one full traffic cycle (24 hours covers cron runs, deploys, and admin activity). If no broad-scope cache.flush signal appears outside the windows you expect, the loop is closed.
# Spot-check from logs after the fix
# 1. Trigger a known-clean state
drush cr
# 2. Wait 30 minutes, then confirm no unexpected rebuild traces appeared.
# Slow-log entry headers look like "[10-Oct-2024 13:55:36] [pool www]", and
# the lexicographic timestamp comparison is only valid within a single day.
awk -v cutoff="$(date -d '30 min ago' '+[%d-%b-%Y %H:%M')" \
  '/\[pool /{ts=$1" "$2} /DrupalKernel\.php|ContainerBuilder\.php/ && ts>=cutoff {print ts}' \
  /var/log/php-fpm/slow.log
# 3. Confirm anonymous requests are served from cache
curl -sI https://your-site.example/ | grep -iE 'x-drupal(-dynamic)?-cache'
# Expected: x-drupal-cache: HIT (internal page cache) and/or
# x-drupal-dynamic-cache: HIT, after the first request
If the cache headers return HIT consistently and no new rebuild traces appear, the flush loop is gone.
6. How to Catch This Early
Fixing it is straightforward once you know the cause. The hard part is knowing it happened at all.
Cache flush loops are a textbook silent failure. The site does not crash. No alert fires. The only visible symptom is "things feel slow," and that takes weeks to climb the priority list. By then you have already paid in CPU bills, lost organic traffic, and editor frustration.
Prevention is not about writing better code — every Drupal site eventually installs a contrib module that calls drupal_flush_all_caches() somewhere it shouldn't. Prevention is about detecting the frequency anomaly the moment it appears, before the symptom becomes visible.
This type of issue surfaces as the cache.flush signal, which Logystera detects and alerts on early. The cache.flush rule learns your normal flush-rate baseline (typically a handful of broad-scope flushes per day on a deploy-active site) and fires when broad-scope flushes exceed that baseline by more than 10x for more than 15 minutes. The supporting perf.hook_timing and http.request signals confirm the impact, so the alert ships with built-in "this is real" verification rather than a noisy single-source warning.
The detection happens regardless of where the trigger lives — contrib module, custom hook, deploy script, external pinger. The agent does not care about the cause. It cares that broad invalidations are happening too often, and that downstream cold rebuilds are correlating. That correlation is the diagnostic insight you can't get from CPU graphs or uptime checks.
7. Related Silent Failures
A cache.flush frequency anomaly puts you next to several related Drupal-ops failure modes. They share the "silent degradation, no error" shape and surface through adjacent signals.
- drupal/cron-not-running-detect — cron.run heartbeat absent for >6 hours; tasks pile up, triggering flush storms when cron finally runs.
- drupal/perf-hook-timing-regression — perf.hook_timing regression on a specific hook after a module update.
- drupal/dynamic-page-cache-miss-rate — http.request correlated with cache.miss on dynamic_page_cache; surfaces cache-buster query strings.
- drupal/watchdog-error-burst — watchdog severity=Error rate anomaly; catches PHP notices that don't rise to fatal.
- drupal/config-import-drift — cache.flush triggered by repeated drush cim runs against drifting config.
Each is a separate failure with its own primary signal. Together they form the Drupal-ops detection surface.
See what's actually happening in your Drupal system
Connect your site. Logystera starts monitoring within minutes.