Drupal monitoring: behavior, not just uptime

Drupal sits under some of the most consequential sites on the web — governments, universities, hospitals, large publishers, national non-profits. It is also one of the hardest platforms to monitor well. Cron runs through a dedicated endpoint that can silently stop. Queues hold work that nobody notices when processing stalls. Drush commands, Views caches, entity updates, and custom modules all write to a log stream that operators rarely read. An uptime monitor will report 100% while queues are 40,000 items deep, cron has not run in 422 days, and a content migration is quietly failing every night.

Drupal monitoring, done properly, is not a ping to the homepage. It is a structured read of the events Drupal already produces — cron executions, queue activity, authentication events, watchdog entries, cache invalidations, configuration changes, entity updates, module errors — converted into metrics and rules tuned for how Drupal actually breaks. This page is the hub for that approach: what log-based monitoring captures on Drupal, the failures every site should alert on, and deep-dive guides for the most common operational issues.

What breaks on production Drupal

These are the failure modes we consistently pull out of Drupal production logs. Every one is silent by default — HTTP 200s continue, dashboards stay green, nothing surfaces in the UI until the consequences are visible.

Cron stops running

A cron key misconfiguration, a failing hook_cron implementation, or an external scheduler that quietly died. Views caches go stale, the search index falls behind, image derivatives never rebuild, and scheduled publishing breaks.
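
The check itself is small once the last-run time is available. Drupal records the last cron run as a Unix timestamp in the `system.cron_last` state value; how that value reaches the monitor (a log event, a Drush call, an exported metric) is left open here. The sketch below is a minimal illustration in Python, and the function names and the 30-minute threshold are assumptions for the example, not part of any Drupal or Logystera API.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold; tune it to how often your scheduler triggers cron.
MAX_CRON_GAP = timedelta(minutes=30)


def cron_gap(last_run_ts: int, now: datetime) -> timedelta:
    """Time elapsed since the last recorded cron run (Unix timestamp)."""
    last_run = datetime.fromtimestamp(last_run_ts, tz=timezone.utc)
    return now - last_run


def cron_is_stale(last_run_ts: int, now: datetime,
                  max_gap: timedelta = MAX_CRON_GAP) -> bool:
    """True when cron has been silent for longer than max_gap."""
    return cron_gap(last_run_ts, now) > max_gap


# Hypothetical values standing in for system.cron_last.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
recent = int((now - timedelta(minutes=5)).timestamp())
old = int((now - timedelta(days=42)).timestamp())

print(cron_is_stale(recent, now))  # False
print(cron_is_stale(old, now))     # True
```

The point of alerting on the gap rather than on cron errors is that a dead scheduler produces no errors at all; only the absence of a run is observable.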

Read: how to detect Drupal cron failures →

Queues stop processing

A queue worker crashes, a claimItem() lease never gets released, or a long-running item never completes. The queue grows. Migrations back up. Notifications never send. Nobody notices until a user asks why their form submission never went through.
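
A stalled queue shows up as depth that stays above zero while the processed count stops moving. The sketch below assumes a monitor has periodic samples of each queue (depth, as Drupal's queue `numberOfItems()` would report it, plus a cumulative processed counter derived from logs). The `QueueSample` type, the field names, and the thresholds are hypothetical and only illustrate the rule.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    ts: int         # Unix timestamp of the sample
    depth: int      # items waiting in the queue
    processed: int  # cumulative items processed so far


def queue_alerts(samples, max_depth=1000, stall_seconds=600):
    """Return alert reasons for one queue from time-ordered samples."""
    if not samples:
        return []
    alerts = []
    latest = samples[-1]
    if latest.depth > max_depth:
        alerts.append("backlog")
    # Find the most recent moment the processed counter moved.
    last_progress_ts = samples[0].ts
    for prev, cur in zip(samples, samples[1:]):
        if cur.processed > prev.processed:
            last_progress_ts = cur.ts
    # Work is waiting but nothing has completed for too long.
    if latest.depth > 0 and latest.ts - last_progress_ts > stall_seconds:
        alerts.append("stalled")
    return alerts


stuck = [QueueSample(0, 10, 5), QueueSample(300, 50, 5), QueueSample(900, 120, 5)]
print(queue_alerts(stuck))  # ['stalled']
```

Keeping "backlog" and "stalled" as separate reasons matters: a deep queue that is draining is a capacity question, while a shallow queue that never drains is a broken worker.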

Read: detect stuck Drupal queues →

Watchdog fills with errors nobody reads

The dblog watchdog table grows for weeks with the same error. The pattern is obvious in the data. It is invisible to anyone who does not open the reports page that day. A well-wired monitor catches clusters within minutes.
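
Cluster detection can be as simple as grouping recent entries by type and message template. dblog stores the message template separately from its variables, so the same error with different arguments groups naturally. The sketch below assumes entries have already been read into dictionaries with `type`, `message`, `severity`, and `timestamp` keys mirroring the watchdog columns; the function name and thresholds are illustrative assumptions.

```python
from collections import Counter

# RFC 5424 severity levels as used by Drupal: 0 is emergency, 3 is error, 7 is debug.
ERROR_OR_WORSE = 3


def error_clusters(entries, now, window=3600, threshold=20):
    """Count recent error-level entries per (type, message template)
    and return only the groups that exceed the threshold."""
    counts = Counter(
        (e["type"], e["message"])
        for e in entries
        if e["severity"] <= ERROR_OR_WORSE and now - e["timestamp"] <= window
    )
    return {key: n for key, n in counts.items() if n > threshold}


now = 100_000
entries = (
    # 25 copies of the same PHP error inside the last hour
    [{"type": "php", "message": "%type: @message in %function",
      "severity": 3, "timestamp": now - i} for i in range(25)]
    # a handful of unrelated notices that should not alert
    + [{"type": "cron", "message": "Cron run completed.",
        "severity": 5, "timestamp": now - i} for i in range(5)]
)
print(error_clusters(entries, now))
# {('php', '%type: @message in %function'): 25}
```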

Authentication drift

Failed login spikes, SSO/SAML failures after an IdP change, locked-out administrators, LDAP sync silently broken on a campus site. The login page still loads — that is not what you need to know.
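
A sliding window per username, alongside a site-wide one, separates a targeted attack on one account from broad credential stuffing. The sketch below is one way to do that, assuming failed-login events arrive in time order with a Unix timestamp and a username. The class name, alert labels, and limits are hypothetical.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    """Sliding-window counter for failed logins (events must arrive in time order)."""

    def __init__(self, window=600, site_limit=100, user_limit=30):
        self.window = window
        self.site_limit = site_limit
        self.user_limit = user_limit
        self.site = deque()
        self.by_user = defaultdict(deque)

    @staticmethod
    def _trim(q, cutoff):
        # Drop timestamps that have fallen out of the window.
        while q and q[0] <= cutoff:
            q.popleft()

    def record(self, ts, username):
        """Record one failed login and return the alerts it triggers."""
        cutoff = ts - self.window
        user_q = self.by_user[username]
        self.site.append(ts)
        user_q.append(ts)
        self._trim(self.site, cutoff)
        self._trim(user_q, cutoff)
        alerts = []
        if len(self.site) > self.site_limit:
            alerts.append("site_spike")
        if len(user_q) > self.user_limit:
            alerts.append("user_spike")
        return alerts


monitor = FailedLoginMonitor()
results = [monitor.record(ts, "admin") for ts in range(31)]
print(results[-1])  # ['user_spike']
```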

Cache invalidations storm the origin

A content editor bulk-updates a taxonomy, a deploy invalidates every render cache, a misconfigured tag fires on every entity save. Hit ratio collapses, DB load climbs, and it is attributed to “traffic” because nobody looked at the cache signal.
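
Hit ratio is hits divided by total lookups, computed per cache bin so that one cold bin is not averaged away by a healthy one. The sketch below assumes hit and miss counters per bin have already been derived from the log stream; the bin names match Drupal core bins, but the counter format, function names, and 70% floor are assumptions for illustration.

```python
def hit_ratios(counters):
    """counters: {bin: (hits, misses)} -> {bin: ratio}; idle bins are skipped."""
    ratios = {}
    for bin_name, (hits, misses) in counters.items():
        total = hits + misses
        if total:
            ratios[bin_name] = hits / total
    return ratios


def cold_bins(counters, floor=0.70):
    """Names of bins whose hit ratio has dropped below the floor."""
    return sorted(b for b, r in hit_ratios(counters).items() if r < floor)


# Hypothetical counters for one interval after a bulk taxonomy update.
counters = {"render": (300, 700), "page": (950, 50), "discovery": (0, 0)}
print(cold_bins(counters))  # ['render']
```

Skipping bins with no lookups avoids a division by zero and avoids alerting on a bin that is simply idle.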

Configuration drift

Config imported out of order, a module enabled on production that was never reviewed, a role gaining permissions via a maintenance script. Drupal’s config system is powerful and unforgiving; drift needs to be an alert, not an audit.

What log-based monitoring captures

Drupal is unusually well-instrumented by design — watchdog, the cache API, hook_cron, the queue API, and the event subscriber system all produce structured events. Log-based monitoring turns each of these into signals.

Signal category	What it tells you
Cron activity	Run timestamps, duration, hook_cron errors, last-run-ago. Catches cron key issues, failing modules, and dead external schedulers.
Queue processing	Items claimed, processed, released, failed, per queue. Depth over time. Stuck claim detection.
Watchdog events	dblog entries by severity, type, and module. Clusters surface as alerts, not scrollback in /admin/reports/dblog.
Authentication	Login success/failure, locked accounts, SSO/SAML errors, LDAP sync failures, permission changes.
PHP errors	Notices, warnings, fatals, deprecations — with file, line, and full stack. Not just in watchdog, but from the PHP error log directly.
Cache behavior	Hit ratio, invalidation volume, tag-specific invalidations, render cache pressure. Cache going cold is an alert, not a mystery.
Configuration changes	Module enable/disable, config import/export, permission grants, role mutations. Every change is a signal.
Entity & content events	Node/entity save, update, delete by type. Scheduled publishing execution. Migration runs.
Performance	Request latency per route, admin vs anonymous split, slow queries, memory peaks, render time.

All of these come from the same log stream. One module, one outbound connection, one batched signal feed. See log-based monitoring: the general approach for the underlying architecture.

How Logystera monitors Drupal

Logystera ships a Drupal module (Drupal 10 and 11) that hooks into the event subscriber system, watchdog, the queue API, the cache API, and the entity lifecycle. Events are batched, HMAC-signed, and shipped to the Logystera gateway. The processor derives metrics, evaluates pre-built rules tuned for Drupal operational patterns, and renders dashboards. No rules to write, no external agent, no log pipeline to run.
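
To make "batched, HMAC-signed" concrete, the following is a generic sketch of signing and verifying a batch of events. It is not the Logystera wire format, which this page does not describe: the JSON encoding, the SHA-256 hash, and the function names are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import json


def sign_batch(events, secret: bytes):
    """Serialize a batch deterministically and compute its HMAC-SHA256."""
    body = json.dumps(events, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body, signature


def verify_batch(body: bytes, signature: str, secret: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


# Hypothetical events and shared secret.
events = [{"type": "cron", "ts": 1700000000}, {"type": "queue", "ts": 1700000005}]
body, signature = sign_batch(events, b"shared-secret")
print(verify_batch(body, signature, b"shared-secret"))         # True
print(verify_batch(body + b" ", signature, b"shared-secret"))  # False
```

Signing the exact bytes that go over the wire, and verifying those same bytes on receipt, means a tampered or truncated batch is rejected before any metric is derived from it.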

Dashboards are pre-built for Drupal operators: cron and queue health, watchdog clusters, authentication patterns, cache behavior, and performance breakdowns. Alert rules fire on the things that cost Drupal sites days of silent degradation — cron gaps, queue backlogs, watchdog error clusters, permission changes on production. See the Drupal integration page for the full feature list, setup steps, and screenshots.

Drupal monitoring checklist

What a well-monitored Drupal site should detect. These are operational best practices independent of any specific tool. If your current monitoring doesn’t cover these, you have blind spots large enough to lose data through.

1. Alert when Drupal cron has not run in the last 30 minutes. The single most common long-running silent failure. Detects cron-key issues, dead external schedulers, and failing hook_cron implementations.
2. Alert when a queue has more than 1,000 items or stops processing for more than 10 minutes. Covers stuck claimItem() leases, crashed workers, and long-running items that hold the queue.
3. Alert on watchdog error clusters. More than 20 entries of the same type in an hour, or more than 5 critical entries in 10 minutes. Pattern, not volume.
4. Alert on PHP fatal or uncaught exception clusters. Repeated fatals on the same file/line within an hour. Real regression, not transient load.
5. Alert on failed login spikes. More than 100 failures in 10 minutes site-wide, or more than 30 against a single username. Covers brute force and credential stuffing.
6. Alert on any module enable, disable, or permission change on production. These are load-bearing changes that should never happen unannounced.
7. Alert on SSO/SAML or LDAP sync failures. The quiet killer of campus and enterprise Drupal. Users stop being able to log in; an uptime monitor reports green.
8. Track cache hit ratio per bin and alert below 70%. Render cache, page cache, and discovery cache each have different failure modes and their own alert thresholds.
9. Alert on admin-route p95 latency regressions. Editors notice before visitors do. A 3× jump in node-edit latency usually precedes complaints by days.
10. Alert on migration or batch job failures. If Drupal runs migrations, feeds imports, or long-running Drush commands, their outcome needs to be a signal, not something somebody greps out of a log file next Tuesday.
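
The admin-route latency item in the checklist is the least obvious to compute, so here is a minimal sketch of it: a nearest-rank p95 over request durations, compared against a baseline period. The function names, the sample data, and the 3× factor are assumptions for illustration; a production system would typically use streaming percentile estimates rather than sorting raw samples.

```python
def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("p95 requires at least one value")
    ordered = sorted(values)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_regressed(baseline_ms, current_ms, factor=3.0):
    """True when the current p95 is at least `factor` times the baseline p95."""
    return p95(current_ms) >= factor * p95(baseline_ms)


# Hypothetical node-edit latencies in milliseconds.
baseline = list(range(1, 101))        # p95 is 95
doubled = [x * 2 for x in baseline]   # p95 is 190
tripled = [x * 3 for x in baseline]   # p95 is 285

print(latency_regressed(baseline, doubled))  # False
print(latency_regressed(baseline, tripled))  # True
```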

When uptime monitoring is not enough

A Drupal site can be up by every external measurement and simultaneously failing at every internal one. Cron has not run in six weeks. The content_migration queue has 40,000 items. Watchdog shows the same TypeError 3,000 times. SSO failed for the last admin who tried. The homepage still returns 200, so the status dashboard is green, so nothing is wrong, until you open the reports page and discover the site has been broken for six weeks.

Log-based monitoring asks whether Drupal is doing its job, not whether it answered an HTTP request. For the full argument see uptime vs. health → — the piece is framed around WordPress but the argument is identical on Drupal.
