Drupal email delivery storm — when contact forms, password resets and notifications all fail at once

1. Problem

It is 9:14 on a Tuesday. The first ticket lands at 9:17: "I requested a password reset a few minutes ago and never got it." By 9:25 there are six. By 9:40, an editor pings you on Slack: the daily content digest never went out, and the contact form on the marketing landing page is "broken." You hit the form yourself. It accepts the submission and returns the green confirmation. Nothing arrives in the support inbox. Nothing in spam. Nothing in the Mailgun activity feed for the last half hour either, where the hour before shows dozens of deliveries.

You search for "drupal contact form not sending email" and "drupal smtp suddenly broken all sites" and every hit is from 2017 telling you to enable SMTP or check your DNS. The status report is green. Cron ran two minutes ago. Watchdog has a wall of mail channel entries that all look the same: "Unable to send e-mail. Contact the site administrator if the problem persists." That has been a Drupal stock string since 2012 and tells you nothing about why. Meanwhile email.failed signals are landing in Logystera at a rate the platform has not seen on this entity in six months.

This is an email delivery storm: not one failed message, but a cliff. Every channel that depends on \Drupal::service('plugin.manager.mail') — Webform, Contact, password resets, Commerce order confirmations, Simplenews digests, content moderation transitions — fails inside the same five-minute window because they all share a single plumbing layer that just broke.

2. Impact

Email in Drupal is not one feature — it is a shared transport that sits under maybe a dozen modules at once. When that transport fails, the failure is fan-shaped, not point-shaped. A single broken SMTP credential takes out password resets and contact forms and commerce receipts and moderation notifications and the Simplenews queue runner — simultaneously, with no module knowing the others are also failing.

The cost is concrete and asymmetric. A Drupal Commerce site that fires no order confirmation emails for 40 minutes will see chargebacks within 72 hours — not because orders failed, but because customers think they did and reorder, or dispute. A higher-ed site running a content moderation workflow stops notifying editorial reviewers; content sits in needs_review until someone manually checks. A nonprofit running Simplenews accumulates 12,000 queue rows in queue_mail because Symfony Mailer kept rejecting and Drupal's queue worker kept retrying — when the credential is finally rotated, the storm flips from "no email" to "every recipient gets four copies of last month's newsletter."

And the ticket nobody opens is the worst one: the user who never tried again. They submitted the contact form, never got a reply, and concluded you do not respond to leads. There is no log of a customer you never knew existed.

3. Why It’s Hard to Spot

Drupal does not have one mail system. It has a dispatch chain, and every link can fail silently:

Mail manager (plugin.manager.mail) decides which mail plugin to use, based on the mailsystem.settings config and per-module overrides.
The selected plugin — php_mail (default), swiftmailer, symfony_mailer, or smtp — actually formats and sends.
The transport — local sendmail, an SMTP relay, an API like SES/Mailgun/Postmark — does the network work.
The queue: for any module that uses mail queue workers (Simplenews, mailsystem deferred sends), a failed item stays in the queue and is retried on later cron runs, each attempt ending when the item fails again or the worker times out.

A failure at any link writes the same generic watchdog message and returns false from \Drupal::service('plugin.manager.mail')->mail(). The Site Status report under /admin/reports/status checks if the mail subsystem is callable, not if it actually delivered. The Mailsystem UI shows the configuration, not the live success rate. SMTP module's "Send test email" button works against the currently saved config — not the config that was active when the storm started. Symfony Mailer 1.4+ at least surfaces transport errors in dblog, but only as a single line per failure with no rate context.

Uptime monitors do not see this. Synthetic checks do not submit forms. New Relic groups outbound SMTP under "External Services" with no per-endpoint alerting. The first signal that something is broken is almost always a customer telling you — minutes or hours after the cliff.

4. Cause

5. Solution

5.1 Diagnose

Watchdog will show a flood of identical Unable to send e-mail lines and tell you nothing. Go to the signal directly. Each query below names the signal it surfaces and correlates it with a real-world trigger window.

# Last 10 minutes of mail failures logged on this site (newest first, max 50 rows, not grouped)
drush sql:query "SELECT timestamp, message, variables FROM watchdog \
  WHERE type='symfony_mailer' \
  AND timestamp > UNIX_TIMESTAMP(NOW() - INTERVAL 10 MINUTE) \
  ORDER BY timestamp DESC LIMIT 50;"
# → these are the watchdog rows the Logystera module reads to emit email.failed
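
If no alert rule is wired up yet, the timestamps from a query like the one above can be checked by hand for a storm. The sketch below is a minimal illustration, not the Logystera rule itself: is_storm is a hypothetical helper, and it assumes the storm condition means at least 10 events inside any 300-second window.

```shell
# Hypothetical helper: reads one Unix timestamp per line on stdin and
# prints "storm" if any 300-second window holds at least 10 of them,
# otherwise "quiet". The ">= 10 in 300 s" reading is an assumption.
is_storm() {
  sort -n | awk -v window=300 -v threshold=10 '
    { ts[NR] = $1 + 0 }
    END {
      lo = 1
      for (hi = 1; hi <= NR; hi++) {
        # move the left edge forward until the window is shorter than `window`
        while (ts[hi] - ts[lo] >= window) lo++
        if (hi - lo + 1 >= threshold) { print "storm"; exit }
      }
      print "quiet"
    }'
}
```

Feed it the timestamp column of the watchdog query, one value per line (this assumes your Drush prints bare values with no header row). A sliding window is used because fixed clock buckets can split one burst across two intervals and hide it.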

If email.failed.storm (rule id 431) has fired, the rate has crossed 10 events in 5 minutes. That is the cliff. The next question is what changed in that window — that is where time correlation closes the case:

# Did anyone save mail-related config in the hour before the storm? (correlates with drupal_config_save_total)
drush sql:query "SELECT timestamp, message FROM watchdog \
  WHERE type='config' \
  AND (message LIKE '%mailsystem%' OR message LIKE '%symfony_mailer%' OR message LIKE '%smtp%') \
  AND timestamp > UNIX_TIMESTAMP(NOW() - INTERVAL 1 HOUR);"
# → if a config save lands at 09:11 and email.failed cliffs at 09:14, that is the trigger

# Was a mail-related module enabled or updated in the last 24h? (correlates with drupal_module_changes_total)
drush pm:list --status=enabled --filter='package=Mail' --format=table
drush sql:query "SELECT timestamp, message FROM watchdog \
  WHERE type IN ('system','update') \
  AND (message LIKE '%mailsystem%' OR message LIKE '%symfony_mailer%' OR message LIKE '%smtp%') \
  AND timestamp > UNIX_TIMESTAMP(NOW() - INTERVAL 24 HOUR);"

The pattern that closes 80% of cases: an email.failed cliff at 09:14 immediately preceded by a drupal_config_save_total spike at 09:11 on mailsystem.settings or symfony_mailer.settings.transport. Someone rotated a credential and pasted the wrong one — or upgraded Symfony Mailer 1.4 → 1.5, which changes DSN parsing.
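
The correlation in that pattern is simple arithmetic: find the most recent config save at or before the cliff and measure the gap. A minimal sketch, assuming both times are already available as Unix seconds; nearest_save_before is a hypothetical helper, not part of Drush or Logystera.

```shell
# Hypothetical helper.
# $1    = cliff time (Unix seconds)
# stdin = config-save times, one Unix timestamp per line
# Prints "<save_time> <seconds_before_cliff>" for the latest save at or
# before the cliff, or "none" if every save happened after it.
nearest_save_before() {
  cliff=$1
  awk -v cliff="$cliff" '
    $1 + 0 <= cliff + 0 && (!found || $1 + 0 > best) { best = $1 + 0; found = 1 }
    END {
      if (!found) print "none"
      else printf "%d %d\n", best, cliff - best
    }'
}

# Example call: two saves, the later one 180 s before the cliff
# printf '%s\n' 1700000000 1700003420 | nearest_save_before 1700003600
```

For the 09:11 save and 09:14 cliff in the example above, the gap is 180 seconds. A gap of a few minutes usually points at the save as the trigger; "none" or a gap of many hours suggests looking upstream instead.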

Then check the queue and the live transport:

# Queue depth — if this is climbing while email.failed is firing, retries are amplifying the storm
drush queue:list | grep -i mail
drush sql:query "SELECT name, COUNT(*) FROM queue WHERE name LIKE '%mail%' GROUP BY name;"

# Test the live transport against the live config
drush symfony-mailer:test you@example.com
# Or, for the SMTP module:
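drush php:eval "\$r = \Drupal::service('plugin.manager.mail')->mail('smtp', 'test', 'you@example.com', 'en'); var_dump(\$r['result']);"
# → prints the 'result' value returned by mail(), i.e. whether the plugin reported a successful send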

The error text tells you which link broke. "Authentication failed" (SMTP 535) is a credential problem. "Connection could not be opened" is network or DNS. "Sender address rejected" usually points to SPF/DKIM/DMARC misalignment.
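
When many failures need triage at once, the three readings above can be applied mechanically. A minimal sketch: classify_mail_error is a hypothetical helper, and the patterns are only the three strings quoted in this section, so real transport messages may be worded differently and fall through to unknown.

```shell
# Hypothetical helper: maps one error line to the likely failing layer.
# Patterns come only from the three strings quoted in this guide; the
# bare "535" match is deliberately loose and may over-match.
classify_mail_error() {
  case "$1" in
    *"Authentication failed"*|*535*)   echo "credential" ;;
    *"Connection could not be opened"*) echo "network-or-dns" ;;
    *"Sender address rejected"*)        echo "spf-dkim-dmarc" ;;
    *)                                  echo "unknown" ;;
  esac
}
```

Running every failed row through it and counting the labels shows quickly whether the storm has one cause or several mixed together.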

5.2 Root causes

Each cause maps to the email.failed evidence it produces and how it appears in logs:

SMTP credential rotated or expired → email.failed with error=AuthenticationFailedException or SMTP 535. Co-occurs with a drupal_config_save_total event in the previous hour, or with no Drupal event at all (the credential expired upstream).
MX/DNS or SPF change on the sending domain → email.failed with error=550 5.7.1 ("sender rejected"). No config save — the trigger is upstream. Follows a registrar nameserver change or a DKIM key rotation.
Symfony Mailer transport DSN regression → email.failed with error=TransportException clustered immediately after a drupal_module_changes_total event for symfony_mailer.
Mailsystem swap — default mail plugin changed from php_mail to swiftmailer or vice versa at /admin/config/system/mailsystem. Surfaces as drupal_config_save_total on mailsystem.settings followed by a uniform email.failed cliff across every module.
hook_mail_alter regression — a custom or contrib module mutates $message['headers'] and produces RfcComplianceException. Often fails only on a subset of mails (with attachments, or non-ASCII subjects), so it never crosses the storm threshold but quietly drops 5–10% of traffic — exactly what email.failed.single (rule id 430) is for.
Sender reputation collapse → mixed 421 (rate-limited) and 550 (rejected), no internal trigger. IP listed, DKIM expired, complaint rate too high.
Queue worker stuck on a poison message → email.failed.storm plus climbing queue_mail depth. Worker grabs the same bad item, fails, requeues.

5.3 Fix

Roll back the last config change. If drupal_config_save_total shows a touch on mailsystem.settings or transport config in the storm window, drush config:import from the last good export, or revert in the UI. Resolves the credential-paste-error case in under two minutes.
Re-enter and re-test the SMTP credential. Use the live transport test, not the saved-config test. For SES/Mailgun, regenerate the key — do not "verify" the existing one.
Drain the queue carefully after the credential is fixed. Throttle the queue_mail worker before running cron (smaller batches, fewer parallel runs), so the backlog does not flood the relay and trigger rate-limit 421s on top of recovery.
For DNS/SPF/DKIM issues, validate at mxtoolbox.com or with dig +short TXT yourdomain.com, republish the records, and allow propagation before retesting at scale.
For hook_mail_alter regressions, bisect by disabling custom modules in dev and replaying a webform submission against the live transport.
For Symfony Mailer DSN regressions, pin the previous version with composer require drupal/symfony_mailer:1.4.0 while you reconcile the DSN format change.
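
For the queue-draining step, one way to avoid flooding the relay is to process the backlog in small, spaced batches instead of one large cron run. A minimal sketch, assuming a Drush version whose queue:run accepts --items-limit and a queue named queue_mail as in this guide; drain_in_batches, the batch size, and the pause are hypothetical choices.

```shell
# Hypothetical wrapper around `drush queue:run`.
# $1 = queue name, $2 = number of batches,
# $3 = items per batch (assumed default 50),
# $4 = seconds to pause between batches (assumed default 30).
drain_in_batches() {
  queue=$1
  batches=$2
  items=${3:-50}
  pause=${4:-30}
  i=1
  while [ "$i" -le "$batches" ]; do
    # DRUSH can be overridden, e.g. DRUSH=echo for a dry run
    ${DRUSH:-drush} queue:run "$queue" --items-limit="$items" || return 1
    if [ "$i" -lt "$batches" ]; then
      sleep "$pause"
    fi
    i=$((i + 1))
  done
}

# Dry run: print the commands without touching the queue
# DRUSH=echo drain_in_batches queue_mail 3 25 0
```

The function stops at the first batch that fails, so a still-broken transport does not burn through the whole backlog.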

5.4 Verify

The signal that should stop is email.failed.storm (rule id 431) — no new critical alert on this entity within 5 minutes of the fix.

The single-failure warning rule (id 430, email.failed.single) will not go fully silent, and that is correct. As a baseline, a healthy Drupal site still fires roughly 0–3 email.failed events per hour: typo'd recipient addresses, full mailboxes, the occasional greylisting. If you watch the dashboard for an hour after the fix and see email.failed flatten to about 3/hour or fewer with no clustering, the storm is over. If you still see 10+/hour, the underlying cause is not resolved.

What you are looking for: the cliff shape on the email.failed panel disappears. Scattered single events are healthy. A vertical wall is not.
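
The hourly thresholds in this section can be applied to a plain list of failure timestamps. A minimal sketch: hourly_verdict is a hypothetical helper, and the middle band between the healthy baseline (up to 3 per hour) and the unresolved level (10 or more per hour) is labeled watch as an assumption, since the guide does not name it. It counts events only and does not detect clustering.

```shell
# Hypothetical helper.
# $1    = "now" as Unix seconds
# stdin = failure times, one Unix timestamp per line
# Prints a verdict and the number of failures in the last 3600 seconds.
hourly_verdict() {
  now=$1
  awk -v now="$now" '
    $1 + 0 > now - 3600 && $1 + 0 <= now + 0 { n++ }
    END {
      n += 0                               # treat "no matches" as 0
      if (n <= 3)       print "healthy", n
      else if (n >= 10) print "unresolved", n
      else              print "watch", n   # band not defined in the guide
    }'
}
```

Called as hourly_verdict "$(date +%s)" with the timestamps on stdin, it gives a quick second opinion while the dashboard panel settles.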

If email.failed is back under 3/hour but queue_mail depth is still climbing, the credential is fixed but the queue worker is jammed on a poison item — see the related queue-workers-stuck guide.
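
Whether queue depth is still climbing is easier to judge from a few samples than from a single reading. A minimal sketch, assuming you record the row count from the queue query in 5.1 every minute or so, one number per line; depth_trend is a hypothetical helper.

```shell
# Hypothetical helper: reads queue-depth samples (oldest first, one per
# line) and prints "climbing", "draining", "flat-or-mixed", or
# "insufficient" when fewer than two samples are given.
depth_trend() {
  awk '
    NR > 1 && $1 + 0 > prev { up++ }
    NR > 1 && $1 + 0 < prev { down++ }
    { prev = $1 + 0 }
    END {
      if (NR < 2)                   print "insufficient"
      else if (up > 0 && down == 0) print "climbing"
      else if (down > 0 && up == 0) print "draining"
      else                          print "flat-or-mixed"
    }'
}
```

A result of climbing after email.failed has flattened matches the jammed-worker case described above; draining means the backlog is clearing on its own.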

6. How to Catch This Early

Fixing it is straightforward once you know the cause. The hard part is knowing it happened at all.

This issue surfaces as email.failed. Everything you just did manually — grep watchdog, correlate config saves with the cliff, distinguish a typo'd address from a credential storm — Logystera does automatically. The same email.failed signal you queried in dblog is detected, charted, and alerted in real time, and the two-tier rule design is what makes it usable instead of noisy.

[Figure: the email.failed signal in the dashboard]

[Figure: the alert that fires]

The two-tier design matters because Drupal email fails in two completely different ways. The slow leak, a single bounced password reset that nobody notices for three weeks until a user calls, surfaces as the warning, and a human can chase it before it metastasizes. The cliff, a credential rotated wrong at 09:11, surfaces as the critical, with the log line and the originating config save attached, six minutes after it starts. One signal, one piece of plumbing, both failure modes covered.

7. Related Silent Failures

Drupal queue workers stuck — finding the bad queue item (queue.item_failed): when email.failed recovers but queue_mail does not drain, the bad item is the next layer down.
Drupal config import failed (drupal_config_save_total with rollback): the trigger event upstream of most credential-rotation storms.
Drupal module installed or updated (drupal_module_changes_total): catches Symfony Mailer or SMTP module upgrades that quietly change transport DSN parsing.
WordPress contact form not arriving (wp.email): the WordPress-side mirror of this failure mode for cross-platform teams.
Drupal watchdog noise filtering: the mail channel produces high-volume identical messages during a storm; filter rules keep email.failed distinct from generic watchdog floor.
