Guide
WordPress 404 errors — separating real broken links from probe traffic and scrapers
1. Problem
You opened your access logs, or your SEO tool, or your hosting dashboard, and the 404 count looks wrong. Hundreds of them per hour. Maybe ten thousand a day on a site that gets fifty thousand pageviews. Half of them point at URLs you have never seen before — /wp-content/plugins/revslider/, /.env, /xmlrpc.php?rsd, /wp-admin/admin-ajax.php?action=duplicator_download, /old-blog-2019/some-post. Some of them are obviously real broken links — /about-us returning 404 when the page exists. Some are clearly scanners. Most are in between, and you have no idea what to fix first.
If you searched "wordpress 404 errors spike after migration" or "wordpress lots of 404 errors what to fix first", you have probably already been told to install a redirect plugin and 301 everything. That advice is wrong as a default. Redirecting every 404 sends Googlebot in circles, hides real scanner traffic from your security view, and burns CPU on requests that should be dropped at the edge. The real diagnostic question is: of the top 404 URIs hitting your site, which are scanners probing for vulnerabilities, which are dead WordPress admin paths from plugins you removed years ago, and which are real broken links that lost link equity in a migration? Three different categories, three different fixes, and you cannot tell them apart without looking at the URI distribution.
This guide walks through using the wp_top_404_uris signal to do that triage in under ten minutes, then shows the three fix paths so you stop over-correcting.
2. Impact
The "redirect everything" panic move costs more than the 404s themselves.
- SEO damage from blanket redirects. Google treats a 301 as a permanent move. If you redirect /wp-content/plugins/revslider/ to your homepage, you are telling Google that a vulnerability scanner path is your homepage. It also creates redirect chains and dilutes link signals on the URLs that actually deserve a redirect.
- Real broken links go untreated. When a migration moved your blog from /blog/2019/post-title to /post-title, every backlink to the old URL is now a 404. Each one is lost referral traffic and lost authority. These are the 404s that need redirects, but they are buried in noise from scanner probes.
- Vulnerability scanners get ignored. A spike in /wp-content/plugins/.../readme.txt requests is a scanner enumerating which vulnerable plugin versions you run. It is the precursor to an exploitation attempt. If you 301 it to your homepage, you have hidden the signal.
- Server CPU burned on probes. Every scanner 404 still hits PHP, still loads WordPress, still runs the 404 template. On a low-traffic site this is invisible; under a coordinated probe it can reach 30 percent of your CPU.
- Migration debt compounds silently. A site that lost 5 percent of its referral traffic to bad post-migration URLs will not recover automatically. The lost traffic looks like a slow Google update, not a fixable bug.
3. Why It’s Hard to Spot
WordPress itself does not record 404 distribution anywhere useful. The wp-content/debug.log only captures PHP errors, not HTTP statuses. The admin dashboard shows nothing. Plugins like Redirection log 404s but truncate after a few thousand and have no built-in clustering — you get a flat list of every unique URI, which on a busy site means a scrollable wall of garbage.
Hosting dashboards show aggregate 404 counts but not the URI breakdown, or they show the breakdown only in a flat table that splits paths on the / separator into meaningless buckets. SEO tools (Ahrefs, Search Console) only see what Googlebot crawled. They will never show you the scanner traffic, because scanners hit your server directly and never pass through Google's crawl.
The result is that a site owner sees "10,000 404s today" and panics, but cannot answer "what URIs?" without grepping raw access logs. And raw access logs are usually 200MB per day on a moderately busy WordPress site — not searchable in real time, often rotated daily, and often not retained at all on shared hosting.
This is why 404 spikes feel mysterious. The signal exists at the edge (in nginx or Apache), but it never makes it back into the WordPress admin or any centralized dashboard.
4. Cause
The Logystera WordPress plugin emits a signal on every HTTP request the site serves. When the response status is 404, that signal feeds the wp_top_404_uris metric, which is a top-N counter keyed on the request URI (with high-cardinality tails grouped). It is not a flat 404 counter — it is a distribution. The output looks like:
wp_top_404_uris{uri="/wp-content/plugins/revslider/temp/update_extract/", entity_id="..."} 4821
wp_top_404_uris{uri="/.env", entity_id="..."} 3104
wp_top_404_uris{uri="/xmlrpc.php?rsd", entity_id="..."} 2890
wp_top_404_uris{uri="/old-blog-2019/welcome", entity_id="..."} 412
wp_top_404_uris{uri="/about-us", entity_id="..."} 87
That distribution is the diagnostic. The shape of the top-N tells you which category of 404 you are dealing with:
- Long tail of plugin/admin paths that do not exist on your site (revslider, duplicator, wp-file-manager, .env, wp-config.php.bak) → vulnerability scanners. These are not your problem to redirect; they are your problem to block.
- A cluster of URIs under your old URL structure (/blog/2019/..., /?p=123) → migration debt. These need 301s.
- A scattered set of low-volume URIs that are misspelled versions of real pages, or pages that used to exist → real broken links. These need redirects or content fixes.
- Legacy admin AJAX paths (/wp-admin/admin-ajax.php?action=duplicator_download, ?action=revslider_ajax_action) with steady low volume → leftover code references from a plugin you uninstalled. Usually harmless, sometimes the trace of an old compromise.
The wp_top_error_pages signal is the same idea but keyed on the WordPress page/template that produced the error, useful for confirming whether 404s are coming from the 404 template (real misses) or from inside a plugin's REST handler (different problem). wp_bot_requests_total and wp_request_fingerprint_top close the loop by attributing requests to known bot user agents and TLS/header fingerprints, so you can see at a glance whether a given 404 URI is human or automated.
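As a rough illustration of that triage, the sketch below sorts 404 URIs into the three fix categories used later in this guide. It is not part of the Logystera plugin: the function name classify_404, the pattern lists, and the assumed old URL structure (/blog/<year>/ and /?p=<id>) are hypothetical, taken from the examples above, and need adjusting to your own site.

```python
import re

# Hypothetical probe patterns, copied from the examples in this guide.
SCANNER_RE = re.compile(
    r"\.env|\.git|wp-config|xmlrpc\.php\?rsd"
    r"|/plugins/(revslider|duplicator|wp-file-manager)/"
    r"|admin-ajax\.php\?action=(duplicator|revslider)"
)

# Assumed old URL structure: /blog/<year>/... or plain /?p=<id> permalinks.
MIGRATION_RE = re.compile(r"^/blog/20\d{2}/|^/\?p=\d+")


def classify_404(uri: str) -> str:
    """Return 'scanner', 'migration_debt' or 'broken_link' for a 404 URI."""
    if SCANNER_RE.search(uri):
        return "scanner"          # block at the edge, do not redirect
    if MIGRATION_RE.search(uri):
        return "migration_debt"   # candidate for a pattern-based 301
    return "broken_link"          # review by hand, fix at the source


if __name__ == "__main__":
    for uri in ("/.env", "/blog/2019/post-title", "/about-us"):
        print(uri, "->", classify_404(uri))
```

Anything that matches neither a probe pattern nor the old structure falls through to broken_link, which is the bucket worth reviewing by hand.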
5. Solution
5.1 Diagnose (logs first)
Start with the wp_top_404_uris distribution. If you have raw access logs available, the equivalent query is:
# Top 30 URIs returning 404 in the current access log (nginx, default combined format)
awk '$9 == 404 {print $7}' /var/log/nginx/access.log \
| sort | uniq -c | sort -rn | head -30
For Apache:
# Match on the status field only; grep ' 404 ' would also match a 404-byte response size
awk '$9 == 404 {print $7}' /var/log/apache2/access.log \
| sort | uniq -c | sort -rn | head -30
That output is the raw form of wp_top_404_uris. Each line is a (count, URI) pair. Now classify each entry by pattern:
Scanner signatures. These produce the wp_top_404_uris signal with URIs matching known probe patterns. Grep for them explicitly:
awk '$9 == 404 {print $7}' /var/log/nginx/access.log \
| grep -E '(\.env|\.git|wp-config|xmlrpc\.php\?rsd|/plugins/(revslider|duplicator|wp-file-manager|elementor)/|admin-ajax\.php\?action=(duplicator|revslider))' \
| sort | uniq -c | sort -rn
If the top result is /wp-content/plugins/revslider/temp/update_extract/ with 4,000 hits, that is a scanner sweep — there is no Revolution Slider plugin on your site (and if there is, update it now). The same URI from many source IPs in a short window is the scanner fingerprint, surfaced as wp_request_fingerprint_top clustering.
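To check the "same URI from many source IPs" pattern without the plugin, a small script can count hits and distinct client addresses per 404 URI. This is a minimal sketch that assumes the default combined log format; the function name source_spread and the regular expression are illustrative, not something Logystera ships.

```python
import re
from collections import defaultdict

# Combined log format: IP - user [time] "METHOD URI PROTO" STATUS SIZE ...
LINE_RE = re.compile(r'^(\S+) .*?"[A-Z]+ (\S+) [^"]*" (\d{3}) ')


def source_spread(lines):
    """Map each 404 URI to (hit count, number of distinct source IPs)."""
    hits = defaultdict(int)
    sources = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(3) == "404":
            ip, uri = m.group(1), m.group(2)
            hits[uri] += 1
            sources[uri].add(ip)
    return {uri: (hits[uri], len(sources[uri])) for uri in hits}


if __name__ == "__main__":
    # Assumed log location; change it to match your server.
    with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as f:
        spread = source_spread(f)
    for uri, (count, ips) in sorted(spread.items(), key=lambda kv: -kv[1][0])[:30]:
        print(f"{count:7d} hits  {ips:5d} IPs  {uri}")
```

A URI with thousands of hits spread over hundreds of addresses points to a distributed scan; the same volume from one or two addresses is more likely a single misbehaving client.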
Migration debt. These are URIs that match your old URL structure. If you migrated from /blog/yyyy/mm/slug to /slug, look for the old prefix:
awk '$9 == 404 && $7 ~ /^\/blog\/20[0-9][0-9]\// {print $7}' \
/var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20
If you see hundreds of distinct URIs each with tens of hits, that is migration debt. The signal in Logystera is wp_top_404_uris with a stable distribution over days, not a sudden spike — links from external sites are slow and steady.
Real broken links. These come from inside your own site. Filter by referrer:
# Set site to your public domain (example.com is a placeholder); $(hostname) is often the server's name, not the site's
awk -v site="example.com" '$9 == 404 && index($11, site) {print $7, "<--", $11}' \
/var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20
If /about-us is returning 404 with your homepage as the referrer, you have a broken internal link. That is the smallest category by volume but the most embarrassing one: it means visitors clicking around your site are hitting dead ends.
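A substring match on the referrer can be fooled by a lookalike host such as example.com.evil.test. Where that matters, comparing the parsed hostname is stricter. The sketch below assumes your site answers on the bare domain and its www. variant; is_internal_referrer and example.com are placeholders for illustration.

```python
from urllib.parse import urlparse


def is_internal_referrer(referrer: str, site_host: str) -> bool:
    """True when the referrer's hostname is the site itself (with or without www.)."""
    host = urlparse(referrer).hostname or ""   # "-" or an empty referrer parses to no host
    site = site_host.lower()
    return host == site or host == "www." + site


if __name__ == "__main__":
    print(is_internal_referrer("https://example.com/", "example.com"))
    print(is_internal_referrer("https://example.com.evil.test/", "example.com"))
```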
Bot vs human attribution. Cross-reference with wp_bot_requests_total. If a URI's 404 count is dominated by requests with no User-Agent, with python-requests/ or curl/ user agents, or with the long tail of headless-browser fingerprints, it is automated. If the user agents are real browsers, it is probably human, though scanners can spoof a browser User-Agent, which is why the fingerprint clustering matters. Over 95 percent of scanner 404s have an automated UA fingerprint that wp_request_fingerprint_top clusters reliably.
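For a quick check from raw logs, the User-Agent rule of thumb can be approximated as below. This is a deliberately crude sketch: it only knows the markers named in this guide (missing User-Agent, python-requests/, curl/), it does not attempt headless-browser or TLS fingerprinting, and the names is_automated and automated_share are made up for the example.

```python
# User-Agent prefixes this guide names as automated; extend as needed.
AUTOMATED_UA_PREFIXES = ("python-requests/", "curl/")


def is_automated(user_agent: str) -> bool:
    """True for a missing User-Agent ("" or "-") or a known tool prefix."""
    ua = user_agent.strip()
    if ua in ("", "-"):
        return True
    return ua.lower().startswith(AUTOMATED_UA_PREFIXES)


def automated_share(user_agents) -> float:
    """Fraction of requests to one URI that look automated (0.0 for no requests)."""
    uas = list(user_agents)
    if not uas:
        return 0.0
    return sum(is_automated(ua) for ua in uas) / len(uas)


if __name__ == "__main__":
    sample = ["curl/8.4.0", "-", "python-requests/2.31.0", "Mozilla/5.0 (X11; Linux x86_64)"]
    print(automated_share(sample))
```

A share close to 1.0 for a URI supports the scanner reading; a low share does not prove human traffic, since a spoofed browser User-Agent passes this check.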
5.2 Root Causes
(see root causes inline in 5.3 Fix)
5.3 Fix
Stop trying to fix all 404s the same way. Match the fix to the category.
Category 1: Vulnerability scanner probes. The fix is not a redirect — it is a block. Each cause produces a specific signal pattern:
- Plugin enumeration probes (/wp-content/plugins/.../readme.txt) → spike in wp_top_404_uris with wp-content/plugins/ prefix, dominated by automated wp_request_fingerprint_top clusters.
- Config file probes (/.env, /wp-config.php.bak) → low-cardinality, high-volume entries in wp_top_404_uris from many source IPs.
- Backdoor probes (admin-ajax.php?action=duplicator_download) → URI appears in both wp_top_404_uris and wp_bot_requests_total.
Block at the edge: nginx location rules returning 444 (close connection without response), Cloudflare WAF rules matching the URI prefixes, or fail2ban watching the access log. Do not redirect. A 301 to your homepage tells the scanner the URL exists, doubles your CPU cost (one request becomes two), and pollutes your analytics.
Category 2: Migration debt. This is where 301 redirects are correct. Cause-to-signal:
- URL structure change (e.g., /blog/2019/title → /title) → wp_top_404_uris shows a prefix pattern with many distinct URIs.
- Permalink change (/?p=123 → /post-name) → wp_top_404_uris shows query-string variants.
Use the Redirection plugin or write rules in .htaccess / nginx with a regex pattern, not one-by-one redirects. Capture the slug and rewrite it. If you have the old → new mapping in a CSV (often you do, from the migration script), import the lot.
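Before writing the rule into a plugin or server config, it can help to test the pattern against real 404 URIs from the log. The sketch below assumes the old structure was /blog/<yyyy>/<slug> with an optional /<mm>/ segment, as in this guide's examples; redirect_target and the regular expression are illustrative and need adapting to your actual old permalinks.

```python
import re
from typing import Optional

# Assumed old structure: /blog/<yyyy>/[<mm>/]<slug>[/]; the slug is the only captured part.
OLD_POST_RE = re.compile(r"^/blog/\d{4}/(?:\d{2}/)?([^/?#]+)/?$")


def redirect_target(old_uri: str) -> Optional[str]:
    """Return the new /<slug> path for an old-structure URI, or None if it does not match."""
    m = OLD_POST_RE.match(old_uri)
    return f"/{m.group(1)}" if m else None


if __name__ == "__main__":
    for uri in ("/blog/2019/post-title", "/blog/2019/05/my-slug/", "/about-us"):
        print(uri, "->", redirect_target(uri))
```

Running the top migration-debt URIs through a function like this shows which ones a single pattern covers and which need explicit entries in the old → new mapping.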
Category 3: Real broken links. Cause-to-signal:
- Deleted page still linked internally → wp_top_error_pages shows the 404 template with internal referrer in wp_request_fingerprint_top.
- Misspelled link in published content → low-volume entry in wp_top_404_uris with referrer matching one of your post URLs.
- Plugin-generated URLs that no longer resolve (e.g., old WooCommerce product URLs) → distinct URI patterns under /product/ or /?p=.
Fix the link at the source. If the page should exist, restore it or create a replacement. If the link is in a post, edit the post. If it is in a theme template, fix the template. Only redirect if external sites link to the old URL and you cannot fix those.
5.4 Verify
A successful 404 cleanup is visible as a signal change, not as a single number going down.
Watch wp_top_404_uris over the next 24 hours. You are looking for three things:
- Scanner URIs (e.g., /wp-content/plugins/revslider/...) drop to zero or near-zero in the metric, because the edge block stopped them before WordPress saw them. If they still appear, the block is in the wrong layer (PHP-level redirect plugins do not stop scanner counts; nginx-level rules do).
- Migration-debt URIs drop in count and stop appearing in the top-N within a few days as caches and external referrers update. Googlebot will recrawl old URLs and follow the 301s. The migration prefix should fall out of the top 20 within a week.
- Real-broken-link URIs disappear entirely because you fixed them at the source.
Concrete grep to verify scanner blocking:
grep -E ' (404|444) ' /var/log/nginx/access.log \
| grep -E '\.env|wp-config|xmlrpc\.php\?rsd' \
| tail -100
If you see 444 status codes (or empty responses), the block is working. If you still see 404, requests are reaching WordPress and your block rule is not matching.
For migration debt, the verification is in wp_top_404_uris itself: the URIs you redirected should no longer appear in the top-N after 48 hours. If they still do, the redirect is broken or pointed at another 404.
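If you keep the old → new mapping as data, the two failure modes named here (a rule that does not match, or a target that is itself a 404) can be checked offline against the current list of 404 URIs. The sketch below is one hypothetical way to do that; suspect_redirects and its input shapes are assumptions, not a Logystera or Redirection feature.

```python
def suspect_redirects(redirects, still_404):
    """Flag redirect entries that look broken.

    redirects: dict mapping old URI -> new URI
    still_404: set of URIs currently returning 404 (e.g. from the access log)
    """
    problems = {}
    for old, new in redirects.items():
        if new in still_404:
            problems[old] = "target returns 404"
        elif new in redirects:
            problems[old] = "target is itself redirected (chain)"
        elif old in still_404:
            problems[old] = "old URI still returns 404 (rule not matching)"
    return problems


if __name__ == "__main__":
    mapping = {"/blog/2019/a": "/a", "/blog/2019/b": "/b"}
    print(suspect_redirects(mapping, {"/b"}))
```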
Healthy state: wp_top_404_uris top-10 is dominated by long-tail human typos and crawler exploration of removed pages, not by repeating scanner signatures and not by your own old URL prefixes. Volume should be under 1 percent of total request volume on a typical content site.
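The 1 percent guideline is a simple ratio of 404 responses to all responses. A minimal sketch, assuming you already have per-status request counts for the period (the function name and the sample numbers are illustrative only):

```python
def not_found_share(status_counts: dict) -> float:
    """Share of all requests that returned 404, given {status_code: count}."""
    total = sum(status_counts.values())
    return status_counts.get(404, 0) / total if total else 0.0


if __name__ == "__main__":
    # Made-up counts for illustration only.
    counts = {200: 49_000, 301: 600, 404: 400}
    share = not_found_share(counts)
    print(f"404 share: {share:.2%}", "(healthy)" if share < 0.01 else "(investigate)")
```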
6. How to Catch This Early
Fixing it is straightforward once you know the cause. The hard part is knowing it happened at all.
This issue surfaces as wp_top_404_uris.
The reason 404 spikes feel mysterious is that nobody is watching the URI distribution. Total 404 count is a useless number on its own — a site might have a healthy 1,000 404s a day from human typos and dead crawler traffic, and a different site might have 1,000 404s a day that are all /wp-content/plugins/wp-file-manager/... and are an active exploitation attempt. The number is the same; the meaning is opposite.
This is exactly what wp_top_404_uris solves. Instead of a flat counter, it surfaces the top URIs by volume in real time, with bot attribution from wp_bot_requests_total and fingerprint clustering from wp_request_fingerprint_top. The first time a scanner sweep starts hitting /wp-content/plugins/, it appears at the top of the distribution within minutes, not after you notice your CPU spike a day later. The first time a migration drops link equity, the affected URI prefix shows up before Search Console reports it a week later.
This category of failure does not trigger any default WordPress alert. Hosting providers do not alert on it. SEO tools see only Googlebot's view. Logs reveal it immediately, and Logystera makes the URI distribution queryable as a metric instead of an ad-hoc grep against a 200MB access log.
7. Related Silent Failures
- wp_bot_requests_total spike: coordinated bot traffic targeting login endpoints, often paired with 404 enumeration of plugin paths.
- wp_request_fingerprint_top concentration: TLS/header fingerprint dominates request volume, indicating a single tool or scanner cluster behind many IPs.
- xmlrpc.php abuse: high-volume POST traffic to xmlrpc.php for credential-stuffing or pingback amplification, distinct from 404 scanner traffic.
- wp.rest_api enumeration: scanners hitting /wp-json/wp/v2/users to harvest usernames, visible as a 200 response that should be a 401.
- wp.state_change after probe: file or option modification immediately following a scanner sweep, indicating a successful exploitation attempt.