WordPress multisite — finding the subsite causing problems for the whole network

1. Problem

The complaints come in waves. The marketing team says the main site is sluggish. A subsite owner says their checkout is timing out. Network admin opens the dashboard and the whole network is hot — load average elevated on the application server, PHP-FPM workers pinned, MySQL connection count near the cap. New Relic, if you have it, shows a forest of slow transactions across dozens of subsites at once. There is no single offender visible in the aggregate.

You search "wordpress multisite one site slowing all" because that is exactly the shape of the failure. One subsite is doing something expensive — a runaway plugin, a hook firing on every page load, an unbounded query — and because every subsite in a multisite install shares the same PHP-FPM pool, the same MySQL server, and the same wp-content directory, the cost of that one subsite gets paid by every request to every other subsite. The network admin sees aggregate noise without knowing which subsite to blame, and wp-admin/network/sites.php shows nothing useful — just a table of names and post counts.

This guide is about how to attribute load back to a specific blog_id using per-blog signal labels, why the network admin UI cannot tell you this on its own, and how to actually isolate the noisy subsite once you find it.

2. Impact

A noisy subsite in a multisite network is not a single-site problem. It is a network-wide outage in slow motion.

Concrete impact:

Shared PHP-FPM pool means shared exhaustion. All subsites run through the same pm.max_children workers. If one subsite holds workers for 8 seconds each on a slow query, the network can exhaust the pool under modest traffic. Every other subsite gets 502s.
Shared MySQL connection budget. A subsite leaking unbuffered queries can pin connections from the shared pool. Other subsites then fail to acquire connections — visible as Error establishing a database connection on totally unrelated sites.
Shared object cache namespace risk. If the noisy plugin writes oversized transients without correct prefixes, it can evict another subsite's hot cache entries, multiplying the damage.
Customer blame is misdirected. A subsite owner whose site is suddenly slow is told "the platform is having issues" — when in reality it is one specific neighbor on the same network they cannot see.
Patch deploys feel risky. Network admins are afraid to update plugins network-wide because one subsite's unique config can crash all 50 sites at once. Fear leads to staleness leads to vulnerabilities.

The tail of this is a network admin who ssh's into the box and runs top, SHOW PROCESSLIST, and tail -f on logs hoping to catch the offender. That works once. It does not scale to 80 subsites with a neighbor that only spikes for 90 seconds at a time.

3. Why It’s Hard to Spot

Multisite is built around the fiction that all sites are equal. The codebase, the file system, the database, and the admin UI all reinforce this. There is no built-in mechanism that says "subsite 7 is consuming 60% of network resources."

Network admin dashboard shows no per-site load metrics. The Sites listing has post counts and registered date. No request rate, no error rate, no memory consumption. WordPress core does not track these.
Server-level metrics aggregate everything. top, htop, CloudWatch CPU, FPM pool stats — all of these report the union of all subsites. You see the symptom (FPM saturation) without the cause (which subsite).
PHP error logs do not include blog_id. A PHP Warning: Undefined index line in php-fpm.log tells you the file and line that produced it. It does not tell you which subsite was active when the request came in. You can sometimes infer it from the URL in the access log if you correlate timestamps, but only if the warning happens to fire at the request boundary.
Slow query logs are network-wide. The MySQL slow query log shows the SQL that was slow, including the table name. In multisite, per-subsite tables carry the blog_id in their prefix (wp_7_posts, wp_12_options), so you can sometimes guess; the main site (blog_id 1) is the exception and uses the unprefixed wp_posts and wp_options. Shared tables (wp_users, wp_usermeta, wp_blogs, wp_sitemeta) have no prefix at all, and many slow queries hit shared tables.
Plugin updates are network-scoped. When network admin updates a plugin, it activates with the same code on all 80 subsites. But each subsite has its own settings, its own custom post types, its own active theme. The plugin can be benign on 79 subsites and catastrophic on the 80th.
Symptom appears across all subsites simultaneously. Because the pool is shared, the moment subsite 7 saturates the pool, subsites 1–6 and 8–80 all start showing latency. The customer reports look like a network-wide outage, not a single-site issue. Operators waste time looking for a network-level cause when it is local to one subsite.
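The table-prefix hint in the slow query log can be turned into a rough tally. The sketch below is illustrative only: it assumes the default wp_ table prefix, and the function name and the example log path are hypothetical. Queries against shared tables and against the main site's unprefixed tables are not counted, so treat the result as a hint rather than an attribution.

```shell
# Tally slow-query-log table references by subsite prefix.
# Assumption: the network uses the default "wp_" table prefix.
slow_queries_by_blog() {
  # $1 = path to the MySQL slow query log
  grep -oE 'wp_[0-9]+_[a-z0-9_]+' "$1" \
    | sed -E 's/^wp_([0-9]+)_.*/\1/' \
    | sort | uniq -c | sort -rn
}

# Example (hypothetical path):
#   slow_queries_by_blog /var/log/mysql/mysql-slow.log | head -10
```

Each output line is a reference count followed by a blog_id, highest first.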

4. Cause

Logystera's WordPress plugin emits a wp_environment_multisite signal as soon as it boots inside a network install. This signal is the marker that says "this entity is not a single site — the labels you see from this point forward will include blog_id." Every subsequent signal — wp_php_warnings_total, wp_buffer_dropped_total, wp_request_peak_memory_mb, wp_request_duration_ms, wp_db_queries_total — is tagged with the active blog_id at the moment the request was served.

The mechanism is straightforward. WordPress multisite uses switch_to_blog() and restore_current_blog() to scope database tables, options, and uploads to a specific subsite. The plugin reads get_current_blog_id() at the point each metric is emitted and attaches it as a label. So wp_php_warnings_total{blog_id="7"} is the warning count attributable to the seventh subsite, not the network total.

Without this label, you have one number — "PHP warnings on the network" — and that number is useless because it tells you something is wrong but not where. With the label, you can sort subsites by warning rate, by peak memory, by buffer drops, by 5xx rate, and the offender lights up immediately. The presence of wp_environment_multisite is what tells the platform to treat blog_id as a first-class dimension instead of folding it into the entity-level rollup.

Per-request peak memory is the most damning signal. wp_request_peak_memory_mb{blog_id="..."} records the high-water mark of memory_get_peak_usage() for each request. A healthy subsite sits at 32–80 MB. A subsite with a runaway plugin will show 180–250 MB consistently — which directly translates to FPM workers each holding 200+ MB of resident memory, evicting the OS file cache, and slowing every other request on the box.

5. Solution

5.1 Diagnose (logs first)

The diagnosis is a per-blog comparison. You want to ask three questions, in order: which subsite is generating the most warnings, which is using the most memory per request, and which is dropping log buffers (a proxy for "this subsite is generating so many signals that the agent cannot keep up").

1. Confirm the entity is actually a multisite install

Before anything, verify the wp_environment_multisite signal is being emitted. Without it, blog_id labels will not be present and per-blog attribution is impossible.

grep "wp_environment_multisite" /var/log/logystera-agent/agent.log | tail -5

This should show the signal with value=1 and labels including network_id, subsite_count, and domain. Absence of this signal on a known multisite install means the plugin is misconfigured — usually define('MULTISITE', true) is set but the plugin is not network-activated. Fix that first or every label below will be missing.

2. Rank subsites by PHP warning rate

grep "wp_php_warnings_total" /var/log/logystera-agent/agent.log \
  | grep -oP 'blog_id="[0-9]+"' \
  | sort | uniq -c | sort -rn | head -10

This counts wp_php_warnings_total log entries per blog_id. In a healthy network, the distribution should roughly track traffic: the busiest subsites have the most warnings, but the rate per request stays low. The signature of a noisy neighbor is a blog_id that is not the busiest in pageviews but generates 5–20x more warnings than its peers. That subsite is the candidate.
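To make the "5–20x more than its peers" comparison concrete, the per-blog counts can be expressed as a ratio against the network median. This is a minimal sketch, not part of the Logystera tooling: the function name is hypothetical, and it assumes each agent log line for this metric carries a blog_id="N" label, as the grep above already does.

```shell
# Print each subsite's warning-entry count and its ratio to the network median.
# Assumption: one log line per warning entry, labeled with blog_id="N".
warning_ratio_vs_median() {
  # $1 = path to the agent log
  grep 'wp_php_warnings_total' "$1" \
    | grep -oE 'blog_id="[0-9]+"' \
    | sort | uniq -c | sort -n \
    | awk '{ count[NR] = $1; id[NR] = $2 }
           END {
             if (NR == 0) exit
             mid = int((NR + 1) / 2)
             median = (NR % 2) ? count[mid] : (count[mid] + count[mid + 1]) / 2
             # Walk from the largest count down to the smallest.
             for (i = NR; i >= 1; i--)
               printf "%s count=%d ratio=%.1f\n", id[i], count[i], count[i] / median
           }'
}

# Example:
#   warning_ratio_vs_median /var/log/logystera-agent/agent.log | head -10
```

A ratio near 1 means the subsite sits at the median; a subsite at 5 or above with modest traffic matches the noisy-neighbor signature described here.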

3. Rank subsites by request peak memory

grep "wp_request_peak_memory_mb" /var/log/logystera-agent/agent.log \
  | grep "blog_id" \
  | awk -F'value=' '{print $2 "\t" $0}' | sort -rn | head -20

Look at the top of the list. Any line with peak memory over 150 MB is suspicious. If those high-memory requests cluster on a single blog_id, you have your offender. The metric wp_request_peak_memory_mb{blog_id="..."} is the cleanest single answer to "which subsite is heavy."
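Whether the heavy requests cluster on one blog_id is easier to see as a count per subsite. The following sketch assumes the same log shape the commands above rely on (a blog_id="N" label and a value=N token on each line); the function name and the default threshold parameter are illustrative.

```shell
# Count requests above a memory threshold, grouped by blog_id.
# Assumption: each line looks roughly like ...{blog_id="7"} value=212.4
high_memory_by_blog() {
  # $1 = agent log path, $2 = threshold in MB (defaults to 150)
  grep 'wp_request_peak_memory_mb' "$1" | awk -v limit="${2:-150}" '
    {
      if (!match($0, /blog_id="[0-9]+"/)) next
      id = substr($0, RSTART, RLENGTH)
      if (!match($0, /value=[0-9.]+/)) next
      # Skip the 6 characters of "value=" to get the number.
      mb = substr($0, RSTART + 6, RLENGTH - 6) + 0
      if (mb > limit) over[id]++
    }
    END { for (id in over) print over[id], id }
  ' | sort -rn
}

# Example:
#   high_memory_by_blog /var/log/logystera-agent/agent.log 150
```

If one blog_id dominates the output, that matches the single-offender pattern; a flat spread across many subsites points elsewhere.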

4. Check for log buffer drops

grep "wp_buffer_dropped_total" /var/log/logystera-agent/agent.log \
  | grep -oP 'blog_id="[0-9]+".*value=[0-9]+' | tail -20

wp_buffer_dropped_total increments when the agent's in-memory buffer fills faster than it can flush to the gateway. If a single blog_id accounts for most of the drops, that subsite is generating signals at a rate the agent cannot keep up with — usually because some hook is firing on every action in a tight loop. This signal alone often points directly at a specific subsite within a few minutes of the incident starting.
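Because the metric increments over time, the growth per subsite within the log window is often more telling than the raw values. The sketch below rests on two assumptions that are not confirmed by the text: that value= holds a cumulative counter, and that log lines are in chronological order. The function name is hypothetical.

```shell
# Show how much wp_buffer_dropped_total grew per blog_id within the log.
# Assumptions: value= is a cumulative counter and lines are chronological.
buffer_drop_delta() {
  # $1 = path to the agent log
  grep 'wp_buffer_dropped_total' "$1" | awk '
    {
      if (!match($0, /blog_id="[0-9]+"/)) next
      id = substr($0, RSTART, RLENGTH)
      if (!match($0, /value=[0-9]+/)) next
      v = substr($0, RSTART + 6, RLENGTH - 6) + 0
      if (!(id in first)) first[id] = v   # first sample seen for this subsite
      last[id] = v                        # most recent sample
    }
    END { for (id in last) print last[id] - first[id], id }
  ' | sort -rn
}

# Example:
#   buffer_drop_delta /var/log/logystera-agent/agent.log | head -5
```

If the agent instead logs one line per drop event, counting lines per blog_id, as in the warning ranking, is the better fit.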

5. Confirm with a database query

Once you have a candidate blog_id, confirm it from MySQL:

mysql -e "SELECT blog_id, domain, path, last_updated FROM wp_blogs WHERE blog_id = 7;"

That gives you the human-readable identity of the subsite. From here you can wp --url=https://that-domain.example.com plugin list --status=active to see what is loaded, and compare it against a healthy subsite.
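The comparison against a healthy subsite can be reduced to a list difference. A minimal sketch, assuming the two plugin lists have been saved one name per line; the function name, file names, and URLs are placeholders.

```shell
# Print plugins active on the suspect subsite but not on the healthy one.
# $1 = file of plugin names for the suspect, $2 = same for a healthy subsite.
plugins_only_on_suspect() {
  # -F fixed strings, -x whole-line match, -v invert: keep lines of $1 absent from $2
  grep -Fvx -f "$2" "$1"
}

# Example (placeholder URLs):
#   wp --url=https://suspect.example.com plugin list --status=active --field=name > suspect.txt
#   wp --url=https://healthy.example.com plugin list --status=active --field=name > healthy.txt
#   plugins_only_on_suspect suspect.txt healthy.txt
```

The plugins it prints are the first candidates to test, since they are the ones the healthy subsite runs without.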

Each of these queries surfaces a specific signal: warning count surfaces wp_php_warnings_total{blog_id="..."}, memory surfaces wp_request_peak_memory_mb{blog_id="..."}, drops surface wp_buffer_dropped_total{blog_id="..."}. The presence of wp_environment_multisite on the entity is what guarantees those blog_id labels are populated to begin with.

5.2 Fix

Once you have the offending blog_id, the fix path depends on the root cause. There are usually three.

Cause 1: A plugin behaves badly under one subsite's specific config. Signal pattern: wp_php_warnings_total{blog_id="7"} spikes correlate with wp_state_change events on that subsite (plugin activation, settings change). Fix: deactivate plugins on the affected subsite one at a time using wp --url=https://subsite.example.com plugin deactivate <plugin-slug>, watching the warning rate after each one. The first plugin whose deactivation flatlines wp_php_warnings_total{blog_id="7"} is the cause. Often it is an SEO, analytics, or related-posts plugin that builds a query against all posts on every request.

Cause 2: A subsite's database has grown beyond what shared queries can serve fast. Signal pattern: wp_request_peak_memory_mb{blog_id="7"} is sustained high, and slow query log shows queries against wp_7_postmeta or wp_7_options. Fix: clean up autoloaded options (wp --url=... option list --autoload=on --format=count), drop orphan postmeta, optimize tables. If the subsite is legitimately large, this is the moment to consider moving it to its own database.

Cause 3: A theme or plugin is using switch_to_blog() in a loop without restore_current_blog(). Signal pattern: All subsites show elevated wp_request_peak_memory_mb simultaneously, but one specific page on one specific subsite is the trigger. The aggregator-style "show recent posts from all sites" widget is the classic offender. Fix: audit the calling code, replace the loop with a single multisite-aware query against wp_blogs joined to the relevant content table, or cache the aggregate output for 5 minutes.
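For the audit in Cause 3, a crude first pass is to list PHP files that mention switch_to_blog( on more lines than restore_current_blog(. This is a heuristic sketch with a hypothetical function name: it counts matching lines per file, so it can miss an unbalanced loop in a file that also contains balanced calls, and it can flag files that restore through a helper defined elsewhere.

```shell
# List PHP files with more switch_to_blog( lines than restore_current_blog( lines.
# Heuristic only: line counts per file, not real control-flow analysis.
unbalanced_switch_files() {
  # $1 = directory to scan, e.g. the network's wp-content directory
  grep -rl --include='*.php' 'switch_to_blog(' "$1" | while IFS= read -r f; do
    s=$(grep -c 'switch_to_blog(' "$f" || true)
    r=$(grep -c 'restore_current_blog(' "$f" || true)
    if [ "$s" -gt "$r" ]; then
      printf '%s switch=%s restore=%s\n' "$f" "$s" "$r"
    fi
  done
}

# Example:
#   unbalanced_switch_files wp-content
```

Files it reports are candidates for manual review, starting with any that belong to the page identified as the trigger.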

For lasting isolation regardless of cause, three architectural moves are available:

Separate PHP-FPM pool per subsite group. Configure FPM with multiple pools (pool-tier1, pool-tier2) and route nginx based on the request Host header. The noisy subsite saturates only its own pool; everyone else stays healthy. This is the single highest-leverage isolation move.
Separate database user per subsite tier. Enforce per-user max_user_connections so a noisy tier cannot exhaust the global connection pool.
Network-level edge caching. A noisy subsite serving heavy anonymous traffic should sit behind a CDN or full-page cache. If most traffic never reaches PHP, it cannot saturate the FPM pool no matter how badly its plugin behaves.
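As an illustration of the first move, the sketch below prints a PHP-FPM pool definition for one tier. Everything specific in it is an assumption: the function name, the www-data user, the socket path under /run/php, and the choice of a static process manager. Adapt the values to the host before using any of it.

```shell
# Print a PHP-FPM pool definition for one subsite tier.
# Assumptions: www-data user/group, sockets under /run/php, static process manager.
render_fpm_pool() {
  # $1 = pool name (e.g. pool-tier2), $2 = maximum number of workers
  cat <<EOF
[$1]
user = www-data
group = www-data
listen = /run/php/$1.sock
listen.owner = www-data
listen.group = www-data
pm = static
pm.max_children = $2
EOF
}

# Example (the target path depends on the distribution and PHP version):
#   render_fpm_pool pool-tier2 5 > pool-tier2.conf
# The nginx server block for the noisy subsite's Host would then point at it:
#   fastcgi_pass unix:/run/php/pool-tier2.sock;
```

With a small pm.max_children on the noisy tier, that subsite can exhaust only its own workers while the other pools keep serving.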

5.3 Verify

The signal that should change is wp_request_peak_memory_mb{blog_id="..."} for the offending subsite. After the fix, request peak memory for that blog_id should drop back into the 32–80 MB range and stay there. Watch for 30 minutes under normal traffic. If it stays flat, the immediate cause is gone.

Secondary verifications:

grep 'wp_php_warnings_total.*blog_id="7"' /var/log/logystera-agent/agent.log | tail -50

The warning rate for that blog_id should fall toward the network median. A spike that returns within an hour means the fix targeted a symptom, not the cause.

grep 'wp_buffer_dropped_total' /var/log/logystera-agent/agent.log | tail -20

Buffer drops should fall to zero. If they continue, the agent is still being overrun — go back to the warning rate per blog_id and look for a different offender.

The healthy steady state on a multisite network is: wp_environment_multisite present, warning counts roughly proportional to traffic per blog_id, no blog_id showing peak memory above 100 MB sustained, zero buffer drops in any 5-minute window. If all four of those are true for an hour after the fix, the network is back to normal.

6. How to Catch This Early

Fixing it is straightforward once you know the cause. The hard part is knowing it happened at all.

A multisite outage caused by one bad subsite is the textbook silent failure. Server-level monitoring sees aggregate symptoms. WordPress core sees nothing. The network admin UI cannot show per-subsite load. By the time a human notices, every subsite owner is already complaining and the operator has to dig manually.

This kind of issue surfaces as wp_environment_multisite (the marker that per-blog labels are valid) plus wp_request_peak_memory_mb, wp_php_warnings_total, and wp_buffer_dropped_total broken down by blog_id. Logystera ranks subsites by these signals automatically and alerts when any single blog_id deviates from the network baseline by a configurable factor. The first signal you see is not "the network is slow" — it is "blog_id 7 is generating 12x the warning rate of the network median," with a direct link to that subsite's dashboard.

The detection happens before the FPM pool saturates, before the customer complaints arrive, and before the operator has to ssh into the host. That is the difference between a 90-minute outage everyone sees and a 4-minute fix nobody noticed.

7. Related Silent Failures

PHP-FPM pool exhaustion — wp_request_duration_ms p95 climbing across all subsites simultaneously is the network-wide symptom of one subsite holding workers. Find the cause via wp_request_peak_memory_mb per blog_id.
Autoloaded options bloat — wp_db_queries_total rising linearly with traffic and one subsite consistently slower indicates wp_options autoload bloat scoped to that subsite's table.
Cron pile-up across subsites — multisite runs wp-cron per subsite by default. A network with 80 subsites can fire 80 simultaneous cron sweeps. Watch wp_cron_event_total per blog_id for missed schedules.
Plugin auto-update breaking one subsite — wp_state_change correlated with wp_php_warnings_total{blog_id="..."} spike means the plugin updated on a subsite whose config it cannot handle. The other 79 subsites are fine.
Object cache namespace collision — a subsite's plugin writing oversized transients without a blog_id prefix can evict another subsite's cache. Symptom is wp_request_peak_memory_mb rising on a subsite that has not changed, while another subsite is the actual cause.
