Ticket #575 (new defect)
high-time connections in handle-req impact fastcgi overload calculation
| Reported by: | moorman | Owned by: | jan |
|---|---|---|---|
| Priority: | highest | Milestone: | |
| Component: | mod_fastcgi | Version: | 1.4.19 |
| Severity: | critical | Keywords: | |
| Cc: | ts77@… | Blocked By: | |
| Need User Feedback: | no | Blocking: | |
Description
This ticket is a summary of details presented to Jan via IRC on 2006-03-10.
In a pool of six lighttpd heads receiving traffic from a load balancer, all six heads reached a terminal overload state from which they could not recover without a restart. Internal statistics showed a fastcgi load of 100+ on each head. After lighttpd was restarted on a head and the load balancer picked it up again, fastcgi load stabilized at ~20.
    fastcgi.backend.main-php.0.connected: 205994
    fastcgi.backend.main-php.0.died: 0
    fastcgi.backend.main-php.0.disabled: 0
    fastcgi.backend.main-php.0.load: 144
    fastcgi.backend.main-php.0.overloaded: 488
    fastcgi.backend.main-php.1.connected: 155287
    fastcgi.backend.main-php.1.died: 0
    fastcgi.backend.main-php.1.disabled: 0
    fastcgi.backend.main-php.1.load: 144
    fastcgi.backend.main-php.1.overloaded: 488
    fastcgi.backend.main-php.load: 288
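To make the numbers above easier to check, here is a minimal sketch that parses the counters and compares the per-backend load values with the pool-level value. The counter names and values are copied from the statistics above; `parse_counters` is a hypothetical helper, and reading the pool-level load as the sum of the per-backend loads is an inference from these figures, not something confirmed from the lighttpd source.

```python
# Counters copied from the ticket; only the load-related lines are needed here.
STATS = """\
fastcgi.backend.main-php.0.load: 144
fastcgi.backend.main-php.0.overloaded: 488
fastcgi.backend.main-php.1.load: 144
fastcgi.backend.main-php.1.overloaded: 488
fastcgi.backend.main-php.load: 288
"""


def parse_counters(text):
    """Hypothetical helper: turn 'name: value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.partition(": ")
        if value:
            counters[name] = int(value)
    return counters


counters = parse_counters(STATS)
per_backend = [counters[f"fastcgi.backend.main-php.{i}.load"] for i in (0, 1)]
pool_load = counters["fastcgi.backend.main-php.load"]

# Assumption being checked: pool load equals the sum of the backend loads.
print(per_backend, sum(per_backend), pool_load)  # [144, 144] 288 288
```

With two backends at 144 each, the pool-level figure of 288 matches the sum, which is consistent with each head carrying far more than the ~20 seen after a restart.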
Confirmed at the load balancer that inbound traffic was not unusually high. The lighttpd server status page showed a reasonable distribution of various pages waiting in handle-req status, with high values in the Time column.
338 connections

    hWhhhhrhhhhhhhhhWrhhhrhhhhhhhrhWrhrhhhhhhhWhhhhhhh
    hhhhhhhhhrhhrhhhhhhhhhhhhhhhhhhhhrhhhhhhhhhhhhrrhh
    rhhhhhrWrrrrhhhhhhrhhhhhhhrhhhhhrhhhhhhrhWhhhhrrhr
    hhrhhhhhhhhhhhhWhhhrhhhrhhrhhhrhhhWhhhhhhhhhhhrhhh
    hhrrhhrhhrhhhrhrrhhhhhWhhhhhhhWhrhrrrhhhrrhhhhrhhh
    WWrrhrrrrWrhrhWrrrrrrrhrWhrrhrrhhrhhhhrhrhhhWhrWrr
    hrhrhhhhhhhhrhhrhhhWhrhhhrrrrrrhhhhhhh
Approximately 150 connections shown in handle-req status have Time of 2756 or higher. Approximately 30-40 connections of this set have Time of 5000 or higher.
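The connection map above can be tallied by state with a short script. This is only a sketch: the map is copied from the status output above, and the `LEGEND` mapping (`h` = handle-req, `r` = read, `W` = write) is my reading of the status page legend and should be treated as an assumption.

```python
from collections import Counter

# Connection map copied from the ticket; whitespace is layout only.
CONNECTION_MAP = """
hWhhhhrhhhhhhhhhWrhhhrhhhhhhhrhWrhrhhhhhhhWhhhhhhh
hhhhhhhhhrhhrhhhhhhhhhhhhhhhhhhhhrhhhhhhhhhhhhrrhh
rhhhhhrWrrrrhhhhhhrhhhhhhhrhhhhhrhhhhhhrhWhhhhrrhr
hhrhhhhhhhhhhhhWhhhrhhhrhhrhhhrhhhWhhhhhhhhhhhrhhh
hhrrhhrhhrhhhrhrrhhhhhWhhhhhhhWhrhrrrhhhrrhhhhrhhh
WWrrhrrrrWrhrhWrrrrrrrhrWhrrhrrhhrhhhhrhrhhhWhrWrr
hrhrhhhhhhhhrhhrhhhWhrhhhrrrrrrhhhhhhh
"""

# Assumed legend for the status symbols that appear in this map.
LEGEND = {"h": "handle-req", "r": "read", "W": "write"}

# Count every non-whitespace symbol in the map.
states = Counter(ch for ch in CONNECTION_MAP if not ch.isspace())
total = sum(states.values())

for symbol, count in states.most_common():
    print(f"{LEGEND.get(symbol, symbol):>10}: {count} ({count / total:.0%})")
```

The tally gives the share of connections sitting in each state, which is a quick way to compare an overloaded head with a freshly restarted one.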
The lighttpd error log shows continual overload status, causing a continual cycle of disable, wait, re-enable. Heads will not recover without a restart, but a head works fine once it has been restarted.
Based on the IRC discussion, the planned workaround is to add a global timeout for handle-req, so that these long-running connections in handle-req status are shed.
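The idea behind the workaround can be sketched as a periodic sweep over the connection table. This is an illustration only, not lighttpd code: `Connection`, `shed_stale`, and the `HANDLE_REQ_TIMEOUT` value are all hypothetical, and `age` is a stand-in for the Time column on the status page.

```python
from dataclasses import dataclass

HANDLE_REQ_TIMEOUT = 300  # seconds; hypothetical value, not taken from the ticket


@dataclass
class Connection:
    state: str  # e.g. "handle-req", "read", "write"
    age: int    # seconds; stand-in for the Time column


def shed_stale(connections, timeout=HANDLE_REQ_TIMEOUT):
    """Split connections into (kept, shed).

    A connection is shed only if it is in handle-req and older than timeout.
    """
    kept, shed = [], []
    for conn in connections:
        stale = conn.state == "handle-req" and conn.age > timeout
        (shed if stale else kept).append(conn)
    return kept, shed


conns = [
    Connection("handle-req", 5200),  # long-running, should be shed
    Connection("handle-req", 2756),  # long-running, should be shed
    Connection("handle-req", 12),    # recent request, kept
    Connection("read", 3),           # not in handle-req, kept
]
kept, shed = shed_stale(conns)
print(len(kept), len(shed))  # 2 2
```

Assuming each shed connection also releases its share of the backend load count, a sweep like this would let the load figure fall back toward normal instead of staying pinned above the overload threshold.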
-Jacob

