Feature #1488

closed

PHP does not use FCGI_OVERLOADED, suggest to use timeout to detect overload

Added by Anonymous over 16 years ago. Updated almost 8 years ago.

Status:
Obsolete
Priority:
Normal
Category:
mod_fastcgi
Target version:
ASK QUESTIONS IN Forums:

Description

As stated here:

http://bugs.php.net/bug.php?id=39809

"PHP cannot return FCGI_OVERLOADED", so the FastCGI connection just times out under heavy overload. Right now lighty's mod_fastcgi gets stuck in PROC_STATE_DIED_WAIT_FOR_PID, but of course the PHP parent never dies, because it is not shutting down, just very busy.

I have attached a proposed patch which shows how to treat PROC_STATE_DIED_WAIT_FOR_PID the same as PROC_STATE_OVERLOADED (just a simple code copy).

For this to become a proper solution, it would have to be enabled by a mod_fastcgi config option, e.g.:

"treat-timeout-as-overload" = 1

which would be set for PHP configs running with bin-path set (i.e. for local PHP processes that are children of lighty).
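For illustration, the proposed option might be wired in roughly as follows. This is a minimal C sketch, not the attached patch and not the real mod_fastcgi.c: only the two state names PROC_STATE_OVERLOADED and PROC_STATE_DIED_WAIT_FOR_PID come from this report, while the struct, its fields (`treat_timeout_as_overload` mirrors the proposed config key) and the function names are hypothetical.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical subset of per-process states; the real enum differs. */
enum proc_state {
    PROC_STATE_RUNNING,
    PROC_STATE_OVERLOADED,
    PROC_STATE_DIED_WAIT_FOR_PID
};

/* Hypothetical backend record. */
struct fcgi_backend {
    enum proc_state state;
    bool treat_timeout_as_overload; /* proposed option, not a real lighttpd setting */
    bool spawned_locally;           /* true when bin-path is set */
    time_t disabled_until;          /* earliest time the backend may be retried */
};

/* Called when a request to the backend times out. */
void on_backend_timeout(struct fcgi_backend *b, time_t now, int disable_secs)
{
    if (b->spawned_locally && !b->treat_timeout_as_overload) {
        /* behaviour described in the report: assume the child died */
        b->state = PROC_STATE_DIED_WAIT_FOR_PID;
        return;
    }
    /* otherwise back off as if FCGI_OVERLOADED had been returned */
    b->state = PROC_STATE_OVERLOADED;
    b->disabled_until = now + disable_secs;
}

/* Re-enables an overloaded backend once its disable time has passed. */
bool backend_usable(struct fcgi_backend *b, time_t now)
{
    if (b->state == PROC_STATE_OVERLOADED && now >= b->disabled_until)
        b->state = PROC_STATE_RUNNING;
    return b->state == PROC_STATE_RUNNING;
}
```

With the option on, a timed-out local PHP backend is skipped for `disable_secs` and then retried, the same back-off lighty applies to backends it did not spawn.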

There is a workaround, which is to use spawn-fcgi to launch PHP and use a lighty config without bin-path. In this case, because lighty has no other choice, it just waits 5 seconds after a timeout and then tries again to send more requests. Given PHP's slight misbehaviour in not sending FCGI_OVERLOADED, this behaviour is of course better.

See also http://bugs.php.net/bug.php?id=43610, where I am re-asking whether it would be possible for PHP to adhere more strictly to the FastCGI spec with regard to FCGI_OVERLOADED.

-- oliver


Files

mod_fastcgi.c.diff (1.75 KB): patch to treat timeout as overloaded -- oliver (Anonymous, 2007-12-17 12:52)
Actions #1

Updated by oschonrock over 16 years ago

The PHP version we used for testing is 5.2.5 (important, because quite a bit has changed in 5.2.x).

It also had this patch applied:

http://cvs.php.net/viewvc.cgi/php-src/main/SAPI.c?r1=1.202.2.7.2.15&r2=1.202.2.7.2.16&pathrev=PHP_5_2&diff_format=u

to eliminate a bug in 5.2.5.

Refer here for the detailed setup:

http://bugs.php.net/bug.php?id=43610

Actions #2

Updated by admin over 16 years ago

how to interpret a PROC_STATE_DIED_WAIT_FOR_PID the same as PROC_STATE_OVERLOADED.

I think it's a better idea to close the connection to the FCGI backend when it times out and not go into PROC_STATE_DIED_WAIT_FOR_PID.
If it really died, a connect() should refuse the connection and then you can go into PROC_STATE_DIED_WAIT_FOR_PID.
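A minimal C sketch of that check, assuming a TCP backend (it is not lighttpd's code): after the timed-out connection has been closed, a fresh connect() is attempted and only ECONNREFUSED is taken as "the process died". `probe_backend` and `enum probe_result` are hypothetical names, and the blocking connect() is a simplification; an event-driven server would use a non-blocking connect.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

enum probe_result { PROBE_ALIVE, PROBE_DIED, PROBE_UNKNOWN };

/* Hypothetical probe run after closing a timed-out backend connection. */
enum probe_result probe_backend(const struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return PROBE_UNKNOWN;
    int rc = connect(fd, (const struct sockaddr *)addr, sizeof *addr);
    int err = errno;          /* save before close() can overwrite it */
    close(fd);
    if (rc == 0)
        return PROBE_ALIVE;   /* accepted, possibly only into the listen backlog */
    if (err == ECONNREFUSED)
        return PROBE_DIED;    /* nothing listening: now wait for the pid */
    return PROBE_UNKNOWN;     /* timeouts, unreachable, etc.: leave state alone */
}
```

Note that a busy PHP still "accepts" the probe, because the kernel completes the handshake into the listen backlog, so PROBE_ALIVE means "not dead", not "has a free worker".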

I think lighttpd should be limited to opening just PHP_FCGI_CHILDREN connections, as PHP is not able to handle more.
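That suggestion amounts to a per-backend connection cap. A minimal C sketch, assuming a single-threaded event loop; `struct conn_limiter` and its functions are hypothetical, and `max_conns` is assumed to be configured to the backend's PHP_FCGI_CHILDREN value.

```c
#include <stdbool.h>

/* Hypothetical per-backend connection cap. */
struct conn_limiter {
    int max_conns;    /* assumed equal to PHP_FCGI_CHILDREN */
    int open_conns;   /* connections currently open to the backend */
};

/* Returns false when every PHP child already has a connection; the
 * caller must then hold or reject the request instead of connecting. */
bool limiter_try_acquire(struct conn_limiter *l)
{
    if (l->open_conns >= l->max_conns)
        return false;
    l->open_conns++;
    return true;
}

/* Called when a backend connection is closed. */
void limiter_release(struct conn_limiter *l)
{
    if (l->open_conns > 0)
        l->open_conns--;
}
```

The design choice is that excess requests never reach PHP's listen backlog, where nobody can time them out or answer "busy" for them.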

Actions #3

Updated by oschonrock over 16 years ago

@Olaf van der Spek

I think your suggestion is a good alternative to the patch I suggested, i.e. it works around PHP's misbehaviour in essentially the same way.

However, I am still not convinced that PHP can't return FCGI_OVERLOADED when its request queue (over and above the number of already busy workers) grows beyond a configurable limit; see my reply to you here:

http://bugs.php.net/bug.php?id=43610

Actions #4

Updated by moo over 16 years ago

PHP cannot return FCGI_OVERLOADED because it is prefork: the children are forked statically, not dynamically, so the pool won't grow. There is no "request queue" coded in PHP, only the OS TCP listen backlog, which PHP sets to 128.

The timeout/busy issue should be fixed.
We should analyze most, if not all, of the cases that might happen, even if we end up with a simple solution in the end, instead of finding a way that seems to fix this issue but might cause new problems or leave other old problems undiscovered/unfixed.

I haven't looked deeply into this problem yet, but IMHO the solution should not be defined as "convert timeout to busy" but as "queue full, server busy", and/or "drop queued requests that are too old (timed out) and respond server busy".

  1. Queue?
    a. Problem: when the client request rate is <= server capacity it's fine, but once it bursts, some clients will surely get a server-busy error.
    b. Solution: a queue delays clients' requests until after the burst.
  2. Timeout?
    a. Problem: users and clients won't wait minutes for a request to begin downloading; they'll close the connection.
    b. Solution: drop requests that are still in the queue and return server busy to the client.
  3. Drop requests on timeout, or when the queue is full? Should we drop new incoming requests so they won't be queued? Or should we drop old requests that are still in the queue but not yet handled by FastCGI? Should we time out requests that are still being handled?
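Points 1 and 2 can be combined in one bounded queue. This is a minimal C sketch that picks one set of answers to point 3 as an assumption: reject the new request when the queue is full, and drop requests that waited longer than `max_age` when a worker frees up; requests already being handled are left alone. `QUEUE_CAP`, `struct req_queue` and the functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define QUEUE_CAP 4   /* hypothetical limit on queued requests */

struct pending { int id; time_t enqueued; };

struct req_queue {
    struct pending items[QUEUE_CAP];
    size_t head, len;   /* ring buffer: oldest request at items[head] */
};

/* Point 1: bounded queue. When full, reject the NEW request; the
 * caller answers "server busy" (e.g. HTTP 503). */
bool queue_push(struct req_queue *q, int id, time_t now)
{
    if (q->len == QUEUE_CAP)
        return false;
    q->items[(q->head + q->len) % QUEUE_CAP] = (struct pending){ id, now };
    q->len++;
    return true;
}

/* Point 2: when a worker is free, skip requests older than max_age
 * (their ids go to dropped[] so the caller can answer "server busy")
 * and hand out the oldest fresh one. */
bool queue_pop(struct req_queue *q, time_t now, time_t max_age,
               int *id_out, int dropped[QUEUE_CAP], size_t *n_dropped)
{
    *n_dropped = 0;
    while (q->len > 0) {
        struct pending p = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->len--;
        if (now - p.enqueued > max_age) {
            dropped[(*n_dropped)++] = p.id;
            continue;
        }
        *id_out = p.id;
        return true;
    }
    return false;
}
```

Dropping stale entries lazily at dequeue time avoids a separate timer per request, at the cost of answering abandoned clients a little later.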
Actions #5

Updated by admin over 16 years ago

but os tcp listen backlog which php set to 128

If persistent FCGI connections are used, that's not a good idea.

but imho, the solution should be defined as "convert timeout to busy"

What about:
I think Lighttpd should be limited to open just PHP_FCGI_CHILDREN as PHP is not able to handle more connections.

Actions #6

Updated by oschonrock over 16 years ago

Replying to moo:

php cannot return FCGI_OVERLOADED because it's prefork, and staticly not dynamic forked, it won't grow.

I am aware of this, but that is not the issue here. A static number of processes is fine, because at a certain number of requests (depending on your PHP app) the server just becomes CPU-bound anyway.

there's no "request queue" coded in php but os tcp listen backlog which php set to 128

I was not aware of that, and it does not quite make sense to me. I thought the FastCGI spec allowed many requests over the same socket, and that the PHP parent dispatched these to its child worker procs. If that is not how it works, then that would be a better design.

Replying to Olaf van der Spek:

but os tcp listen backlog which php set to 128

If persistent FCGI connections are used, that's not a good idea.

I agree

Actions #7

Updated by admin over 16 years ago

I was not aware of that and this does not quite make sense to me. I thought the FASTCGI spec allowed for many requests over the same socket, and I thought that the php parent dispatches these to its child worker procs. If this is not the case, then that would be a better design.

Eh, no, that's not how it works. Each PHP child handles requests over exactly one connection, so there's almost no communication between the PHP parent and its children.

Actions #8

Updated by moo over 16 years ago

Oops, I was not focused when I wrote the first part of my reply. For those who are curious why PHP FastCGI can't return FCGI_OVERLOADED: the PHP parent bind()s a port and forks child workers, and each child accept()s connections on the bind()'ed socket (port) directly; the children do not accept or handle requests via the parent.
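To make that description concrete, here is a minimal C sketch of the prefork model (not PHP's actual code): the parent binds and calls listen() once with a backlog of 128 (the value moo reports), forks a fixed pool of workers, and each worker accept()s directly on the inherited socket. The function names are hypothetical, and each worker just writes "ok" in place of serving a FastCGI request.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define BACKLOG 128  /* listen() backlog PHP reportedly uses */

/* Parent: bind and listen once. Pending connections wait in the
 * kernel's backlog; there is no queue inside the application. */
int make_listener(struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    socklen_t len = sizeof *addr;
    if (bind(fd, (struct sockaddr *)addr, len) < 0 ||
        listen(fd, BACKLOG) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) { /* learn the port */
        close(fd);
        return -1;
    }
    return fd;
}

/* Child: accept() directly on the inherited listening socket. The
 * parent never sees these connections, so it has nothing to count
 * and no point at which it could answer FCGI_OVERLOADED. */
static void child_loop(int lfd, int requests)
{
    for (int i = 0; i < requests; i++) {
        int c = accept(lfd, NULL, NULL);
        if (c < 0)
            continue;                    /* sketch: a failed accept uses up a slot */
        ssize_t n = write(c, "ok", 2);   /* stand-in for serving one request */
        (void)n;
        close(c);
    }
}

/* Parent: fork a fixed pool of workers; the pool never grows. */
int prefork(int lfd, int nchildren, int requests_each)
{
    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            child_loop(lfd, requests_each);
            _exit(0);
        }
    }
    return 0;
}

/* Parent: wait for all workers to exit. */
void reap_children(void)
{
    while (wait(NULL) > 0)
        ;
}
```

When every child is busy, new connections still complete the TCP handshake and sit in the backlog, which is why the client side only ever sees a timeout.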

What about:
I think Lighttpd should be limited to open just PHP_FCGI_CHILDREN as PHP is not able to handle more connections.

Yeah, this is a good way to avoid relying on PHP's TCP backlog "queue", but a queue has to be enabled in either PHP or lighttpd, and we need to make sure it works as expected.

Actions #9

Updated by admin over 16 years ago

but queue have to be enabled in either php or lighttpd and make sure it works as expected.

Why?
Currently Lighttpd doesn't use a queue either and when it hits the PHP queue, you're already running into trouble.

Actions #10

Updated by oschonrock over 16 years ago

Replying to moo:

for those who are curious why php fcgi can't return FCGI_OVERLOADED: php parent bind() a port, fork child workers, and the child accept the connection from the bind()'ed socket (port) directly, not accept/handle requests from parent.

This seems like a very poor implementation to me. If PHP is serious about supporting a multi-worker parent/child FastCGI server SAPI, then surely it needs to implement some queue management and implement the FastCGI spec properly, never mind what it implements now (which seems like a hacky variant of the CGI SAPI). Running apache/mod_php just does not scale (unless you run a separate Apache for PHP requests only), so a scalable SAPI from the PHP crowd would be useful, and the best candidate for that is FastCGI. Has anyone come across these people:

http://php-fpm.anight.org/current_php_fastcgi_problems.html

or in (sort of) English ;-)

http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=ru_en&url=http%3A%2F%2Fphp-fpm.anight.org%2Fcurrent_php_fastcgi_problems.html

In the meantime, lighty can just use some workaround strategy for detecting timeouts. Or just run PHP via spawn-fcgi as I suggested in the original post, which makes lighty and PHP behave nicely together under high load bursts. That is a very simple solution, and it moves you to the config you need anyway for multiple physical FastCGI servers.

Actions #11

Updated by Anonymous over 16 years ago

Replying to Olaf van der Spek:

Why?
Currently Lighttpd doesn't use a queue either and when it hits the PHP queue, you're already running into trouble.

There is the PHP listen backlog, as I said already. And yes, I have some problems with the current implementation. Reasons like this are why people say "it's safer to load a server <=80% instead of >=90%~100%".

Actions #12

Updated by Anonymous over 16 years ago

Replying to anonymous:

and reasons like this are why ppl say "it's safer to load server <=80% instead of >=90%~100%"

Our average load is more like 20% but there can always be unpredictable peaks.

Actions #13

Updated by admin over 16 years ago

there is php listen backlog, as i said already. and yes, i have some problem with current implemention. and reasons like this are why ppl say "it's safer to load server <=80% instead of >=90%~100%"

But you want to avoid hitting the listen backlog, because when you do, you run into this bug and the requests in the listen backlog aren't handled (for minutes).

Actions #14

Updated by gstrauss almost 8 years ago

  • Tracker changed from Bug to Feature
  • Description updated (diff)
  • Assignee deleted (jan)

FYI: lighttpd 1.4.40 allows configuration of the lighttpd listen backlog, as well as the listen backlog on sockets for backends spawned by lighttpd. Also, lighttpd 1.4.40 detects if the client has abandoned a request, and if that request has not yet been sent to the FastCGI backend, the request is discarded, so the FastCGI backend is not loaded with useless work for an abandoned request.
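One way a server could detect an abandoned request is sketched below in C; this is an assumption for illustration, not lighttpd's implementation. recv() with MSG_PEEK leaves pending bytes in the socket buffer, and a return of 0 means the peer shut down its sending side. MSG_DONTWAIT is a Linux/BSD extension, not POSIX. `struct pending_req` and `should_discard` are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

/* True if the client has closed (or reset) its side of the connection.
 * Assumption: a client that only half-closed is also treated as gone. */
bool client_gone(int fd)
{
    char b;
    ssize_t n = recv(fd, &b, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n == 0)
        return true;   /* orderly shutdown by the peer */
    if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
        return true;   /* e.g. ECONNRESET */
    return false;      /* data pending, or nothing to read yet */
}

/* Hypothetical request record. */
struct pending_req {
    int client_fd;
    bool sent_to_backend;
};

/* Discard only requests the FastCGI backend has not seen yet. */
bool should_discard(const struct pending_req *r)
{
    return !r->sent_to_backend && client_gone(r->client_fd);
}
```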

Demoting this to feature request, as there can always be improvements. It is best practice to take multiple precautions to attempt to avoid overload situations in the first place.

Actions #15

Updated by gstrauss almost 8 years ago

  • Missing in 1.5.x set to Yes

FYI: lighttpd 1.4.40 handles PROC_STATE_DIED_WAIT_FOR_PID similarly to PROC_STATE_OVERLOADED if waitpid() reports that the process is still alive, and lighttpd will re-enable the host after the disable_time.
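The waitpid() check can be sketched in C as follows. This is a minimal illustration, not lighttpd's code: waitpid() with WNOHANG returns 0 while the child is still running and its pid once it has exited. `struct backend_proc`, `check_unresponsive` and the state names are hypothetical; `disable_time` plays the role of the disable_time mentioned above.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

enum proc_state { PROC_STATE_RUNNING, PROC_STATE_OVERLOADED, PROC_STATE_DIED };

/* Hypothetical record for a backend spawned by the server. */
struct backend_proc {
    pid_t pid;
    enum proc_state state;
    time_t disabled_until;
};

/* Handle a backend that stopped answering. */
void check_unresponsive(struct backend_proc *p, time_t now, int disable_time)
{
    int status;
    pid_t r = waitpid(p->pid, &status, WNOHANG);
    if (r == 0) {
        /* still alive, just busy: back off as if overloaded */
        p->state = PROC_STATE_OVERLOADED;
        p->disabled_until = now + disable_time;
    } else {
        /* exited and reaped (r == pid), or waitpid() failed (r == -1) */
        p->state = PROC_STATE_DIED;
    }
}
```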

Actions #16

Updated by gstrauss almost 8 years ago

  • Status changed from New to Obsolete