Bug #286

closed

lighttpd crashes under highload

Added by Anonymous over 18 years ago. Updated over 17 years ago.

Status:
Fixed
Priority:
Urgent
Category:
core
Target version:
-
Description

We traced the problem down to a performance issue in lighttpd: sporadically, lighttpd crashes...
The valgrind log is here: http://www.thecenter.at/lighttpd.1025.txt

-- sl

Actions #1

Updated by jan over 18 years ago

  • Status changed from New to Assigned

Please verify whether the problem persists with 1.4.5.

Actions #2

Updated by Anonymous over 18 years ago

It is still an issue, but not as severe as before. Compare for yourself: with 1.4.4 I had a dozen crashes a day; with 1.4.5 I have "only" a couple.

-- sl

Actions #3

Updated by Anonymous over 18 years ago

I have seen something similar during a DoS condition. There was no core dump (though core dumps were enabled); lighttpd simply seemed to 'stop'. The php-cgi processes kept running until I sent a killall -TERM php-cgi. TERM was enough (no KILL needed), but the fact that they were left running means that, however lighttpd stopped, it did not do so in an entirely orderly manner.

I am now trying

server.max-connections = 1024
server.max-fds = 3072

to see whether max-connections protects against this problem. Hopefully the DoS won't recur ;)

Hope this extra information is useful.

Have a great weekend!

-- richardgreen1965

Actions #4

Updated by Anonymous over 18 years ago

I can confirm this too. I'm evaluating 1.4.7, and it crashes unexpectedly after about 10 minutes of high load. My environment is Debian 3.1 (sarge) with the stock 2.6.8 (-686-smp) kernel package.

I set it up so that mod_proxy exclusively distributes load to 11 backend servers; the server handles no "regular" file requests. At an output rate of more than 150 Mbps and 1800 requests/s, the process suddenly and quietly exits. When I started lighttpd with the -D flag (stay in the foreground) to see whether anything was printed to stderr, nothing appeared there when it crashed again either. However, I noticed that it exited with an "aborted" status.

I switched off both the rrdtool and accesslog modules, which ruled them out.

I will try a more recent kernel later on, but my gut feeling tells me the problem is indeed in lighttpd.

-- conny

Actions #5

Updated by jan over 18 years ago

Can you generate an strace for me? The wiki explains how to report a bug.

Actions #6

Updated by Anonymous over 18 years ago

I'll try to make one. The problem is that under high load strace itself becomes the bottleneck, limiting the request rate and apparently also the chance that the crash occurs...

-- conny

Actions #7

Updated by Anonymous over 18 years ago

Here are my first results:


11:37:14.450805 accept(5, {sa_family=AF_INET, sin_port=htons(2315), sin_addr=inet_addr("[xxxxxxxxxxxxx]")}, [16]) = 42
11:37:14.450900 fcntl64(42, F_SETFD, FD_CLOEXEC) = 0
11:37:14.450941 fcntl64(42, F_SETFL, O_RDWR|O_NONBLOCK) = 0
11:37:14.450980 ioctl(42, FIONREAD, [7935]) = 0
11:37:14.451026 read(42, "POST /[xxxxxxxxxxx]\r\n[xxxxxxxxxxxxx]"..., 7935) = 7935
11:37:14.452304 ioctl(42, FIONREAD, [0]) = 0
11:37:14.452361 read(42, 0x886ec38, 4159) = -1 EAGAIN (Resource temporarily unavailable)
11:37:14.452440 write(2, "lighttpd: connections.c:962: connection_handle_read_state: Assertion `c->mem->used\' failed.\n", 92) = 92
11:37:14.452580 rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
11:37:14.452664 gettid()                = 2539
11:37:14.452703 tgkill(2539, 2539, SIGABRT) = 0
11:37:14.452740 --- SIGABRT (Aborted) @ 0 (0) ---

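A connection is accepted from a client and a 7935-byte POST request is read. Then FIONREAD reports 0 more bytes pending, yet we still ask read() for another 4159 bytes. Why? That read fails with EAGAIN, and immediately afterwards the c->mem->used assertion fails.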

-- conny

Actions #8

Updated by Anonymous over 18 years ago

  • Status changed from Fixed to Need Feedback
  • Resolution deleted (fixed)

Wonderful! That patch fixed the problem... _in most cases_! However, I can still make it crash (though it now seems even less common).


lighttpd: connections.c:962: connection_handle_read_state: Assertion `c->mem->used' failed.

I have not had time for a new strace run yet. It looks like a variant of the same problem, no? Perhaps certain chunk sequences can still slip through the cleanup?

-- conny

Actions #9

Updated by Anonymous over 18 years ago

I reproduced the crash with strace attached again. The sequence of calls is exactly the same as last time (see above).

-- conny

Actions #10

Updated by Anonymous over 18 years ago

...but that was with 1.4.7 plus the patch. I have not seen it since upgrading to the 1.4.8 release. (On the other hand, I also switched to slightly faster hardware.)

Let's close it and reopen it if someone can reproduce it with 1.4.8.

-- conny

Actions #11

Updated by Anonymous over 18 years ago

  • Status changed from Need Feedback to Fixed
  • Resolution set to fixed

I can now confirm that this issue has not appeared again since the 1.4.8 release.

-- conny
