Ticket #1245 (closed defect: fixed)

Opened 15 months ago

Last modified 14 months ago

Repeatable 100% CPU usage due to remote FastCGI app misbehaviour

Reported by: Olaf van der Spek
Owned by: jan
Priority: normal
Milestone: 1.4.16
Component: core
Version: 1.4.15
Severity: normal
Keywords:
Cc:
Blocked By:
Need User Feedback: no
Blocking:

Description

Hi,

I'm writing a new FastCGI app (without using a FastCGI lib) and I got Lighttpd to eat all my CPU cycles.

accept(4, {sa_family=AF_INET, sin_port=htons(3052), sin_addr=inet_addr("192.168.0.131")}, [16]) = 6
brk(0x80d2000)                          = 0x80d2000
brk(0x80f3000)                          = 0x80f3000
fcntl64(6, F_SETFD, FD_CLOEXEC)         = 0
fcntl64(6, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
ioctl(6, FIONREAD, [444])               = 0
read(6, "GET /xbt/ HTTP/1.1\r\nHost: 192.16"..., 447) = 444
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 7
fcntl64(7, F_SETFD, FD_CLOEXEC)         = 0
fcntl64(7, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
connect(7, {sa_family=AF_INET, sin_port=htons(2711), sin_addr=inet_addr("192.168.0.131")}, 16) = -1 EINPROGRESS (Operation now in progress)
accept(4, 0xbf863918, [112])            = -1 EAGAIN (Resource temporarily unavailable)
time(NULL)                              = 1182719853
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLOUT, revents=POLLOUT}], 2, 1000) = 1
getsockopt(7, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
getsockname(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.168.0.128")}, [16]) = 0
writev(7, [{"\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0\1\4\0\1\0036\0\0\17\17"..., 854}, {"\1\5\0\1\0\0\0\0", 8}], 2) = 862
time(NULL)                              = 1182719853
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}], 2, 1000) = 0
time(NULL)                              = 1182719854
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN, revents=POLLIN}], 2, 1000) = 1
ioctl(7, FIONREAD, [0])                 = 0
time(NULL)                              = 1182719854
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN, revents=POLLIN}], 2, 1000) = 1
ioctl(7, FIONREAD, [0])                 = 0
time(NULL)                              = 1182719854
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN, revents=POLLIN}], 2, 1000) = 1
ioctl(7, FIONREAD, [0])                 = 0
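
The tail of the trace shows poll() repeatedly reporting fd 7 as readable while FIONREAD reports 0 pending bytes. On a stream socket that combination normally means the peer has closed the connection, so polling again will report the same event immediately. The following is a minimal sketch of how the two cases could be told apart; the names check_readable and fd_state are made up for illustration and are not taken from lighttpd.

```c
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical classification of a polled descriptor (not lighttpd code). */
enum fd_state { FD_HAS_DATA, FD_EOF, FD_NOT_READY, FD_ERROR };

static enum fd_state check_readable(int fd, int timeout_ms) {
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int pending = 0;

    int n = poll(&p, 1, timeout_ms);
    if (n < 0) return FD_ERROR;       /* poll itself failed */
    if (n == 0) return FD_NOT_READY;  /* timeout, nothing happened */
    if (p.revents & (POLLERR | POLLNVAL)) return FD_ERROR;

    /* Ask the kernel how many bytes are waiting, as in the trace. */
    if (ioctl(fd, FIONREAD, &pending) < 0) return FD_ERROR;
    if (pending > 0) return FD_HAS_DATA;

    /* Readable but zero bytes pending: the peer closed its end.
     * The caller should close fd rather than poll it again. */
    return FD_EOF;
}
```

A caller that gets FD_EOF and keeps the descriptor in its poll set will spin exactly as in the trace above, because the end-of-file condition never goes away.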

Change History

Changed 14 months ago by jan

  • status changed from new to assigned
  • milestone changed from 1.5.0 to 1.4.16

Can you please remove line 2443 from mod_fastcgi.c and try again?

    } else {
        if (errno == EAGAIN) return 0; <-- this one
        log_error_write(...)

Changed 14 months ago by darix

  • blocking set to 1250

Changed 14 months ago by Olaf van der Spek

I don't see ioctl returning EAGAIN in my strace. Why do you think that change will have an effect?

ioctl(7, FIONREAD, [0]) = 0

BTW, I'm trying to reproduce the issue again now, but I haven't succeeded yet. The strace indicates my app sent something back (revents=POLLIN), but I can't remember doing any writes yet.

Changed 14 months ago by darix

Because line 2443 would never be reached if errno had been set to EAGAIN earlier in this function. That said, it must be a stale EAGAIN left in errno by an earlier call, which shouldn't be handled here.
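
To illustrate the stale-errno problem described above: errno is only meaningful right after a call has reported failure, and a successful call is not required to reset it. The sketch below is not the actual mod_fastcgi.c code; the function names and the action enum are hypothetical and only mirror the shape of the branch being discussed.

```c
#include <errno.h>

/* Hypothetical outcome of the "zero bytes pending" branch. */
enum action { KEEP_WAITING, CLOSE_CONNECTION };

/* Buggy shape: no call has failed in this branch, yet errno is consulted.
 * If an unrelated earlier call (for example a non-blocking accept()) left
 * EAGAIN behind, the end-of-file is mistaken for "try again later". */
static enum action on_zero_bytes_buggy(void) {
    if (errno == EAGAIN) return KEEP_WAITING;
    return CLOSE_CONNECTION;
}

/* Fixed shape: zero pending bytes on a readable socket is treated as
 * end-of-file regardless of whatever value errno happens to hold. */
static enum action on_zero_bytes_fixed(void) {
    return CLOSE_CONNECTION;
}
```

With a leftover EAGAIN in errno, the buggy variant keeps the connection in the poll set, which would match the busy loop in the strace; the fixed variant always proceeds to the error path.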

Changed 14 months ago by Olaf van der Spek

I've removed the line but the behaviour didn't change, it still goes in the loop.

Changed 14 months ago by Olaf van der Spek

Actually, it did the trick. Now it says:

2007-07-02 14:58:08: (mod_fastcgi.c.2463) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:192.168.0.131:2711
2007-07-02 14:58:08: (mod_fastcgi.c.3257) response not received, request sent: 862 on socket: tcp:192.168.0.131:2711 for /xbt , closing connection

Changed 14 months ago by darix

So it still loops (takes all the CPU?), but at least we reach the error message again?

Changed 14 months ago by Olaf van der Spek

No, it properly closes the fd now too, so it doesn't loop anymore.

Changed 14 months ago by Olaf van der Spek

"I've removed the line but the behaviour didn't change, it still goes in the loop."

This was my fault: I used the new executable but the old modules (probably; I had just pointed it at the old conf).

Changed 14 months ago by darix

  • status changed from assigned to closed
  • resolution set to fixed

fixed in 1879

Changed 14 months ago by Olaf van der Spek

Wouldn't it be much safer to store and use the function return value instead of the global errno, to prevent such bugs completely?

Changed 14 months ago by darix

Uhm. Many system functions use errno. lighttpd's own code doesn't use errno internally, IIRC.

Changed 14 months ago by Olaf van der Spek

Ah, you're right: "On error, -1 is returned, and errno is set appropriately." I assumed the error itself would be returned, but with just -1 you need to use errno.
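
One way to get close to the "use the return value" idea while still relying on errno is to capture errno immediately after the failing call and fold it into the return value. The wrapper below is only a sketch of that convention; read_checked is a hypothetical name, not an existing lighttpd or libc function.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: returns the number of bytes read (0 means
 * end-of-file), or a negative errno value if read() failed.
 * errno is sampled right after the failing call, so later code never
 * needs to inspect the global and cannot pick up a stale value. */
static ssize_t read_checked(int fd, void *buf, size_t len) {
    ssize_t n = read(fd, buf, len);
    if (n < 0) return -(ssize_t)errno;
    return n;
}
```

A caller can then distinguish the three cases from the return value alone: a positive count is data, 0 is end-of-file, and a result such as -EAGAIN means the call should be retried later.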
