Ticket #673 (new defect)

Opened 2 years ago

Last modified 8 months ago

Connection error on Solaris

Reported by: lighttpd@fudgemond.org
Assigned to: jan
Priority: normal
Component: core
Version: 1.4.11
Severity: major

Description

I described this in the forum, but got no response.

I have lighttpd-1.4.11 compiled on Solaris 10 using gcc 3.3.2. I am serving nothing but static files and have only mod_status and mod_accesslog enabled.

I start lighttpd and some connections get made, but others, including a heartbeat request, block, and the following error is written to the error log:

2006-06-05 16:32:18: (connections.c.222) unexpected end-of-file: 10

This server is used in a very high traffic situation and I'm looking to replace Apache in order to get more simultaneous clients. I have added the following in the config:

server.max-fds = 8192
server.max-keep-alive-requests = 5000
server.max-keep-alive-idle = 90

as a config similar to this on a Linux box works very well for me.

I think lighttpd is fantastic, but this is a major blocker for me when trying to get it working on Solaris.

Regards Stephen

Change History

12/01/2006 12:50:36 AM changed by joe@thrallingpenguin.com

ioctl() may return any negative value as an error. I changed the line to check for a value less than 0. Granted, it hasn't been long since I made the change, but I've not seen this error log message appear since.

Line 221:
  if (ioctl(con->fd, FIONREAD, &toread) < 0) {

12/04/2006 01:50:16 AM changed by joe@thrallingpenguin.com

The error message appears less often now. When it does occur, the log reports it as a connection dropped by the remote host or a broken pipe. I assume this is because the browser disappeared, but there could still be a bug involved.

01/15/2007 02:47:36 AM changed by joe@thrallingpenguin.com

Follow-up: The code change helped, but not much. However, after digging around SunSolve and OpenSolaris, I turned up some information that suggests changing a TCP setting to work around the problem. The links are below. The setting is:

ndd -set /dev/tcp tcp_co_min 1500

1500 = MTU of your network interface card.

http://sunsolve.sun.com/search/document.do?assetkey=1-1-4701102-1

http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=2387e881a19c7affffffffdbf791ee9a8d6b1?bug_id=4789772

06/26/2007 04:57:41 PM changed by ingenthr

After looking into this for a customer, I can say pretty confidently that the cause is not bug 4701102, as it was fixed back in 2003 and the changes for that fix are still in current Solaris/OpenSolaris code.

I checked with another engineer and have learned this may just be incorrect error handling with the stream when using the devpoll backend. In other words, with this ioctl(), it's entirely possible to get an error but still have the stream readable. The best fix would probably be to change the error handling to anticipate a possible failure of this ioctl() when using this type of socket. The failure of this ioctl() in this case is not an indication of an error on the connection.

06/26/2007 04:58:49 PM changed by ingenthr

One other note: this was investigated with 1.4.15, but I also looked at a couple of files in 1.4.11, and it doesn't appear the behavior in this area has changed at all.

06/29/2007 01:14:01 AM changed by joe@thrallingpenguin.com

Hello ingenthr. So you would recommend simply ignoring the return value of ioctl() altogether?

06/29/2007 01:36:27 AM changed by ingenthr

I believe so. After checking with another engineer to verify, we believe that ioctl() is not necessary with this nonblocking stream socket, and the error message therefore isn't required either. It can then fall through to the buffer code and the read.

In fact, it may not be necessary in the Linux epoll or poll cases either. This style of check is normally not used with a nonblocking socket. Dropping it would remove a syscall from the critical path here. If whatever event mechanism (devpoll, epoll, poll()) says there's data there, it should be safe to do a read and check for errors from there. There could be something I'm not aware of on other implementations.

The one thing I'm certain of is that it is not related to bugid 4701102 or 4789772. The descriptions of those bugs do not match, and implementing the workaround on a very busy system had no effect, not to mention that both have been closed and integrated for a couple of years. If it were that bug, the cause of which was a notification propagating before data was available at the stream head, setting tcp_co_min to the MTU (or higher) would mean you couldn't get into that condition. It would, though, also have a negative effect on performance, so the workaround was more of a test to verify where the error was than a proper workaround. The fix was straightforward, and you can still see it in the OpenSolaris code for tcp.c to this day.

Do you still see those messages occasionally on your system as well? I would imagine you probably do, since we saw them even though the workaround was in place.

By the way, I'm matt dot ingenthron at sun dot com if you'd like to discuss directly and update the bug as needed.

