Ticket #1477 (closed defect: worksforme)

Opened 9 months ago

Last modified 3 days ago

Lighttpd kills Ubuntu network install / local mirror

Reported by: felderado@… Owned by: jan
Priority: normal Milestone: 1.5.0
Component: core Version: 1.4.18
Severity: normal Keywords:
Cc: Blocked By:
Need User Feedback: Blocking:

Description

If you netboot / net install Ubuntu Gutsy over the network (I haven't tried any other releases) and host the packages with Lighttpd instead of Apache, the installation fails with this error in the logs:


Dec 2 11:58:19 in-target: After unpacking 1765MB of additional disk space will be used.
Dec 2 11:58:19 in-target: Get:1 http://192.168.1.144 gutsy/main libfuse2 2.7.0-1ubuntu5 [121kB]
.....
Dec 2 11:58:28 in-target: Get:96 http://192.168.1.144 gutsy/main xmag 1:1.0.1-0ubuntu2 [19.1kB]
Dec 2 11:58:28 in-target: E: Method http has died unexpectedly!


If I simply shut down Lighttpd and use Apache instead, it works perfectly.

As you can see, that's 96 GETs in about 9 seconds. Either Lighttpd can't handle it or it intentionally cuts us off :( My guess is that we get cut off for some reason, though that doesn't make any sense...
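To check whether a burst of requests like this fails independently of the installer, one could replay a similar request pattern against the mirror. The following is a minimal Python sketch, not apt's own HTTP method (it does not reproduce apt's exact connection handling). It issues sequential GETs over a reused connection and records any request that errors out, returns a non-200 status, or delivers a short body. The function name `fetch_many` and its parameters are hypothetical.

```python
import http.client
import time
from urllib.parse import urlsplit


def fetch_many(base_url, paths, timeout=30.0):
    """Issue sequential GETs for `paths` under `base_url` and record failures.

    Returns (ok_count, failures, elapsed_seconds); failures is a list of
    (path, error_description) tuples.
    """
    parts = urlsplit(base_url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    prefix = parts.path.rstrip("/")
    ok, failures = 0, []
    start = time.monotonic()
    for path in paths:
        try:
            conn.request("GET", f"{prefix}/{path.lstrip('/')}")
            resp = conn.getresponse()
            # Read the whole body; a connection cut mid-body raises IncompleteRead.
            body = resp.read()
            length = resp.getheader("Content-Length")
            if resp.status != 200:
                failures.append((path, f"HTTP {resp.status}"))
            elif length is not None and int(length) != len(body):
                failures.append((path, f"short body: {len(body)} of {length} bytes"))
            else:
                ok += 1
        except (http.client.HTTPException, OSError) as exc:
            failures.append((path, repr(exc)))
            # Reset the connection; HTTPConnection reconnects on the next request.
            conn.close()
    conn.close()
    return ok, failures, time.monotonic() - start
```

For example, point `base_url` at the mirror root (the log above shows `http://192.168.1.144`) and `paths` at a list of package file paths taken from the mirror's Packages index. Any entries in `failures` would show which request the server dropped.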

The only modules loaded are mod_access, mod_alias, mod_accesslog, and mod_compress.

I found these relevant errors in error.log:

2007-12-02 10:47:22: (network_linux_sendfile.c.171) sendfile failed: Input/output error 6
2007-12-02 10:47:22: (connections.c.603) connection closed: write failed on fd 6

Ignore the timestamps on those errors; they were copied from an earlier attempt, so obviously they don't match the log I posted first.

These errors couldn't have come from anything else, because the only reason I set up lighttpd was to host this install server.


Change History

Changed 9 months ago by Olaf van der Spek

Do you have a network trace of what happens?

Why doesn't the client retry the request if it fails?

Changed 9 months ago by anonymous

I don't have a network trace. I have reproduced this from several physical machines and also from virtual machines.

By setting the network backend to "writev" I was able to get through the install process, after it had failed once and I retried. I've tried both 1.4.x and 1.5.x (which had gthread-aio) and it doesn't really matter; it simply doesn't like Lighttpd and I don't understand why... There is nothing special going on here; it's just serving a bunch of binary files.
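One plausible reason switching backends changes the outcome is that a writev-style path moves file data through userspace buffers instead of asking the kernel to copy it inside sendfile(2). The following is a minimal Python sketch of such a read-then-writev loop, assuming a Unix system where `os.writev` is available. It only illustrates the idea and is not lighttpd's network_writev code. The function name and chunk sizes are hypothetical.

```python
import os


def send_file_writev(sock, path, chunk_size=64 * 1024, chunks_per_call=4):
    """Send the file at `path` over the blocking socket `sock` using os.writev.

    Data is read into userspace buffers first, so a read error on the file
    surfaces as an OSError from f.read() rather than from the socket write.
    Returns the number of bytes sent.
    """
    sent = 0
    with open(path, "rb") as f:
        while True:
            # Gather up to `chunks_per_call` buffers from the file.
            buffers = []
            for _ in range(chunks_per_call):
                data = f.read(chunk_size)
                if not data:
                    break
                buffers.append(data)
            if not buffers:
                return sent
            views = [memoryview(b) for b in buffers]
            while views:
                n = os.writev(sock.fileno(), views)
                sent += n
                # Drop fully written buffers, then trim a partially written one.
                while views and n >= len(views[0]):
                    n -= len(views[0])
                    views.pop(0)
                if views and n:
                    views[0] = views[0][n:]
```

The partial-write handling matters because writev(2) may write fewer bytes than requested; the loop resubmits only the unsent remainder.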

Changed 9 months ago by Olaf van der Spek

I'm asking because there could also be a bug in the client you're using.

Changed 9 months ago by felderado@…

I can recreate it easily enough and get a packet dump of it for you tonight.

Changed 9 months ago by Olaf van der Spek

Have you managed to create the trace already?

Changed 7 months ago by stbuehler

  • pending set
  • Did the sendfile backend work for at least one request, or did it fail every time? Perhaps sendfile is not supported on your filesystem.
  • What did the error log say with "writev" / "gthread-aio"?
  • Traces are of course useful, and so are configs.
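To test the suggestion that sendfile may not be supported on the filesystem, one could push a file from the mirror's document root through sendfile(2) outside of lighttpd. The following is a minimal Python sketch that does this with `os.sendfile` into a local socket pair and verifies the bytes. It assumes Linux, and `sendfile_works` is a hypothetical helper. A failure here (for example an `OSError` with EIO) would be consistent with the filesystem theory. A pass does not rule out problems that only appear under concurrent load or over a real NIC.

```python
import os
import socket
import threading


def sendfile_works(path, count=None):
    """Try to push `path` through os.sendfile() into a local socket pair.

    Returns (True, None) if every byte arrived intact, else (False, error).
    """
    size = os.path.getsize(path)
    count = size if count is None else min(count, size)
    a, b = socket.socketpair()
    received = bytearray()

    def drain():
        # Read until the sending side shuts down its write half.
        while True:
            chunk = b.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    reader = threading.Thread(target=drain)
    reader.start()
    error = None
    try:
        with open(path, "rb") as f:
            offset = 0
            while offset < count:
                n = os.sendfile(a.fileno(), f.fileno(), offset, count - offset)
                if n == 0:  # input file ended early
                    break
                offset += n
    except OSError as exc:
        error = exc  # e.g. EIO or EINVAL if the filesystem cannot back sendfile
    finally:
        a.shutdown(socket.SHUT_WR)
        reader.join()
        a.close()
        b.close()
    if error is not None:
        return False, error
    with open(path, "rb") as f:
        expected = f.read(count)
    if bytes(received) != expected:
        return False, f"data mismatch: got {len(received)} of {count} bytes"
    return True, None
```

Running it over a handful of the .deb files that appear in the failing log would indicate whether sendfile itself misbehaves on that storage.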

Changed 7 months ago by mmaunder@…

I can confirm this. A stock Ubuntu 7.10 AMD64 Server install, booting from the network using PXE and then installing from a local mirror running a stock lighttpd install from the Gutsy 7.10 repository (1.4.18-1ubuntu1), fails with exactly the error the original poster mentioned.

I installed Apache prefork, also a standard install. The only change I made to Apache was to increase MaxSpareServers to 30. It worked perfectly, first time, from the same doc_root.

With lighttpd it consistently failed at exactly the same point in the install. FYI, this was running on a Dell 2950 with the built-in NIC on a gigabit Ethernet switch, with both the mirror and the machine being installed on the same LAN.

Lighttpd wrote nothing to the error log, and there was nothing useful in the access log either. I also checked the system logs and found nothing.

This is very troubling because we run lighttpd as a front-end reverse proxy in our production environment, where it processes well over 150 requests per second. So I'm wondering whether requests are quietly failing under high load.

This error is 100% reproducible in our data center, in a racked environment with a Dell gigabit switch and Dell 2950 servers. I ran a similar configuration in our office (same OS, also 64-bit, with lighttpd) and it worked fine. The only difference was that the mirror machine was not a server-class machine, though it was also AMD64. The mirror machine was also on a 100 Mbit port with a 100 Mbit NIC, while the server was on gigabit, so perhaps the load wasn't high enough to trigger this problem.

If I have time and two spare 2950s, I'll try to reproduce this and debug it in more detail.

Mark.

Changed 10 days ago by stbuehler

  • status changed from new to closed
  • resolution set to worksforme

I am sorry, but without more information we cannot help you.

Changed 3 days ago by stbuehler

  • pending unset
