Ticket #657 (closed defect: fixed)

Opened 2 years ago

Last modified 3 months ago

lighty vs. apt-get - problem with pipelining

Reported by: raas
Assigned to: jan
Priority: normal
Milestone: 1.4.19
Component: core
Version: 1.4.11
Severity: normal
Keywords: apt apt-get pipeline pipelining
Cc:
Blocking:
Need Feedback: 0

Description

Symptoms: while running an in-house Debian mirror on lighty, I've noticed that the connection is sometimes "reset by peer" while downloading packages. This happens fairly rarely (about 1% of the time, i.e. roughly 1 in 100 packages triggers the problem). Retrying helps. Also, adding

Acquire::http::Pipeline-Depth "0";

to the apt config (i.e., disabling HTTP/1.1 pipelining) seems to be a valid workaround.

However, the APT docs insist that the above is only needed with servers that are not standards-compliant. Quoting from 'man apt.conf':

"One setting is provided to control the pipeline depth in cases where the remote server is not RFC conforming or buggy (such as Squid 2.0.2). Acquire::http::Pipeline-Depth can be a value from 0 to 5 indicating how many outstanding requests APT should send. A value of zero MUST be specified if the remote host does not properly linger on TCP connections - otherwise data corruption will occur. Hosts which require this are in violation of RFC 2068."

tcpdumps are available if anyone's interested.

cheers,

raas

Change History

05/24/2006 02:31:57 AM changed by moo

There can be keep-alive limits on the server side, and the client never knows when it reaches the limit. The client should retry if the pipelined requests fail. Even the first request in the pipeline should be tried twice, according to the HTTP RFC.
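A minimal sketch of the client-side retry described above, assuming all pipelined requests are idempotent GETs. The struct, the function name, and the two-attempt cap are hypothetical and only illustrate the idea; this is not APT's code.

```c
#include <stddef.h>

/* Assumption for this sketch: each request is sent at most twice. */
#define MAX_ATTEMPTS 2

/* Hypothetical bookkeeping for one request in the pipeline. */
struct pipelined_request {
    const char *path;   /* request target */
    int attempts;       /* how many times it has been sent so far */
};

/*
 * The connection closed after `answered` complete responses arrived.
 * Collect the indices of the unanswered requests that may still be
 * resent on a new connection. Returns how many indices were written
 * to `out`, which must have room for `count` entries.
 */
size_t requests_to_resend(const struct pipelined_request *reqs,
                          size_t count, size_t answered, size_t *out)
{
    size_t n = 0;
    for (size_t i = answered; i < count; i++) {
        if (reqs[i].attempts < MAX_ATTEMPTS) {
            out[n++] = i;
        }
    }
    return n;
}
```

With three requests sent once each and one complete response received, this reports the second and third request for resending. A request that has already been sent twice is left out, so the caller can report it as failed.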

12/06/2006 02:26:14 AM changed by mark@nedworks.org

We were also seeing this issue when using APT with lighttpd 1.4.13 over a fast LAN. I did some test runs with strace and tcpdump, and found the following in the strace output:

open("/srv/ubuntu/pool/main/p/parted/parted_1.7.1-2.1ubuntu3_i386.deb", O_RDONLY|O_LARGEFILE) = 10
fcntl64(10, F_SETFD, FD_CLOEXEC)        = 0
sendfile64(8, 10, [0], 55200)           = -1 EAGAIN (Resource temporarily unavailable)
setsockopt(8, SOL_TCP, TCP_CORK, [0], 4) = 0
write(3, "66.230.200.243 apt.wikimedia.org - [06/Dec/2006:00:36:34 +0000] \"GET /ubuntu/pool/main/p/parted/parted_1.7.1-2.1ubuntu3_i386.deb HTTP/1.1\" 200 0 \"-\" \"Ubuntu APT-HTTP/1.3\"\n", 171) = 171
close(10)                               = 0
shutdown(8, 1 /* send */)               = 0

lighty receives an EAGAIN on sendfile() (buffer full?), then shuts down the sending side of the socket and logs a 200 OK?!?

The following code in network_linux_sendfile.c seems responsible for this:

    if (-1 == (r = sendfile(fd, c->file.fd, &offset, toSend))) {
        switch (errno) {
        case EAGAIN:
        case EINTR:
            r = 0;
            break;
        case EPIPE:
        case ECONNRESET:
            return -2;
        default:
            log_error_write(srv, __FILE__, __LINE__, "ssd",
                            "sendfile failed:", strerror(errno), fd);
            return -1;
        }
    }
    if (r == 0) {
        /* We got an event to write but we wrote nothing
         *
         * - the file shrinked -> error
         * - the remote side closed inbetween -> remote-close */
        if (HANDLER_ERROR == stat_cache_get_entry(srv, con, c->file.name, &sce)) {
            /* file is gone ? */
            return -1;
        }
        if (offset > sce->st.st_size) {
            /* file shrinked, close the connection */
            return -1;
        }
        return -2;
    }

On EAGAIN, r is set to 0. The code block that follows then assumes that either the source file has shrunk (not the case) or the remote end has closed the connection (not the case either). The latter seems strange to me - shouldn't ECONNRESET be returned for that?

EAGAIN likely means that some buffer is full, in which case this code returns -2 which makes lighty close down the connection.

I disabled the return -2, which seems to fix this issue. However, there seems to be another bug occurring with APT...
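The distinction argued for in this comment can be sketched as follows. This is only an illustration for Linux, not the lighttpd source or any patch applied to it: `enum send_status`, `classify_sendfile()` and `send_file_chunk()` are hypothetical names. The point is that EAGAIN/EINTR (nothing written yet, try again later) is kept separate from a genuine return value of 0, which Linux sendfile() reports when there is nothing left to read at the given file offset.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/sendfile.h>

/* Hypothetical result codes for this sketch (not lighttpd's return values). */
enum send_status {
    SEND_PROGRESS,     /* some bytes were written */
    SEND_RETRY_LATER,  /* EAGAIN/EINTR: keep the connection, wait until writable */
    SEND_NO_DATA,      /* sendfile() returned 0: nothing to read at this offset */
    SEND_PEER_CLOSED,  /* EPIPE/ECONNRESET: the remote side is gone */
    SEND_ERROR         /* any other errno */
};

/*
 * Classify one sendfile() result. `r` is the return value and `err`
 * is the errno captured right after the call (ignored unless r < 0).
 */
enum send_status classify_sendfile(ssize_t r, int err)
{
    if (r > 0) return SEND_PROGRESS;
    if (r == 0) return SEND_NO_DATA;

    switch (err) {
    case EAGAIN:
    case EINTR:
        /* Socket buffer full or call interrupted: not a close. */
        return SEND_RETRY_LATER;
    case EPIPE:
    case ECONNRESET:
        return SEND_PEER_CLOSED;
    default:
        return SEND_ERROR;
    }
}

/* Send up to `to_send` bytes of `in_fd`, starting at *offset, to `sock_fd`. */
enum send_status send_file_chunk(int sock_fd, int in_fd, off_t *offset, size_t to_send)
{
    ssize_t r = sendfile(sock_fd, in_fd, offset, to_send);
    return classify_sendfile(r, r < 0 ? errno : 0);
}
```

A caller would only run the "file shrank or remote closed" checks on SEND_NO_DATA, and on SEND_RETRY_LATER would simply wait for the next write event on the socket.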

12/06/2006 06:11:05 PM changed by mark@nedworks.org

...and the other issue was simply that server.max-keep-alive-requests was too low: its default is 16, not 128 as the manual used to say.

lighttpd correctly closed the connection after 16 requests with Connection: close, but apparently APT doesn't handle that correctly.
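For anyone hitting the same limit, a config sketch: raising the value in lighttpd.conf lets a long APT run stay on one connection for more requests. The figure 128 is simply the number the manual used to document; choosing it here is an assumption, not a tested recommendation.

```
# lighttpd.conf: allow more requests per keep-alive connection
server.max-keep-alive-requests = 128
```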

09/15/2007 10:26:40 AM changed by formorer@debian.org

  • blocking changed.
  • pending changed.

Hi,

we have the same problems with debian.netcologne.de and deb.grml.org where I tried lighttpd. A fix would be really appreciated.

Thanks, Alex

02/12/2008 02:00:56 PM changed by jan

  • status changed from new to closed.
  • resolution set to fixed.
  • milestone set to 1.4.19.

A patch for the EAGAIN handling was applied in [2072].

