
I am developing a program using libcurl. The program creates a thread, which in turn makes an HTTP request using libcurl. But sometimes the program crashes with this error:

unexpected error 9 on netlink descriptor

I then turned off AsynchDNS in curl, but the problem remains. As I understand it, the assertion is triggered from inside getaddrinfo. Is some kind of initialization needed to use getaddrinfo in multi-threaded applications? Or is getaddrinfo not thread-safe in general?

GDB stack trace: [posted as a screenshot; no text available]

libcurl version:

curl 7.67.0 (x86_64-pc-linux-gnu) libcurl/7.67.0 OpenSSL/1.1.0g zlib/1.2.11 libidn2/2.0.4
Release-Date: 2019-11-06
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: HTTPS-proxy IDN IPv6 Largefile libz NTLM NTLM_WB SSL TLS-SRP UnixSockets

glibc version:

ldd (Ubuntu GLIBC 2.27-3ubuntu1) 2.27
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

kirill-782
  • Don't post a screenshot of all that text. Copy and paste the text into the question. – 1201ProgramAlarm Nov 12 '19 at 22:31
  • `getaddrinfo()` is guaranteed to be thread-safe. In fact, its thread safety is one of the reasons (amongst many) why it is preferred over the `gethostby...()` functions, which are not guaranteed to be thread-safe. – Remy Lebeau Nov 12 '19 at 22:38
  • I wanted to copy the text, but I only had a screenshot. The program will probably crash again in a couple of days, and then I will be able to copy the full call stack. –  kirill-782 Nov 12 '19 at 23:18
  • glib and glibc are two different things, btw. – Shawn Nov 12 '19 at 23:35
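To illustrate the thread-safety point from Remy Lebeau's comment: getaddrinfo() can be called from several threads at once without any prior initialization, as long as each thread uses its own hints and frees its own result list. This is a minimal sketch assuming POSIX threads; the helper names are hypothetical, and AI_NUMERICHOST with "127.0.0.1" is used only so the example does not depend on a working DNS setup.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>

#define MAX_WORKERS 16

/* Worker: resolves a numeric address and stores the return code.
   Each thread uses its own hints and its own result list. */
static void *resolve_worker(void *arg)
{
    int *status = arg;
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST; /* no name server needed for the demo */

    *status = getaddrinfo("127.0.0.1", "80", &hints, &res);
    if (*status == 0)
        freeaddrinfo(res);           /* free only the list this thread got */
    return NULL;
}

/* Hypothetical helper: runs n lookups in parallel (capped at MAX_WORKERS)
   and returns how many failed or could not be started. No global
   initialization call is made before the threads start. */
int resolve_concurrently(int n)
{
    pthread_t threads[MAX_WORKERS];
    int status[MAX_WORKERS];
    int started = 0;
    int failures = 0;

    if (n < 0)
        n = 0;
    if (n > MAX_WORKERS)
        n = MAX_WORKERS;

    while (started < n &&
           pthread_create(&threads[started], NULL, resolve_worker,
                          &status[started]) == 0)
        started++;

    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        if (status[i] != 0)
            failures++;
    }
    return failures + (n - started);
}
```

Calling `resolve_concurrently(8)` is expected to return 0 on a normal Linux system.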

1 Answer


This is a file descriptor race in the application. The typical scenario for error 9 (EBADF) looks like this:

  1. Thread A closes a file descriptor.
  2. Thread B calls getaddrinfo and opens a Netlink socket. It happens to receive the same descriptor value.
  3. Due to a bug, thread A closes the same file descriptor again. Normally, that would be benign, but due to the concurrent execution, the Netlink socket created by glibc is closed.
  4. Thread B attempts to use the Netlink socket descriptor and receives the EBADF error.
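The four steps can be replayed deterministically on a single thread, which makes the effect easy to see without waiting for an unlucky interleaving. In this minimal sketch a plain UNIX datagram socket stands in for glibc's internal Netlink socket (an assumption for illustration; the descriptor type does not matter), and the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: replays the four steps in order on one thread.
   Returns true if "thread B" ends up with EBADF on its own descriptor. */
bool replay_double_close(void)
{
    /* Thread A owns a descriptor. */
    int a_fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (a_fd < 0)
        return false;

    /* Step 1: thread A closes it (the correct close). */
    close(a_fd);

    /* Step 2: thread B opens a socket and receives the same number,
       because it is again the lowest free descriptor. */
    int b_fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (b_fd < 0)
        return false;
    bool reused = (b_fd == a_fd);

    /* Step 3: the buggy second close in thread A. When the number was
       reused, this close succeeds and shuts thread B's socket. */
    close(a_fd);

    /* Step 4: thread B uses its descriptor and gets EBADF. */
    errno = 0;
    bool ebadf = (fcntl(b_fd, F_GETFL) == -1 && errno == EBADF);

    if (!reused)
        close(b_fd); /* no collision happened, so B's socket is still open */
    return reused && ebadf;
}
```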

The key to fixing such bugs is figuring out where exactly the double-close happens.
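One common defensive pattern (not taken from the answer, and only a partial aid) is to route every close() through a helper that resets the caller's variable to -1 and aborts when close() reports EBADF, so that a core dump or debugger points at the second caller. The helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: close the descriptor stored in *fdp exactly once.
   The variable is set to -1 before close(), so a repeated call through the
   same variable is a no-op instead of closing a reused descriptor number. */
int close_and_invalidate(int *fdp)
{
    int fd = *fdp;
    if (fd < 0)
        return 0;               /* already closed via this variable */
    *fdp = -1;

    int rc = close(fd);
    if (rc != 0 && errno == EBADF) {
        /* The number was not open at all: some other code path closed it
           first. Abort so that a core dump or debugger shows this caller. */
        fprintf(stderr, "close_and_invalidate: fd %d was already closed\n", fd);
        abort();
    }
    return rc;
}
```

This only protects descriptors that have a single owning variable; copies of the number stored elsewhere bypass it. Also, in the dangerous case from the answer, where the number has already been reused, the stale close() succeeds, so the EBADF check fires only when the double close happens before anyone reuses the number. That may still be enough to locate the bug.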

Florian Weimer
  • Do you have anything to contribute as to how to do this? `The key to fixing such bugs is figuring out where exactly the double-close happens.` That is a monumental and vague task. How do you wrap usage so you can determine where the race condition is coming from? Why would it be trying to reuse a descriptor so recently closed instead of providing one that has no risk of this? – Enigma Nov 03 '21 at 18:06
  • POSIX requires that the lowest available descriptor is used for new descriptors, which tends to maximize the potential for such races. Detecting them is somewhat application-dependent; I would try to come up with a solution based on Systemtap or Dyninst if the application is sufficiently complex. – Florian Weimer Nov 05 '21 at 21:52
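The lowest-available rule mentioned in the last comment is easy to observe directly. This sketch (function name hypothetical; assumes `/dev/null` is available) opens three descriptors, closes the middle one, and checks that the next open() returns the freed number rather than a fresh, higher one, which is exactly why a stale close so often hits a live descriptor.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical demo of the POSIX "lowest available descriptor" rule:
   open three descriptors, close the middle one, and check that the next
   open() returns the freed number instead of a new, higher one. */
bool lowest_fd_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    int c = open("/dev/null", O_RDONLY);
    bool reused = false;

    if (a >= 0 && b >= 0 && c >= 0) {
        close(b);                          /* leaves a gap below c */
        int d = open("/dev/null", O_RDONLY);
        reused = (d == b);                 /* d fills the gap */
        b = d;                             /* so the cleanup closes d */
    }

    /* Close whatever is still open. */
    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);
    if (c >= 0)
        close(c);
    return reused;
}
```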