I have problems with error handling in my asio-based application. I use asio single-threaded, with a single io_service and the async_read / async_write functions. When the error code passed to my completion handler is set, any call to its .message() method segfaults, because the error category pointer is null. The error value itself, however, is always correct.

Weirdly, I am not able to reproduce this in a test application. The following is essentially what I'm doing, but here calling message() on an error works as expected:

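#include <boost/asio.hpp>
#include <iostream>
#include <memory>
#include <vector>

// Forward declaration: do_read and do_write call each other.
void do_write(std::shared_ptr<std::vector<unsigned char>> buf, std::shared_ptr<boost::asio::ip::tcp::socket> sock);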

void do_read(std::shared_ptr<std::vector<unsigned char>> buf, std::shared_ptr<boost::asio::ip::tcp::socket> sock)
{
   boost::asio::async_read(*sock, boost::asio::buffer(*buf), boost::asio::transfer_exactly(100000),
                           [sock, buf](const boost::system::error_code& ec, std::size_t bytes)
   {
      std::cout << "read\n";
      if (ec) {
         std::cout << ec.message() << std::flush;
      } else {
         do_write(buf, sock);
      }
   });
}

void do_write(std::shared_ptr<std::vector<unsigned char>> buf, std::shared_ptr<boost::asio::ip::tcp::socket> sock)
{
   boost::asio::async_write(*sock, boost::asio::buffer(*buf), [sock, buf](const boost::system::error_code& ec, std::size_t bytes)
   {
      std::cout << "write\n";
      if (ec) {
         std::cout << ec.message() << std::flush;
      } else {
         do_read(buf, sock);
      }
   });
}

int main()
{
   auto buf = std::make_shared<std::vector<unsigned char>>(100000);

   unsigned short port = 1113;
   boost::asio::io_service ios;
   boost::asio::ip::tcp::acceptor acptr {ios, boost::asio::ip::tcp::endpoint{boost::asio::ip::tcp::v4(), port}};

   auto socket = std::make_shared<boost::asio::ip::tcp::socket>(acptr.get_io_service());
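   acptr.async_accept(*socket, [socket, buf](const boost::system::error_code& ec)
   {
      // Only start reading if the accept itself succeeded.
      if (ec) {
         std::cout << ec.message() << std::flush;
         return;
      }
      do_read(buf, socket);
   });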
   ios.run();
}

I do not think this is an object-lifetime problem in my application: AddressSanitizer does not report any errors, and I can transfer hundreds of megabytes of data without any problems. Also, if I just comment out the call to message(), the server continues to run fine and correctly handles new connections. Anyway, here is the call stack when the crash occurs:

https://gist.github.com/mariusherzog/82f24caf9eea4d94946706aa8c025ef1 The category points to null from frame 11 upwards.

I am on Linux with Boost 1.62, using clang 3.9.1 and gcc 5.4.0.

Marius Herzog

1 Answer

The only thing I can think of right now is that some piece of code runs as part of constructing global static data. If it grabs a category reference at that time and there are multiple translation units, the category instance might not have been constructed yet.

This situation is affectionately referred to as the Static Initialization Order Fiasco.
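
As a rough illustration of that failure mode (a sketch, not the asker's code and not asio's implementation), the snippet below uses `std::error_category` as a stand-in and reproduces the ordering problem inside one translation unit, where initialization order follows definition order. The names `g_category`, `safe_category`, `EarlyUser` and `g_early_user` are hypothetical. Across translation units the order is unspecified, which is what makes the real problem hard to pin down.

```cpp
#include <system_error>

// Hypothetical namespace-scope pointer to a category. It is defined near the
// bottom of this file, so its dynamic initializer runs late.
extern const std::error_category* g_category;

// Construct-on-first-use accessor (Meyers singleton): the function-local
// static is initialized the first time control passes through it, so callers
// never observe it unconstructed, even during static initialization.
const std::error_category& safe_category() {
    static const std::error_category& instance = std::generic_category();
    return instance;
}

// Hypothetical object whose constructor runs during static initialization
// and grabs the category both ways.
struct EarlyUser {
    const std::error_category* from_global;
    const std::error_category* from_accessor;
    EarlyUser()
        : from_global(g_category),         // typically still null at this point
          from_accessor(&safe_category())  // always valid
    {}
};

// Defined before g_category, so this constructor runs first.
EarlyUser g_early_user;

// Needs a function call at startup, so this is a dynamic initializer.
const std::error_category* g_category = &std::generic_category();
```

Printing `g_early_user.from_global` from `main` will usually show a null pointer with gcc and clang, although an implementation is allowed to initialize `g_category` earlier. The pointer taken through the accessor is valid in every case.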


Other than that, the converse comes to mind: the global category has already been destructed. This seems a little less likely, because (1) you would have noticed that it happens at shutdown, and (2) references don't usually become null in such a scenario - although an implementation might

sehe
  • Thanks for the tip. The error category (get_misc_category) is accessed via a Meyer's singleton, and if I put a breakpoint there, the actual instance is all null in memory from the beginning on. Might this be a bug in asio? – Marius Herzog Nov 07 '17 at 20:27
  • Also it is called multiple times in my application, seemingly caused by different translation units... – Marius Herzog Nov 07 '17 at 20:33
  • It doesn't really matter whether you access it via a Meyer's singleton: more importantly, `get_misc_category()` ***is*** itself a Meyer's singleton. If that goes wrong, the only cause I can really imagine is runtime linking. – sehe Nov 07 '17 at 22:40
  • Perhaps multiple copies of the singleton are in different modules, and somehow the one in the "extra" module is not getting initialized. I don't actually see how this could legally happen; there might be a (runtime) linker bug. Are you using LTO? Are you perhaps linking Boost System statically into a shared library? See also this related question: https://stackoverflow.com/a/43011009/85371 and this excellent talk: https://www.youtube.com/watch?v=w7ZVbw2X-tE – sehe Nov 07 '17 at 22:40
  • By the way, "it's all null in memory from the beginning on" is how it's supposed to be, until `get_misc_category()` is first called, of course. The TL;DR of the above links is that it _is_ quite possible to get multiple copies of `get_misc_category()`, so identical categories compare unequal when compared by address. However, what I cannot explain is why any single copy of `get_misc_category()` would have its initialization guard malfunction so that the function-local static never gets initialized. I do feel that that takes a linker bug. – sehe Nov 07 '17 at 22:44
  • Note: In C++03, Meyer's singleton is not thread safe, so using `get_misc_category()` would be a potential data race in C++03, depending on how you use it. – sehe Nov 07 '17 at 22:45
  • I am using C++11 and I am linking boost dynamically. However I have all my network related code in a library which I link statically to the "main" application, but this should not be a problem? – Marius Herzog Nov 08 '17 at 07:42
  • @MariusHerzog Depending on whether you rely on categories having a fixed address, this could be an issue. However, like I said, I don't think it _should_ explain a null reference or uninitialized global data. This was all already covered above. If you want to know how it could be a problem, follow the links; otherwise, look at the likely culprits: name-map your binaries and look for conflicts (perhaps with `nm -C`), disable some optimization flags, and inspect the generated assembly to find the bug (e.g. with `objdump -S -t`). – sehe Nov 08 '17 at 07:48
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/158478/discussion-between-sehe-and-marius-herzog). – sehe Nov 08 '17 at 07:52
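
The point in the comments about duplicated singletons comparing unequal by address can be sketched with the standard library alone. This is an illustration under assumptions, not Boost's code: `misc_category_copy`, `category_in_module_a` and `category_in_module_b` are hypothetical names that simulate the same category ending up in two modules, and `std::error_category` stands in for `boost::system::error_category`, which is assumed here to compare by address in the same way.

```cpp
#include <cstring>
#include <string>
#include <system_error>

// Hypothetical category type standing in for a library's "misc" category.
class misc_category_copy : public std::error_category {
public:
    const char* name() const noexcept override { return "misc"; }
    std::string message(int) const override { return "misc error"; }
};

// Two accessors simulate the same singleton being duplicated in two modules.
const std::error_category& category_in_module_a() {
    static const misc_category_copy instance;
    return instance;
}

const std::error_category& category_in_module_b() {
    static const misc_category_copy instance;
    return instance;
}

// std::error_code equality compares the value and the category, and the
// category comparison is an address comparison.
bool same_error(const std::error_code& lhs, const std::error_code& rhs) {
    return lhs == rhs;
}

// Fallback comparison by category name and value, which survives duplication.
bool same_error_by_name(const std::error_code& lhs, const std::error_code& rhs) {
    return lhs.value() == rhs.value() &&
           std::strcmp(lhs.category().name(), rhs.category().name()) == 0;
}
```

With `std::error_code a(1, category_in_module_a())` and `std::error_code b(1, category_in_module_b())`, `same_error(a, b)` is false even though both codes print the same message, while `same_error_by_name(a, b)` is true. None of this explains a null category. It only shows why duplicated categories matter for code that relies on identity.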