
I currently have a very weird problem. Sometimes my acceptor (using `async_accept`) doesn't accept a socket (the handler passed to `async_accept` is never called), yet the connector's `connect` call succeeds (both are `boost::asio` classes). Sometimes everything works, but sometimes I never get an accepted socket... I really don't know why anymore. I tested everything under Win10 64-bit with different listen ports.
My server structure is the following:
io_service:

_io_thread = std::make_unique<std::thread>([this]() {
            while (1)
            {
                try
                {
                    _io_service->run();
                    break;
                }
                catch (const boost::exception& e)
                {
                    spdlog::error("IO_SERVICE_EXCEPTION: {}", boost::diagnostic_information(e));
                }
            }
        });

acceptor:

void Server::initialize(uint16_t listen_port)
    {
        // at this point the io_thread is running already

        auto endpoint = ip::tcp::endpoint(ip::tcp::v4(), listen_port);

        _acceptor = std::make_unique<boost::asio::ip::tcp::acceptor>(*_io_service, endpoint.protocol());
        _acceptor->set_option(boost::asio::ip::tcp::acceptor::reuse_address(false));
        _acceptor->bind(endpoint);
        _acceptor->listen();

        _listen_port = listen_port;

        __accept();
    }

__accept():

void Server::__accept()
    {
        // create socket for next connection
        _acceptor_socket = std::make_unique<ip::tcp::socket>(*_io_service);

        _acceptor->async_accept(*_acceptor_socket, [this](const boost::system::error_code& err)
            {
                spdlog::info("Accept new socket..."); // this sometimes doesn't get called :(

                if (err.failed())
                {
                    // acceptor closed
                    if (err == boost::asio::error::operation_aborted)
                    {
                        spdlog::info("network server stopped accepting sockets");
                        return;
                    }

                    // unknown error
                    spdlog::error("network server accepting socket failed {} message {} num {}", err.failed(), err.message(), err.value());

                    // accept next connection
                    __accept();
                    return;
                }

                // accept new socket
                _new_connections_mutex.lock();

                auto con = __create_new_connection();
                con->initialize(++_connection_id_counter, std::move(_acceptor_socket));
                con->set_handler(_input_handler);
                con->goto_phase(_initial_phase);
                spdlog::info("new connection from {} with id {}", con->get_host_name(), con->get_unique_id());

                _new_connections[con->get_unique_id()] = std::move(con);

                _new_connections_mutex.unlock();

                // wait for next connection
                __accept();
            }
        );
    }



My client connector is as simple as this:

        auto socket = std::make_unique<boost::asio::ip::tcp::socket>(*_io_service);

        spdlog::info("try to connect to {}:{}", endpoint.address().to_string(), endpoint.port());
        try
        {
            socket->connect(endpoint);
        }
        catch (const boost::system::system_error & e)
        {
            spdlog::error("cannot connect to {}:{}", endpoint.address().to_string(), endpoint.port());
            return false;
        }

        // this succeeds every time...
        [...]

So... shouldn't the acceptor socket be created every time the connector socket successfully connects? I hope somebody knows what's going wrong here :/

Crispy
  • Wait, why do you set reuse_address to false? I am not entirely sure what it actually does (it is always a hidden mystery) but it might result in, say, being unable to reopen a socket on the same address for a short while. – ALX23z Feb 06 '20 at 14:37
  • Identifiers with leading `__` are reserved _in all scopes_; using them is technically UB https://stackoverflow.com/questions/228783/what-are-the-rules-about-using-an-underscore-in-a-c-identifier – sehe Feb 06 '20 at 16:31
  • There are several log statements in the code. Which of them do you see when you hit the problematic behaviour? – sehe Feb 06 '20 at 16:35
  • If an exception occurs between mutex `.lock()` and `.unlock()` you will have a soft lock or deadlock. It is rarely necessary to use these; instead use `std::lock_guard`, `std::unique_lock` or `std::shared_lock`, which are exception safe – sehe Feb 06 '20 at 16:37
  • @ALX23z I put it to false to see if the address is already in use (I catch the exception in the code that calls this function). That way I'm sure the address isn't already used and is free for a new acceptor. – Crispy Feb 06 '20 at 17:12
  • @sehe: Up to now I used `__` for private and protected member functions and `_` for member variables. What do you recommend? I read the glibc page and yeah, `__` is reserved for global C functions... the point about `std::lock_guard` is actually a good one, I will change that. The connector socket ("try to connect to ...") is successfully created (there is another log right after that verifies it), but the log "Accept new socket..." sometimes appears and sometimes doesn't (when it doesn't, I have to restart the server until it accepts sockets again...) – Crispy Feb 06 '20 at 17:14

1 Answer


Okay, I found the answer... the `io_service::run` function doesn't work the way I expected. I thought it would simply block forever as long as `io_service::stop` is not called. That turns out to be wrong. Because I `break` out of my while loop after a single `run()` call, the thread finishes, so the io_service sometimes wasn't running anymore by the time the connector socket connected. I have changed `io_service::run` to `run_one()` and removed the `break`. I will now add a loop-cancel variable so it's not an infinite loop...
Thanks for your answers!

Crispy
    Ah, the fact that you start the `io_service` on a thread is probably the subtle issue. If you start it _before_ actually posting work (e.g. using async_* calls) then it will return because "there is no more work". Which means the background thread is already done before you posted the first work. The code posted did not show the relation between the creation of the thread and the call to `Server::initialize`. – sehe Feb 06 '20 at 19:15
  • The common way to avoid this race condition is to use a work guard ([`io_service::work`](https://www.boost.org/doc/libs/1_72_0/doc/html/boost_asio/reference/io_context__work.html) or `executor_work_guard`, depending on the version used). Be sure to use the convenient [`make_work_guard` helper](https://www.boost.org/doc/libs/1_72_0/doc/html/boost_asio/reference/make_work_guard.html) if you can. – sehe Feb 06 '20 at 19:20
  • Yeah, I didn't think the order would be relevant. I now call `async_accept` before starting the io_service thread. If `async_accept` starts another `async_accept` request every time its handler runs, there shouldn't be any problem and I don't need an `io_service::work` object, am I right? – Crispy Feb 06 '20 at 21:02
  • Indeed. [padding] – sehe Feb 06 '20 at 21:15