
I'm using a multithreaded websocketpp server that I configured like this:

Server::Server(int ep) {
    using websocketpp::lib::placeholders::_1;
    using websocketpp::lib::placeholders::_2;
    using websocketpp::lib::bind;

    Server::wspp_server.clear_access_channels(websocketpp::log::alevel::all);

    Server::wspp_server.init_asio();

    Server::wspp_server.set_open_handler(bind(&Server::on_open, this, _1));
    Server::wspp_server.set_close_handler(bind(&Server::on_close, this, _1));
    Server::wspp_server.set_message_handler(bind(&Server::on_message, this, _1, _2));

    try {
        Server::wspp_server.listen(ep);
    } catch (const websocketpp::exception &e){
        std::cout << "Error in Server::Server(int): " << e.what() << std::endl;
    }
    Server::wspp_server.start_accept();
}

void Server::run(int threadCount) {
    boost::thread_group tg;

    for (int i = 0; i < threadCount; i++) {
        tg.add_thread(new boost::thread(
            &websocketpp::server<websocketpp::config::asio>::run,
            &Server::wspp_server));
        std::cout << "Spawning thread " << (i + 1) << std::endl;
    }

    tg.join_all();
}

void Server::updateClients() {
    /*
       run updates
    */
    //take a shared (read) lock so on_open/on_close can't mutate conns mid-iteration
    boost::shared_lock<boost::shared_mutex> lock(Server::conns_mutex);
    for (websocketpp::connection_hdl hdl : Server::conns) {
        try {
            std::string message = "personalized message for this client from the ran update above";
            wspp_server.send(hdl, message, websocketpp::frame::opcode::text);
        } catch (const websocketpp::exception &e) {
            std::cout << "Error in Server::updateClients(): " << e.what() << std::endl;
        }
    }
}

void Server::on_open(websocketpp::connection_hdl hdl) {
    boost::lock_guard<boost::shared_mutex> lock(Server::conns_mutex);
    Server::conns.insert(hdl);

    //do stuff


    //when the first client connects, start the update routine
    if (conns.size() == 1) {
        //note: the flag is named `running` here because a data member named
        //`run` would clash with the member function Server::run(int)
        Server::running = true;
        bool *running = &(Server::running);
        std::thread([running] () {
            while (*running) {
                auto nextTime = std::chrono::steady_clock::now() + std::chrono::milliseconds(15);
                Server::updateClients();
                std::this_thread::sleep_until(nextTime);
            }
        }).detach();
    }
}

void Server::on_close(websocketpp::connection_hdl hdl) {
    boost::lock_guard<boost::shared_mutex> lock(Server::conns_mutex);
    Server::conns.erase(hdl);

    //do stuff

    //stop the update loop when all clients are gone
    if (conns.empty())
        Server::running = false;
}

void Server::on_message(
        websocketpp::connection_hdl hdl,
        websocketpp::server<websocketpp::config::asio>::message_ptr msg) {
    boost::lock_guard<boost::shared_mutex> lock(Server::conns_mutex);

    //do stuff
}

I start the server with:

int port = 9000;
Server server(port);
server.run(/* number of threads */);

The only substantial difference when you add connections is in the message emission [wspp_server.send(...)]. An increasing number of clients doesn't really add anything to the internal computation; it's only the number of messages to be emitted that grows.

My problem is that the CPU usage doesn't seem to be that much different whether I use 1 or more threads.

Whether I start the server with server.run(1) or server.run(4) (both on a dedicated 4-core server) makes no difference: for a similar load, the CPU usage graph shows approximately the same percentage. I was expecting the usage to be lower with 4 threads running in parallel. Am I thinking of this the wrong way?

At some point, I got the sense that the parallelism really applies to the listening part more than the emission. So, I tried enclosing the send within a new thread (that I detach) so it's independent of the sequence that requires it, but it didn't change anything on the graph.

Am I not supposed to see a difference in the amount of work the CPU does? Otherwise, what am I doing wrong? Is there another step I'm missing in order to force the messages to be emitted from different threads?

ribbit
  • The code that calls `updateClients()` sleeps for 15 milliseconds each time. On a reasonably fast processor, that 15 milliseconds will be significantly larger than the time needed to actually do the updating, so the fraction of time the thread function spends thrashing the CPU is negligible. [Obviously you didn't write the code, since it does this quite deliberately - presumably to avoid having an undue impact by starving other processes of CPU time]. – Peter Dec 18 '17 at 07:26
  • I'm voting to close this question as off-topic because it is asking why code does not behave in a particular way, when that code is clearly deliberately crafted so it doesn't behave that way. – Peter Dec 18 '17 at 07:28
  • https://gist.github.com/jboner/2841832 – Hans Passant Dec 18 '17 at 09:12
  • @Peter, thanks for your comment. I did write the `updateClients()` method and the thread that calls it, entirely. Although I may have simplified things a little bit to post the question. The reason I have the thread sleep is to simulate a certain number of frames per second (at least that was my attempt at achieving that). This is that every client must be updated about 66 times a second in my code above (it varies from 30 to 60 fps in the real code) as soon as one client is connected. There's an internal update that runs each time in the `updateClients()` before calling `wspp.send(...)`. – ribbit Dec 18 '17 at 14:38
  • Possible duplicate of [What is the difference between concurrency, parallelism and asynchronous methods?](https://stackoverflow.com/questions/4844637/what-is-the-difference-between-concurrency-parallelism-and-asynchronous-methods) – Caleth Dec 18 '17 at 14:54

1 Answer


"My problem is that the CPU usage doesn't seem to be that much different whether I use 1 or more threads."

That's not a problem. That's a fact. It just means that the whole thing isn't CPU bound. Which should be quite obvious, since it's network IO. In fact, high-performance servers often dedicate only 1 thread to all IO tasks, for this reason.

"I was expecting the usage to be lower with 4 threads running in parallel. Am I thinking of this the wrong way?"

Yes, it seems so. You don't expect to pay less when you split the bill 4 ways either.

In fact, much like at the diner, you often end up paying more due to the overhead of splitting the load (cost/tasks). Unless you require more CPU capacity or lower reaction times than a single thread can deliver, a single IO thread is (obviously) more efficient because there is no scheduling overhead and/or context switch penalty.

Another mental exercise:

  • If you run 100 threads, the scheduler will, in the optimal case, spread them across all of your available cores.
  • Conversely, if there are other processes running on your system (which there, obviously, always are), the scheduler might put your 4 threads all on the same logical core. Do you expect the CPU load to be lower? Why? (Hint: of course not.)

Background: What is the difference between concurrency, parallelism and asynchronous methods?

sehe
  • I see. I'm coming from a Nodejs/socket.io background, where only one core is used unless multiple child processes are spawned. I guess I kept thinking the same way and assumed that unless one would have multiple parallel threads, the app would only use one of the available cores. – ribbit Dec 18 '17 at 15:00
  • There is indeed no big difference between the two. In nodejs you won't see the CPU load drop either when adding threads. The argument that the threads "manage themselves" doesn't change the outcome – sehe Dec 18 '17 at 15:03
  • Thanks! Just a question. Would you say that for two CPU's **with the same frequency**, one with, say, 2c/4t and the other 4c/8t, if you applied the same load on both, the CPU usage would be lower on the latter? – ribbit Dec 18 '17 at 16:04
  • Depends on how you define the metric. If you measure absolute load (ticks, cycles, flops?) then no. If you measure relative load (which is not unusual) then obviously, you'd expect 50% load – sehe Dec 18 '17 at 20:30