How can I create so many threads in c++ on beaglebone black

Question

I want to create over 500 threads in C++ on a BeagleBone Black, but the program fails with an error. Could you explain why the error occurs and how I can fix it?

Thread function, call_from_thread(int tid):

#include <iostream>
#include <thread>
#include <unistd.h>   // usleep
using namespace std;

void call_from_thread(int tid)
{
    cout << "thread running : " << tid << std::endl;
}

Main function:

int main() {
    thread t[500];

    for(int i=0; i<500; i++) {
        t[i] = thread(call_from_thread, i);
        usleep(100000);
    }

    std::cout << "main fun start" << endl;

    return 0;
}

I expect:

...
...
thread running : 495
thread running : 496
thread running : 497
thread running : 498
thread running : 499
main fun start

but instead I get:

...
...
thread running : 374
thread running : 375
thread running : 376
thread running : 377
thread running : 378
terminate called after throwing an instance of 'std::system_error'
  what():  Resource temporarily unavailable
Aborted

Could you help me?

May I ask for what purpose you need 500(!) threads? The error message is pretty clear IMO. — πάντα ῥεῖ, Apr 22 '19 at 08:39
each thread requires memory for the stack. My guess is that you run out of memory. Usually threads require 0.5 MB of stack memory. — Raxvan, Apr 22 '19 at 08:54
I want to fit 500 clowns in a VW Beetle, but I can only manage 378. — n. m. could be an AI, Apr 22 '19 at 08:59

ppetraki · Answer 1 · answered Apr 23 '19 at 15:07

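The BeagleBone Black appears to have a maximum of 512MB of DRAM. The default stack size of a new thread, according to the pthread_create() man page, is 2MB on Linux/x86-32 (on many systems it is instead taken from the RLIMIT_STACK soft limit, commonly 8MB).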

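i.e. 2^29 / 2^21 = 2^8 = 256 stacks would already use all of the DRAM. So what you're probably seeing around thread 374 is the allocator being unable to free memory fast enough to meet the demand, a failure that is reported by throwing an exception (the std::system_error in your output).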
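To check these numbers on the board itself, here is a minimal sketch, assuming Linux with glibc or musl, where std::thread is built on pthreads and uses the default thread attributes. It asks the library for the default stack size of a new thread and divides a memory budget by it. The helper names default_thread_stack_size and max_threads_for_budget are hypothetical, and the division assumes, as the answer does, that every stack is fully backed by physical memory.

```cpp
#include <pthread.h>
#include <cstddef>

// Default stack size that pthread_create() (and therefore std::thread on
// Linux) gives a new thread when no explicit size is requested.
std::size_t default_thread_stack_size() {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return 0;
    std::size_t size = 0;
    // On glibc and musl, a freshly initialised attribute object reports the
    // library's current default stack size here.
    pthread_attr_getstacksize(&attr, &size);
    pthread_attr_destroy(&attr);
    return size;
}

// How many stacks of `stack_bytes` fit into `budget_bytes`, assuming (as the
// answer does) that each stack is fully backed by physical memory.
std::size_t max_threads_for_budget(std::size_t budget_bytes, std::size_t stack_bytes) {
    return stack_bytes == 0 ? 0 : budget_bytes / stack_bytes;
}
```

For example, max_threads_for_budget(std::size_t(512) << 20, std::size_t(2) << 20) returns 256, matching the calculation above. Compile with -pthread.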

If you really want to see this explode, try moving that sleep call inside your thread function. :)
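A sketch of that experiment, with one change: instead of letting the exception abort the program, it catches the std::system_error that the std::thread constructor throws when the underlying thread cannot be created, reports how many threads were started, and joins them. The function name spawn_until_failure and its parameters are hypothetical.

```cpp
#include <chrono>
#include <cstdio>
#include <system_error>
#include <thread>
#include <vector>

// Tries to start up to `max_threads` threads that each sleep for `nap`.
// Because the sleep is inside the thread function, all threads are alive
// at the same time. Stops at the first std::system_error and returns how
// many threads were actually started.
int spawn_until_failure(int max_threads, std::chrono::milliseconds nap) {
    std::vector<std::thread> threads;
    threads.reserve(max_threads);  // no reallocation while threads are added
    try {
        for (int i = 0; i < max_threads; ++i)
            threads.emplace_back([nap] { std::this_thread::sleep_for(nap); });
    } catch (const std::system_error& e) {
        // In the question this is EAGAIN, printed as
        // "Resource temporarily unavailable".
        std::fprintf(stderr, "failed after %zu threads: %s\n",
                     threads.size(), e.what());
    }
    for (auto& t : threads) t.join();  // join whatever did start
    return static_cast<int>(threads.size());
}
```

Calling spawn_until_failure(500, std::chrono::milliseconds(1000)) on the board should show the point where thread creation starts failing, without the abort.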

You could try setting the stack size to 1MB or less (via pthreads), but that has its own set of problems.
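A minimal sketch of the pthreads route, assuming Linux: the thread attributes carry a smaller stack size (64 KiB here, an arbitrary illustrative value, raised to PTHREAD_STACK_MIN if needed), and every thread is joined so its stack is released. call_from_thread_p and run_with_stack_size are hypothetical names. A stack that is too small for what the thread function does will crash the program, which is one of the risks of this approach.

```cpp
#include <pthread.h>
#include <algorithm>
#include <climits>
#include <cstddef>
#include <cstdio>
#include <vector>

// Same work as call_from_thread in the question, with a pthreads signature.
void* call_from_thread_p(void* arg) {
    int tid = *static_cast<int*>(arg);
    std::printf("thread running : %d\n", tid);
    return nullptr;
}

// Starts `count` threads whose stacks are `stack_bytes` each, then joins
// them all. Returns how many threads were actually started.
int run_with_stack_size(int count, std::size_t stack_bytes) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return 0;
    // The stack may not be smaller than PTHREAD_STACK_MIN.
    stack_bytes = std::max<std::size_t>(stack_bytes, PTHREAD_STACK_MIN);
    if (pthread_attr_setstacksize(&attr, stack_bytes) != 0) {
        pthread_attr_destroy(&attr);
        return 0;
    }
    std::vector<int> ids(count);  // must outlive the threads that read them
    std::vector<pthread_t> threads;
    threads.reserve(count);
    for (int i = 0; i < count; ++i) {
        ids[i] = i;
        pthread_t t;
        // A non-zero result (e.g. EAGAIN) means no more threads can be made.
        if (pthread_create(&t, &attr, call_from_thread_p, &ids[i]) != 0) break;
        threads.push_back(t);
    }
    pthread_attr_destroy(&attr);
    for (pthread_t t : threads) pthread_join(t, nullptr);  // frees each stack
    return static_cast<int>(threads.size());
}
```

Calling run_with_stack_size(500, 64 * 1024) from main() is the pthreads equivalent of the question's loop. Compile with -pthread.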

The questions to really ask yourself are:

Is my application I/O-bound or compute-bound?
What's my memory budget to run this application? If you spend your entire physical memory on thread stacks, you'll have nothing left for the shared program heap.
Do I really need this much parallelism to do the job? The Cortex-A8 in the BeagleBone Black is a single-core processor, BTW.
Could I solve the problem using a thread pool? Or not use threads at all?
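A minimal fixed-size thread pool sketch in standard C++11, as one way to run the question's 500 jobs without 500 threads: a handful of workers take std::function tasks from a shared queue. ThreadPool and run_500_jobs are hypothetical names, not a library API; a production pool usually also needs task results, error propagation and a shutdown policy.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Fixed-size pool: `n` worker threads execute queued tasks in FIFO order.
class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { work(); });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();  // workers drain the queue first
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void work() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stopping_ = false;
};

// Runs the question's 500 jobs on `workers` threads; returns how many ran.
int run_500_jobs(unsigned workers) {
    std::atomic<int> done{0};
    {
        ThreadPool pool(workers);
        for (int i = 0; i < 500; ++i)
            pool.submit([i, &done] {
                std::string msg = "thread running : " + std::to_string(i) + '\n';
                std::cout << msg;
                ++done;
            });
    }  // the pool's destructor waits for every queued job
    return done.load();
}
```

On a single-core Cortex-A8, one or two workers are likely enough; on other boards std::thread::hardware_concurrency() can suggest a worker count.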

Finally, you can't set the stack size through the std::thread API, but you can with boost::thread. Or just write a thin wrapper around pthreads (assuming Linux).
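A sketch of such a thin wrapper, assuming Linux and pthreads: StackSizedThread (a hypothetical name) takes a stack size and a callable, reports failures as std::system_error the way std::thread does, and joins in its destructor. Unlike std::thread it is neither copyable nor movable, so the demo keeps instances in std::unique_ptr. The callable should not throw; an exception escaping it would end the program.

```cpp
#include <pthread.h>
#include <algorithm>
#include <atomic>
#include <climits>
#include <cstddef>
#include <functional>
#include <memory>
#include <system_error>
#include <vector>

// Like std::thread, but with a caller-chosen stack size.
class StackSizedThread {
public:
    StackSizedThread(std::size_t stack_bytes, std::function<void()> fn)
        : fn_(new std::function<void()>(std::move(fn))) {
        pthread_attr_t attr;
        int rc = pthread_attr_init(&attr);
        if (rc != 0) throw std::system_error(rc, std::generic_category(), "pthread_attr_init");
        rc = pthread_attr_setstacksize(&attr, std::max<std::size_t>(stack_bytes, PTHREAD_STACK_MIN));
        if (rc == 0) rc = pthread_create(&id_, &attr, &StackSizedThread::trampoline, fn_.get());
        pthread_attr_destroy(&attr);
        // EAGAIN here shows up as "Resource temporarily unavailable".
        if (rc != 0) throw std::system_error(rc, std::generic_category(), "pthread_create");
    }
    ~StackSizedThread() { join(); }  // never leave a thread unjoined
    StackSizedThread(const StackSizedThread&) = delete;
    StackSizedThread& operator=(const StackSizedThread&) = delete;

    void join() {
        if (joinable_) {
            pthread_join(id_, nullptr);
            joinable_ = false;
        }
    }

private:
    // pthread entry point: calls the stored callable.
    static void* trampoline(void* arg) {
        (*static_cast<std::function<void()>*>(arg))();
        return nullptr;
    }
    std::unique_ptr<std::function<void()>> fn_;  // outlives the thread
    pthread_t id_{};
    bool joinable_ = true;
};

// Usage: start `n` threads with 64 KiB stacks, each bumping a counter.
int run_small_stack_demo(int n) {
    std::atomic<int> counter{0};
    {
        std::vector<std::unique_ptr<StackSizedThread>> threads;
        for (int i = 0; i < n; ++i)
            threads.push_back(std::unique_ptr<StackSizedThread>(
                new StackSizedThread(64 * 1024, [&counter] { ++counter; })));
    }  // each destructor joins its thread
    return counter.load();
}
```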

score 0 · Answer 2 · answered Apr 22 '19 at 10:06

Whenever you use threads, there are three parts.

Start the threads
Do the work
Release the thread

You're starting the threads and doing the work, but you're not releasing them.

Releasing threads. There are two options for releasing a thread.

You can join the thread (which basically waits for it to finish)
You can detach the thread, and let it execute independently.

In this particular case, you don't want the program to finish until all threads are done executing, so you should join them.

#include <iostream>
#include <thread>
#include <vector>
#include <string>

auto call_from_thread = [](int i) {
    // I create the entire message before printing it, so that there's no interleaving of messages between threads
    std::string message = "Calling from thread " + std::to_string(i) + '\n';
    // Because I only call print once, everything gets printed together
    std::cout << message;
};
using std::thread;


int main() {
    thread t[500];

    for(int i=0; i<500; i++) {
        // Here, I don't have to start the thread with any delay
        t[i] = thread(call_from_thread, i);
    }

    std::cout << "main fun start\n";

    // I join each thread (which waits for them to finish before closing the program)
    for(auto& item : t) {
        item.join();
    }
    return 0;
}

While this is good advice in general, I'll bet it doesn't help here. The problem is simply that there are not enough system resources to support 500 threads simultaneously, which is why the OS protests after 378. The original code never reaches the point where the threads should be joined because they don't all get created. — Pete Becker, Apr 22 '19 at 13:30
