I'm multithreading a data processing program that reads from a file, performs computations, and writes the results to another file. The program has to reprocess the input for different input parameters, and I want to run each reprocessing instance on its own thread. Each instance is computation-intensive, so the thread creation overhead is negligible in comparison.

I'm looking at using the C++11 thread library. I may be accessing the same file from 100 different threads. This seems silly to me and I have a gut feeling that there is a better way to do this.

I saw in the comments that a thread/work pool has been mentioned. Can somebody provide a link to an implementation example or just explain it to me? I have an 8-core machine and would prefer to keep all cores fully busy. I'm completely open to suggestions, since I'm trying to learn how to do this efficiently.

jonnyd42
  • "At max, I may be accessing the same file from 100 different threads." -- Why so many threads? You're unlikely to have enough cores to run them all. Is each thread processing the entire file? How big is the input file? -- A better solution would be to create a queue of work items, containing the parameters, and have a smaller thread pool (sized to the number of CPU cores) that consumes this queue. Depending on the size of the input, you could even memory map it, and just have the threads work with that. – Dan Mašek Apr 10 '16 at 00:03
  • Can you just slurp the file in the main thread, then pass the pointer to all the other threads? – o11c Apr 10 '16 at 00:05
  • @DanMašek can you link me to a way to do this? I have an 8 core machine, and I'd prefer to make a work pool and just keep each thread at max throughput rather than making a bunch of threads. I didn't think about that since I'm new to this. – jonnyd42 Apr 10 '16 at 00:09
  • The operating system has its own locking mechanism for files. Multiple threads can open a file for reading since the file isn't being modified, but the mechanism will only allow one thread access for writing. – eoD .J Apr 10 '16 at 00:15
  • @jonnyd42 For memory mapped files you can use [boost::interprocess](http://www.boost.org/doc/libs/1_55_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.mapped_file). Or just read the file into some array. Define a structure containing reference to the input data, and the parameters. Create a synchronized queue of those structures and fill it with all possible param combinations. Make a vector of threads (for example size 8) that repeatedly consume this queue and do the processing. If input is large, just have the threads read the file. – Dan Mašek Apr 10 '16 at 00:15
  • @jonnyd42 [Example of synchronized queue](https://juanchopanzacpp.wordpress.com/2013/02/26/concurrent-queue-c11/). Just end the thread when it's empty if you pre-populate it. Example of [thread pool is here](http://www.cplusplus.com/reference/thread/thread/thread/) -- just a vector of threads running the same function. – Dan Mašek Apr 10 '16 at 00:20
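
The approach described in the comments (read the input once, fill a queue with every parameter set, and let a small fixed set of threads drain it) could look roughly like the sketch below. It uses only the C++11 standard library. `WorkItem`, `process`, and `run_pool` are hypothetical names invented for this illustration, and `process` is a placeholder for the real computation; the input is assumed to fit in memory and to be read-only while the workers run.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical work item: one parameter set to run against the shared input.
struct WorkItem {
    std::size_t index;  // slot in the results vector for this parameter set
    double param;       // hypothetical input parameter
};

// Placeholder for the real computation (assumption): a scaled sum of the input.
double process(const std::vector<double>& input, double param) {
    return param * std::accumulate(input.begin(), input.end(), 0.0);
}

// Runs every parameter set against the same read-only input using a fixed
// number of worker threads that pull items from a pre-populated queue.
std::vector<double> run_pool(const std::vector<double>& input,
                             const std::vector<double>& params,
                             unsigned num_threads) {
    // Fill the queue before any worker starts, so "empty" means "all done".
    std::queue<WorkItem> work;
    for (std::size_t i = 0; i < params.size(); ++i) {
        work.push(WorkItem{i, params[i]});
    }

    std::vector<double> results(params.size());
    std::mutex queue_mutex;  // guards the queue only

    auto worker = [&]() {
        for (;;) {
            WorkItem item;
            {
                std::lock_guard<std::mutex> lock(queue_mutex);
                if (work.empty()) return;  // nothing left: the thread ends
                item = work.front();
                work.pop();
            }
            // Each item writes to its own slot, so no lock is needed here.
            results[item.index] = process(input, item.param);
        }
    };

    // The "thread pool" is just a vector of threads running the same function.
    std::vector<std::thread> pool;
    const unsigned n = std::max(1u, num_threads);
    for (unsigned t = 0; t < n; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return results;
}
```

For 100 parameter sets on an 8-core machine, this would be called as `run_pool(input, params, 8)`. `std::thread::hardware_concurrency()` can supply the thread count instead, but it is allowed to return 0, so a fallback value is needed. Collecting the results in memory and writing the output file from the main thread after the workers are joined avoids having to synchronize file writes at all.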

0 Answers