I'm wholly new to multithreading.
I am writing a program which takes as input a vector of objects and an integer for the number of threads to dedicate. The nature of the objects isn't important, only that each has several members that are file paths to large text files. Here's a simplified version:
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Not very important. Reads a file and writes a new version omitting
// some lines.
void proc_file(OBJ obj) {
    std::string inFileStr(obj.get_path().c_str());
    std::string outFileStr(obj.get_path().replace_extension("new.txt").c_str());
    std::ifstream inFile(inFileStr);
    std::ofstream outFile(outFileStr);
    std::string currLine;
    while (std::getline(inFile, currLine)) {
        // Keep lines too short to end in "thing"; this also avoids the
        // out_of_range that compare() would otherwise throw on them.
        if (currLine.size() < 5 ||
            currLine.compare(currLine.size() - 5, 5, "thing") != 0) {
            outFile << currLine << '\n';
        }
        else {
            // Skip the next three lines
            for (int i = 0; i < 3; i++) {
                std::getline(inFile, currLine);
            }
        }
    }
    inFile.close();
    outFile.close();
}
// Processes n files concurrently, working its way through
// all OBJ in objs.
void multi_file_proc(std::vector<OBJ> objs, int n) {
    std::vector<std::thread> procVec;
    for (int i = 0; i < objs.size(); i++) {
        /*
            Ensure that n files are always being processed.
            Upon completion of one, initiate another, until
            all OBJ in objs have had their text files changed.
        */
    }
}
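For context, the call site would look something like this (load_objs and the count 4 are just placeholders, not real names from my project):

int main() {
    std::vector<OBJ> objs = load_objs();   // placeholder: however the objects get built
    multi_file_proc(objs, 4);              // e.g. dedicate 4 threads, keep 4 files in flight
}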
I want to loop through each OBJ and write altered versions of its text files concurrently, with the number of simultaneous file reads/writes limited by the thread count n. Ultimately, all of the objects' text files must be changed, but in such a way that there are always n files being processed, to get the most out of the available threads.
Note the vector of threads, procVec. I originally approached this by managing a vector of threads, with one file being processed per thread in procVec. From my reading, a vector for managing these tasks seems reasonable. But how do I ensure that n files are always being processed until all of them are done, without the program exiting while a thread is still open?
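To make the intended scheduling concrete, the behavior I'm after is roughly what this sketch does: a fixed pool of n workers, each repeatedly claiming the next unprocessed object from a shared counter until none remain. This is only an illustration of the goal (the name, the atomic counter, and the structure are my own guess, not something I know to be the right pattern):

#include <atomic>
#include <thread>
#include <vector>

void multi_file_proc_sketch(std::vector<OBJ> objs, int n) {
    std::atomic<std::size_t> next{0};          // index of the next unclaimed object
    std::vector<std::thread> workers;
    for (int w = 0; w < n; w++) {
        workers.emplace_back([&]() {
            while (true) {
                std::size_t i = next.fetch_add(1);
                if (i >= objs.size()) break;   // nothing left to claim
                proc_file(objs[i]);            // at most n files are ever being processed
            }
        });
    }
    for (std::thread &t : workers) {
        t.join();                              // never leave scope with a joinable thread
    }
}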
Edit: Apologies, my intention was not to ask others to write code for me. I just didn't want my approach to bias anyone's answer if the approach was bad to begin with.
These are some things I've tried (this code would go into the block comment in my function):
1. First approach. The idea was to add threads to procVec until the limit n was reached, then join the thread at the front of the vector and erase it on completion before starting another. This is a summary of several similar iterations, none of which worked (the full loop, as best I can reconstruct it, is shown after the problem list below):
if (i >= n) {
    procVec.front().join();
    procVec.erase(procVec.begin());
}
procVec.push_back(std::thread(proc_file, objs[i]));
Problems with this:
- Incorrectly assumes the thread at the front of the vector will always finish first
- (Possibly?) Erasing the first element invalidates iterators into procVec
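Spelled out (reconstructed from memory, so details may differ from what I actually ran), the loop in those attempts looked roughly like this; the drain loop at the end is something I suspect was also missing:

for (int i = 0; i < objs.size(); i++) {
    if (i >= n) {
        procVec.front().join();              // blocks here even if a later thread finished first
        procVec.erase(procVec.begin());
    }
    procVec.push_back(std::thread(proc_file, objs[i]));
}
for (std::thread &t : procVec) {
    t.join();                                // join whatever is still running at the end
}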
2. Using a mutex, I attempted a lambda-based scheme in which each thread is removed from procVec upon its completion. This is my current approach; I'm unsure why it isn't working, or whether it even suits my needs:
// remThread() and lamb() are defined above main(); procVec and threadMutex
// are global variables.
#include <algorithm>
#include <mutex>

void remThread(std::thread::id id) {
    std::lock_guard<std::mutex> lock(threadMutex);
    auto iter = std::find_if(procVec.begin(), procVec.end(), [=](std::thread &t)
        { return (t.get_id() == id); });
    if (iter != procVec.end()) {
        iter->join();
        procVec.erase(iter);
    }
}
void lamb(OBJ obj, std::thread::id id) {
    proc_file(obj);
    remThread(id);
}
// This is the code contained in the main for loop: call the lambda to process
// the file and then remove the thread.
std::lock_guard<std::mutex> lock(threadMutex);
procVec.push_back(std::thread([objs, i]() {
    std::thread(lamb, objs[i], std::this_thread::get_id()).detach();
}));
Problems with this:
- Program terminates, likely because a thread that is still joinable goes out of scope and is destroyed
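For reference, this is the failure mode I suspect: destroying a std::thread that is still joinable calls std::terminate(). A tiny standalone repro (completely separate from my actual code):

#include <chrono>
#include <thread>

int main() {
    {
        std::thread t([] { std::this_thread::sleep_for(std::chrono::seconds(1)); });
    }   // t is destroyed while still joinable -> std::terminate() aborts the program
}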