Stop trying to code. First draw out what you need to do and the pieces you need in order to do it.
You need one queue to hold the jobs, one mutex to protect the queue so the threads don't smurf it up with simultaneous accesses, and N threads.
Each thread function is a loop that
- grabs the mutex,
- gets a job from the queue,
- releases the mutex, and
- processes the job.
In this case I'd keep things simple by exiting the loop and the thread when the second step (getting a job) finds the queue empty. In production you'd have the thread block and wait on the queue so it's still available to service jobs added later.
Wrap that up in a class with a function that allows you to add jobs to the queue, a function to start N threads, and a function to join on all of the running threads.
`main` defines an instance of the class, feeds in the jobs, starts the thread pool and then blocks on join until everyone's done.
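Put together, the class and the flow through `main` could be sketched roughly like this. Everything named here (`JobPool`, `add_job`, `start`, `join`, `run_demo`) is invented for illustration, jobs are assumed to be plain callables, and `run_demo` stands in for the body of `main`. It uses the simple exit-when-empty loop, so all jobs have to be added before the threads are started:

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical pool: one queue, one mutex, N threads.
class JobPool {
public:
    using Job = std::function<void()>;

    ~JobPool() { join(); }  // never destroy a joinable std::thread

    // Add a job to the queue.
    void add_job(Job job) {
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push(std::move(job));
    }

    // Start n worker threads.
    void start(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            threads_.emplace_back([this] { run(); });
        }
    }

    // Block until every running worker has finished.
    void join() {
        for (auto& t : threads_) {
            if (t.joinable()) {
                t.join();
            }
        }
        threads_.clear();
    }

private:
    // Worker loop: lock, take a job, unlock, process; exit when empty.
    void run() {
        for (;;) {
            Job job;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (jobs_.empty()) {
                    return;
                }
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();
        }
    }

    std::queue<Job> jobs_;
    std::mutex mutex_;
    std::vector<std::thread> threads_;
};

// Stand-in for main: returns how many jobs actually ran.
int run_demo() {
    JobPool pool;                       // define an instance of the class
    std::atomic<int> done{0};
    for (int i = 0; i < 20; ++i) {      // feed in the jobs
        pool.add_job([&done] { ++done; });
    }
    pool.start(4);                      // start the thread pool
    pool.join();                        // block until everyone's done
    return done.load();
}
```

The jobs here only bump an atomic counter so the result is easy to check. Real jobs that share data would need their own protection, since the pool's mutex only guards the queue.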
Once you've beaten the design into something you have high confidence does what you need it to do, then you start writing code. Write code, especially multi-threaded code, without a plan and you're in for a lot of debugging and re-writing that usually exceeds the time spent on design by a significant margin.