As I said in my answer to your previous question, I think you don't understand the difference between processes and threads. Processes are incredibly "heavy" (*); each process can contain many threads. If you are spawning new processes from a parent process, that parent process doesn't need to create new threads; each process will have its own collection of threads.
Only create threads in the parent process if all the work is being done in the same process.
Think of a thread as a worker, and a process as a building containing one or more workers.
One strategy is "build a single building and populate it with ten workers who do each do some amount of work". You get the expense of building one process and ten threads.
If your strategy is "build a building. Then have the one worker in that building order the construction of a thousand more buildings, each of which contains a worker that does their bidding", then you get the expense of building 1001 buildings and hiring 1001 workers.
The strategy you do not want to pursue is "build a building. Hire 1000 workers in that building. Then instruct each worker to build a building, which then has one worker to go do the real work." There is no point in making a thread whose sole job is creating a process that then creates a thread! You have 1001 buildings and 2001 workers, half of whom are immediately idle but still have to be paid.
Looking at your specific problem: the key question is "where is the bottleneck?" Spawning off new processes or new threads only helps when the performance problem is that the perf is gated on the processor. If the performance of your parser is gated not on how fast you can parse the file but rather on how fast you can get it off disk, then parallelizing it is going to make things far, far worse. You'll have a huge amount of system resources devoted to all hammering on the same disk controller at the same time, and the disk controller will get slower as more load piles up on it.
UPDATE:
I need to limit the number of executions of the .exe to ONE execution PER CORE. This is the most efficient because if I am parsing 100,000 files I can't just fire up 100000 processes. So I am using threads to limit the number of executions at one time to one execution per core. If there is another way (other than threads) to find out if a processor isn't tied up in execution, or if the .exe has finished please explain
This seems like an awfully complicated way to go about it. Suppose you have n processors. Your proposed strategy, as I understand it, is to fire up n threads, then have each thread fire up one process, and you know that since the operating system will probably schedule one thread per CPU that somehow the processor will magically also schedule the new thread in each new process on a different CPU?
That seems like a tortuous chain of reasoning that depends on implementation details of the operating system. This is craziness. If you want to set the processor affinity of a particular process, just set the processor affinity on the process! Don't be doing this crazy thing with threads and hope that it works out.
I say that if you want to have no more than n instances of an executable running, one per processor, don't mess around with threads at all. Rather, just have one thread sit in a loop, constantly monitoring what processes are running. If there are fewer than n copies of the executable running, spawn another and set its processor affinity to be the CPU you like best. If there are n or more copies of the executable running, go to sleep for a second (or a minute, or whatever makes sense), and when you wake up, check again. Keep doing that until you're done. That seems like a much easier approach.
(*) Threads are also heavy, but they are lighter than processes.