Ok, so the architecture of the solution is going to depend on one thing: does the processing time per queue item vary according to the item's data?
If not then you can have something that merely round-robins between the processing threads. This will be fairly simple to implement.
If the processing time does vary then you're going to need something with more of a 'next available' feel to it, so that whichever of you threads happens to be free first gets given the job of processing the data item.
Having worked that out you're then going to have the usual run around with how to synchronise between a queue reader and the processing threads. The difference between 'next-available' and 'round-robin' is how you do that synchronisation.
I'm not overly familiar with C#, but I've heard tell of a beast called a background worker. That is likely to be an acceptable means of bringing this about.
For round robin, just start up a background worker per queue item, storing the workers' references in an array. Limit yourself to, say, 16 in progress background workers. The idea is that having started 16 you would then wait for the first to complete before starting the 17th, and so on. I believe that background workers actually run as jobs on the thread pool, so that will automatically limit the number of threads that are actually running at any one time to something appropriate for the underlying hardware. To wait for a background worker see this. Having waited for a background worker to complete you'd then handle its result and start another up.
For the next available approach its not so different. Instead of waiting for the 1st to complete you would use WaitAny() to wait for any of the workers to complete. You handle the return from whichever one completed, and then start another one up and go back to WaitAny().
The general philosophy of both approaches is to keep a number of threads on the boil all the time. A features of the next-available approach is that the order in which you emit the results is not necessarily the same as the order of the input items. If that matters then the round robin approach with more background workers than CPU cores will be reasonably efficient (the threadpool will just start commissioned but not yet running workers anyway). However the latency will vary with the processing time.
BTW 16 is an arbitrary number chosen on the basis of how many cores you think will be on the PC running the software. More cores, bigger number.
Of course, in the seemingly restless and ever changing world of .NET there may now be a better way of doing this.
Good luck!