If you have 4 cores, then in theory you should have exactly four worker threads going at any given time for maximum optimization. Unfortunately what happens in theory never happens in practice. You may have worker threads who, for whatever reason, have significant amounts of downtime. Perhaps they're hitting the web for more data, or reading from a disk, much of which is just waiting and not utilizing the cpu.
Depending on how much waiting you're doing, you'll want to bump up the number. The costs to increasing the number of threads is that you'll have more context switching and competition for resources. The benefits are that you'll have another thread ready to work in case one of the other threads decides it has to break for something.
Your best bet is to set it to something (let's start with 4) and work your way up. Profile your code with each setting, and see if your benchmarks go up or down. You should soon see a pattern and a break-even point.
When it comes to optimization, you can theorize all you want about what should be the fastest, but you won't beat actually running and timing your code to truly answer this question.