Based on your edit, it sounds like there may be a solution that doesn't involve explicit threads at all. You may well be able to use OpenMP to execute the code in parallel without managing the threading yourself. This could be about as simple as something like:
void Something() {
    #pragma omp parallel for // ...
    for (int i = 0; i < 9000; i++) {
        DoStuff();
    }
}
Here, the ... means you might need (or want) to add more clauses to the pragma. For example, you can specify which variables are shared among the threads, which get an independent copy in each thread, and so on.
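For instance, a loop that accumulates a result might look something like the sketch below. This is only an illustration, not your code: SumStuff, data, and n are hypothetical names.

/* 'data' and 'n' are shared by all threads, each thread automatically
   gets its own copy of 'i', and the per-thread partial sums in 'total'
   are combined by the reduction clause. */
double SumStuff(const double *data, int n) {
    double total = 0.0;
    #pragma omp parallel for shared(data) reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

Compiled with OpenMP enabled (e.g., -fopenmp for GCC), each thread processes its own chunk of the iterations and the runtime combines the partial sums at the end.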
This may be a lot less glamorous than writing your own threading code, but it can be quite effective. In particular, the OpenMP runtime normally includes logic for deciding how many threads to use based on the available processor resources, so the use of 10 threads wouldn't be hard-coded; in a couple of years, when you have a machine with 16 cores, you won't have to rewrite anything to take advantage of them.
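If you're curious what the runtime decided, you can query it. A minimal sketch using the standard OpenMP API:

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* The runtime picks the thread count from the available hardware
       (or the OMP_NUM_THREADS environment variable); nothing here
       hard-codes a particular number of threads. */
    #pragma omp parallel
    {
        #pragma omp single
        printf("Running with %d threads\n", omp_get_num_threads());
    }
    return 0;
}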
At the same time, it's true that OpenMP has limitations. For the situation you've described (executing loop iterations in parallel), it works quite nicely. It's not nearly as well suited to some other scenarios, such as an execution pipeline where one stage of processing happens on one core, the next stage on another core, and so on.