You'll need to set up a few things for this to work as expected. A thread pool is a set of idle threads, each waiting for a function and a parameter to execute. It also commonly has a waiting list of N elements (adjustable) to queue the upcoming work. For the task you're doing, use as many threads as your processor has cores. More would not speed up the job.
Now to the code: you'll need a function taking a parameter that contains all the data the function needs to work. Depending on how you manipulate the data, you'll also need some locking mechanism, be it mutex locks, semaphores, or whatever fits.
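To make that concrete, here is a minimal sketch of such a function. The names `work`, `total` and `total_lock` are made up for illustration, and the tuple layout of the parameter is just one possible choice. Only the update of the shared value is done under the lock:

```python
import threading

# Hypothetical shared state: a running total protected by a mutex.
total = 0
total_lock = threading.Lock()

def work(param):
    """Process one work item; `param` carries everything the function needs."""
    global total
    values, factor = param
    # Private computation on local data: no lock needed here.
    partial = sum(v * factor for v in values)
    # Only the update of the shared variable is protected.
    with total_lock:
        total += partial

# Run the function in four plain threads, one parameter each.
threads = [threading.Thread(target=work, args=(([1, 2, 3], k),)) for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total)  # 36
```

Keeping the locked part as short as possible lets the threads spend most of their time working instead of waiting on each other.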
Before entering your for loop, the thread pool should be allocated with cpu_cores threads and a waiting list as long as the maximum number of functions you want to pass to it. Alternatively, the add_work_to_thread_pool call should block until threads finishing their jobs make some room.
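In Python, one way to get that blocking behaviour is a bounded `queue.Queue`: `put()` blocks while the queue is full. The sketch below assumes a waiting list of 2 items (an arbitrary size chosen for the demonstration) and uses a timeout only so that the blocking is visible without any consumer thread running:

```python
import os
import queue

# One thread per core; os.cpu_count() may return None, hence the fallback.
n_threads = os.cpu_count() or 1

# Waiting list holding at most 2 items (hypothetical size).
work_queue = queue.Queue(maxsize=2)

work_queue.put("job-1")
work_queue.put("job-2")

try:
    # Nothing is consuming the queue, so it is full and this put times out.
    # Without the timeout it would block until a worker made room.
    work_queue.put("job-3", timeout=0.1)
    blocked = False
except queue.Full:
    blocked = True

print(blocked)  # True
```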
Inside the for loop, you add function(parameter) to the waiting list. The waiting list will be consumed allocated_threads items at a time.
After the for loop, you have to wait until every thread is back in a waiting state and the waiting list is empty, to be sure all the work is done.
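Putting the steps above together, a rough sketch could look like this. It is one possible layout, not the only one: `square`, `worker` and `run_pool` are hypothetical names, the queue size of twice the thread count is an arbitrary choice, and `None` is used as a "stop" marker for the workers:

```python
import os
import queue
import threading

def square(n, results, lock):
    """Example job: store n squared in a shared list, under a lock."""
    with lock:
        results.append(n * n)

def worker(work_queue):
    """Wait for (function, args) pairs and run them until told to stop."""
    while True:
        item = work_queue.get()
        if item is None:  # stop marker
            work_queue.task_done()
            break
        func, args = item
        try:
            func(*args)
        finally:
            work_queue.task_done()

def run_pool(n_jobs, n_threads=None):
    n_threads = n_threads or os.cpu_count() or 1
    # Bounded waiting list: put() blocks while it is full.
    work_queue = queue.Queue(maxsize=2 * n_threads)
    results, lock = [], threading.Lock()

    # Allocate the pool before the loop.
    threads = [threading.Thread(target=worker, args=(work_queue,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()

    # Inside the loop: add function + parameter to the waiting list.
    for n in range(n_jobs):
        work_queue.put((square, (n, results, lock)))

    # After the loop: wait until every queued job has been processed.
    work_queue.join()

    # Then stop the workers, one stop marker per thread.
    for _ in threads:
        work_queue.put(None)
    for t in threads:
        t.join()
    return sorted(results)

print(run_pool(5))  # [0, 1, 4, 9, 16]
```

The results are sorted before returning because the order in which threads finish is not guaranteed.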
With the help of the Python documentation on threads and queues and a bit of googling, I think you can now code it yourself.
Otherwise, feel free to ask for clarification on specific points, then come back with the code you tried that is not working as expected. I mean code with threads, not just the snippet you pasted.
Have a nice time, multitasking is fun :-)