Let's say we do some processing on an [n][m] matrix.
Is there a way to determine the ideal number of subtasks for parallel processing?
For example, given this [n][m] matrix, I could create n*m threads (one per element), n threads (one per row), m threads (one per column), or one thread per block of some size. How would one know which is the most efficient?
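
To make the options concrete, here is a rough sketch of the block variant in Python. All names (`split_rows`, `process_block`, `process_matrix`) are hypothetical, and the per-block work is just a placeholder sum. The `num_tasks` argument is exactly the value I don't know how to choose: `num_tasks = n` gives one task per row, and smaller values give larger blocks.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n, num_tasks):
    """Split n rows into at most num_tasks contiguous, near-equal (start, stop) ranges."""
    num_tasks = max(1, min(num_tasks, n))  # never more tasks than rows, at least one
    base, extra = divmod(n, num_tasks)
    ranges = []
    start = 0
    for i in range(num_tasks):
        # The first `extra` blocks take one additional row each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def process_block(matrix, start, stop):
    """Placeholder work (an assumption): sum every element in rows start..stop-1."""
    return sum(sum(row) for row in matrix[start:stop])


def process_matrix(matrix, num_tasks):
    """Run process_block on each row block in its own thread and combine the results."""
    ranges = split_rows(len(matrix), num_tasks)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        partials = pool.map(lambda r: process_block(matrix, r[0], r[1]), ranges)
        return sum(partials)
```

For instance, `process_matrix(matrix, 4)` splits the rows into four blocks. (In standard CPython the GIL keeps threads from speeding up CPU-bound work, so this only illustrates the partitioning, not a real speedup.)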