
I have a regression problem on 3-D array data. The array's size is (350, 350, 50), and I need to run the regression for each pixel; that is, the regression is applied to each (1, 1, 50) slice, repeated 350 x 350 times.

I wrote my code with NumPy, and it processes each pixel sequence in a loop:

    row, col, depth = image_sequence.shape

    for i in range(row):
        for j in range(col):
            Ytrain = image_sequence[i, j, :]  # one (50,) pixel sequence
            new_stack[i, j, :] = regression_process(Ytrain)

'row' is 350
'col' is 350

By my estimate, each sequence takes about 5 seconds to process. Since 350 x 350 sequences must be computed, the whole job would take roughly 7 days.

I want to know how to optimize this process so that it finishes sooner. I suspect the answer involves parallel processing, but I'm not familiar with it.

윤건우
  • Assuming `regression_process` is indeed a 5s-per-call bottleneck with no hope of improvement, then parallel processing it is. In which case, https://stackoverflow.com/questions/4682429/parfor-for-python is a duplicate but arguably does not give a straight answer. – Leporello Jul 04 '19 at 13:12
  • If your function `regression_process` takes 5 sec and you need to call it 350x350 times, then what you need to optimize is this particular function. – klaus Jul 04 '19 at 17:48

1 Answer


The NumPy way is to write regression_process so that it takes an array of shape (n, 50) as input, with n being any number. Here is a simple example that just calculates the mean.

    def regression_process(image):
        length = image.shape[1]
        new_stack = np.sum(image, axis=1)
        return new_stack / length

    new_stack = regression_process(image_sequence.reshape(row * col, depth))
    new_stack = new_stack.reshape(row, col)  # one mean value per pixel
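As a sanity check, here is a small self-contained sketch (random data, and the mean example from above) showing that the single reshaped call produces the same result as the per-pixel loop:

```python
import numpy as np

def regression_process(image):
    # image has shape (n, depth); compute the mean over depth for all n pixels at once
    length = image.shape[1]
    return np.sum(image, axis=1) / length

rng = np.random.default_rng(0)
row, col, depth = 4, 5, 50
image_sequence = rng.random((row, col, depth))

# vectorized: one call over all row*col sequences
new_stack = regression_process(image_sequence.reshape(row * col, depth)).reshape(row, col)

# per-pixel loop for comparison
loop_stack = np.empty((row, col))
for i in range(row):
    for j in range(col):
        loop_stack[i, j] = regression_process(image_sequence[i, j][None, :])[0]

print(np.allclose(new_stack, loop_stack))  # True
```

The vectorized call replaces 350x350 Python-level function calls with one array operation, which is where the speedup comes from.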

rohrdr