I would like to least-squares match data (a numpy array of floats) against many known signal shapes. My code works, but it is too slow for the many runs I plan to do:
import numpy
import time
samples = 50000
width_signal = 100
data = numpy.random.normal(0, 1, samples)
signal = numpy.random.normal(0, 1, width_signal) # Placeholder
t0 = time.perf_counter()  # time.clock() was removed in Python 3.8
for i in range(samples - width_signal):
    data_chunk = data[i:i + width_signal]
    residuals = data_chunk - signal
    squared_residuals = residuals**2
    summed_residuals = numpy.sum(squared_residuals)
t1 = time.perf_counter()
print('Time elapsed (sec)', t1-t0)
EDIT: Corrected a mistake: First square residuals, then sum them.
This takes about 0.2 sec to run on my machine. As I have many datasets and signal shapes, this is too slow. My specific problem does not allow for typical MCMC methods because the signal shapes are too different. It has to be brute force.
Typical volumes are 50,000 floats for the data and 100 for the signal. These can vary by a factor of a few.
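To make the baseline concrete, here is the same brute-force scan written as one vectorized expression instead of a Python loop. This is only a sketch, not something I have benchmarked: it assumes numpy >= 1.20 for numpy.lib.stride_tricks.sliding_window_view, and it keeps every summed residual in an output array rather than overwriting a scalar.

import numpy
from numpy.lib.stride_tricks import sliding_window_view

samples = 50000
width_signal = 100
data = numpy.random.normal(0, 1, samples)
signal = numpy.random.normal(0, 1, width_signal)  # Placeholder

# View of shape (samples - width_signal + 1, width_signal); no data is copied.
windows = sliding_window_view(data, width_signal)

# Sum of squared residuals for every offset at once; note this covers one more
# offset than the loop above, which stops at samples - width_signal.
summed_residuals = ((windows - signal)**2).sum(axis=1)

The subtraction still materializes an intermediate array of roughly samples x width_signal floats (about 40 MB here), so memory is the main thing to watch when the sizes grow by a factor of a few.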
My tests show that:
- The summing of the residuals, numpy.sum(residuals), eats 90% of the time. I tried Python's built-in sum(residuals) and it is faster for small arrays (roughly fewer than 50 elements) but slower for bigger ones. Should I insert an if condition to switch between the two (see the sketch after this list)?
- I tried numpy.roll() instead of slicing the data directly, and .roll() is slower.
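Regarding the if condition in the first bullet, this is roughly what I have in mind; fast_sum and the threshold of 50 are placeholders based on my rough timings, not tuned values.

import numpy

def fast_sum(residuals, threshold=50):
    # Placeholder dispatch: built-in sum() seemed faster below ~50 elements
    # in my tests, numpy.sum() above that.
    if residuals.size < threshold:
        return sum(residuals)
    return numpy.sum(residuals)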
Questions:
- Is there an algorithmic improvement that would speed this up?
- Is there a faster way to sum arrays? I don't know C, but if it is much faster I could try it.
- Can a GPU help? I have many runs to do. If so, where could I find a code snippet to do this (rough sketch of one attempt below)?
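For the GPU question, the only concrete thing I have sketched so far uses CuPy, which mirrors the numpy API; CuPy here is just my own assumption, and this code is untested on my side. It gathers all chunks with fancy indexing and sums the squared residuals on the GPU.

import cupy  # assumption: CuPy as a GPU-backed, numpy-like library

samples = 50000
width_signal = 100
data_gpu = cupy.random.normal(0, 1, samples)
signal_gpu = cupy.random.normal(0, 1, width_signal)  # Placeholder

# Index matrix of shape (offsets, width_signal): row i holds i, i+1, ..., i+99.
offsets = samples - width_signal
idx = cupy.arange(offsets)[:, None] + cupy.arange(width_signal)[None, :]
windows = data_gpu[idx]  # gather all chunks at once

# Sum of squared residuals for every offset, computed on the GPU.
summed_residuals = ((windows - signal_gpu)**2).sum(axis=1)
result = cupy.asnumpy(summed_residuals)  # copy back to the host if needed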