I'm writing a program in which I'm trying to see how well a given redshift gets a set of lines detected in an spectrum to match up to an atomic line database. The closer the redshift gets the lines to overlap, the lower the "score" and the higher the chance that the redshift is correct.
I do this by looping over a range of possible redshifts, calculating the score for each. Within that outer loop, I was looping within each line in the set of detected lines to calculate its sub_score, and summing that inner loop to get the overall score.
I tried to vectorize the inner loop with numpy, but surprisingly it actually slowed down the execution. In the example given, the nested for loop takes ~2.6 seconds on my laptop to execute, while the single for loop with numpy on the inside takes ~5.3 seconds.
Why would vectorizing the inner loop slow things down? Is there a better way to do this that I'm missing?
import numpy as np
import time
def find_nearest_line(lines, energies):
# Return the indices (from the lines vector) that are closest to the energies vector supplied
# Vectorized with help from https://stackoverflow.com/a/53111759
energies = np.expand_dims(energies, axis=-1)
idx = np.abs(lines / energies - 1).argmin(axis=-1)
return idx
def calculate_distance_to_line(lines, energies):
# Returns distance between an array of lines and an array of energies)
z = (lines / energies) - 1
return z
rng = np.random.default_rng(2021)
det_lines = rng.random(1000)
atom_lines = rng.random(10000)
redshifts = np.linspace(-0.1, 0.1, 100)
# loop version
start=time.time()
scores = []
for redshift in redshifts:
atom_lines_shifted = atom_lines * (1 + redshift)
score = 0
for det_line in det_lines:
closest_atom_line = find_nearest_line(atom_lines_shifted, det_line)
score += abs(calculate_distance_to_line(atom_lines_shifted[closest_atom_line], det_line))
scores.append(score)
print(time.time()-start)
print(scores)
# (semi)-vectorized version
start=time.time()
scores = []
for redshift in redshifts:
atom_lines_shifted = atom_lines * (1 + redshift)
closest_atom_lines = find_nearest_line(atom_lines_shifted, det_lines)
score = np.sum(np.abs(calculate_distance_to_line(atom_lines_shifted[closest_atom_lines], det_lines)))
scores.append(score)
print(time.time()-start)
print(scores)