I have a 2-dimensional numpy array, and I would like each element to be rounded to the closest number in a sequence. The array has shape (28000, 24).
The sequence, for instance, would be [0, 0.05, 0.2, 0.33, 0.5].
E.g. an original 0.27 would be rounded to 0.33, and 0.42 would be rounded to 0.5.
This is what I use so far, but it is of course really slow with a double loop.
MWE:
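import numpy as np

arr = np.array([[0.14, 0.18], [0.20, 0.27]])
new = []
sequence = np.array([0, 0.05, 0.2, 0.33, 0.5])
for i in range(len(arr)):
    row = []
    for j in range(len(arr[0])):
        # squared distance from this element to every value in the sequence
        temp = (arr[i][j] - sequence)**2
        # keep the sequence value with the smallest distance
        row.append(list(sequence[np.where(temp == min(temp))])[0])
    new.append(row)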
Result:
[[0.2000001, 0.2000001], [0.2000001, 0.33000001]]
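For reference, the rule the double loop implements (pick the sequence value at minimum distance) can be sketched without Python loops by broadcasting against an extra axis. This is only a minimal sketch: the helper name round_to_nearest is made up for illustration, and it assumes a temporary array of shape arr.shape + (len(sequence),) fits in memory.

```python
import numpy as np

def round_to_nearest(arr, sequence):
    """Hypothetical helper: map each element of arr to the closest value in sequence."""
    arr = np.asarray(arr, dtype=float)
    sequence = np.asarray(sequence, dtype=float)
    # Extra trailing axis: distances has shape arr.shape + (len(sequence),)
    distances = np.abs(arr[..., np.newaxis] - sequence)
    # Index of the closest candidate per element (ties go to the lower index)
    nearest = distances.argmin(axis=-1)
    return sequence[nearest]

arr = np.array([[0.14, 0.18], [0.20, 0.27]])
sequence = np.array([0, 0.05, 0.2, 0.33, 0.5])
rounded = round_to_nearest(arr, sequence)
```

For the (28000, 24) array and a 5-element sequence, the temporary array would hold 28000 * 24 * 5 floats, which is small; for long sequences the memory cost grows proportionally.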
Motivation:
I am making predictions with a machine learning model. The target values reflect the confidence of a group of experts: if 2 out of 3 experts gave a 1, for example, the target is 0.66. As a result, values such as 0, 0.1, 0.2, 0.33, 0.66 and 0.75 occur relatively often in the data. My predictions, however, look like 0.1724. Rounding such a prediction to 0.2 would remove a lot of prediction error.
How can I speed up the rounding of all elements?
Update: I now pre-allocate the output, so nothing has to be appended inside the loop.
# new = [[0] * len(arr[0]) for _ in range(len(arr))], then writing into new[i][j]
# instead of appending (a list comprehension, so that the rows are independent lists)
Timings:
Original problem: 36.62 seconds
Pre-allocated array: 15.52 seconds
shx2 SOLUTION 1 (extra dimension): 0.47 seconds
shx2 SOLUTION 2 (better for big arrays): 4.39 seconds
Jaime's np.digitize: 0.02 seconds
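The np.digitize timing suggests a binning approach. The code below is not taken from the answers named above; it is one possible sketch of how np.digitize might be applied, assuming the sequence is sorted in ascending order and has at least two values. The helper name round_to_nearest_sorted is hypothetical.

```python
import numpy as np

def round_to_nearest_sorted(arr, sequence):
    """Hypothetical helper: nearest-value rounding for an ascending sequence."""
    arr = np.asarray(arr, dtype=float)
    sequence = np.asarray(sequence, dtype=float)
    # Midpoints between neighbouring sequence values serve as bin edges
    midpoints = (sequence[:-1] + sequence[1:]) / 2
    # For each element, the bin index equals the index of the nearest sequence value
    # (an element exactly on a midpoint goes to the higher value)
    idx = np.digitize(arr, midpoints)
    return sequence[idx]

arr = np.array([[0.14, 0.18], [0.20, 0.27]])
sequence = np.array([0, 0.05, 0.2, 0.33, 0.5])
rounded_sorted = round_to_nearest_sorted(arr, sequence)
```

This avoids building a distance array with an extra dimension, since each element only needs a search over the len(sequence) - 1 midpoints.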