Given the following array:
import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
[[1 2 3]
[4 5 6]
[7 8 9]]
How can I replace certain values with other values?
bad_vals = [4, 2, 6]
update_vals = [11, 1, 8]
I currently use:
for idx, v in enumerate(bad_vals):
    a[a == v] = update_vals[idx]
Which gives:
[[ 1 1 3]
[11 5 8]
[ 7 8 9]]
This works, but it is rather slow for large arrays with many values to replace. Is there a good alternative?
The input array can be changed to anything (list of lists/tuples) if that is necessary to access some speedy black magic.
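One more option, not taken from either answer in the EDIT below, is a sorted lookup with `np.searchsorted`. This is a minimal sketch, assuming `bad_vals` contains no duplicates; `replace_sorted` is a hypothetical name. It locates every element of `a` with one vectorized binary search, so it does not need a lookup table sized by the largest value and also handles negative or very large integers:

```python
import numpy as np

def replace_sorted(a, bad_vals, update_vals):
    """Hypothetical helper: replace bad_vals[i] with update_vals[i] in one pass.

    Assumes bad_vals has no duplicates. Works for any sortable dtype.
    """
    a = np.asarray(a)
    bad_vals = np.asarray(bad_vals)
    update_vals = np.asarray(update_vals)

    # Sort the keys once (and keep the replacements aligned with them).
    order = np.argsort(bad_vals)
    keys = bad_vals[order]
    vals = update_vals[order]

    # Position at which each element of `a` would be inserted into `keys`.
    idx = np.searchsorted(keys, a)
    # Clip so indexing stays valid for elements larger than every key.
    idx = np.clip(idx, 0, len(keys) - 1)
    # An element is "bad" only if the key at that position matches exactly.
    hit = keys[idx] == a

    out = a.copy()
    out[hit] = vals[idx[hit]]
    return out

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(replace_sorted(a, [4, 2, 6], [11, 1, 8]).tolist())
# -> [[1, 1, 3], [11, 5, 8], [7, 8, 9]]
```

Unlike the loop above, every replacement is decided from the original values, so a value that was just written is never replaced a second time.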
EDIT:
Based on the great answers from @Divakar and @charlysotelo, I did a quick comparison on data resembling my real use case using the benchit package. My input array has a rows:columns ratio of roughly 100:1, and the arrays of replacement values have a length of about 3 × the number of rows.
Functions:
# current approach
# note: modifies `a` in place and applies the replacements one after another
def enumerate_values(a, bad_vals, update_vals):
    for idx, v in enumerate(bad_vals):
        a[a == v] = update_vals[idx]
    return a
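Because the loop applies the replacements sequentially, a value written in one pass can be matched again by a later `bad_vals` entry. The small sketch below shows this with made-up inputs (`one_shot` is a hypothetical reference that decides every replacement from the original values). In the benchmark below, `update_vals` is drawn from the same range as `bad_vals`, so such collisions can occur and `enumerate_values` may not return the same result as the other two functions:

```python
import numpy as np

def enumerate_values(a, bad_vals, update_vals):
    # Same loop as above: each pass sees the results of the previous passes.
    for idx, v in enumerate(bad_vals):
        a[a == v] = update_vals[idx]
    return a

def one_shot(a, bad_vals, update_vals):
    # Hypothetical reference: every lookup uses the ORIGINAL value only.
    mapping = dict(zip(bad_vals, update_vals))
    return np.array([[mapping.get(x, x) for x in row] for row in a.tolist()])

a = np.array([[1, 2, 3]])
bad_vals, update_vals = [1, 2], [2, 5]  # 1 -> 2 and 2 -> 5

# .copy() because enumerate_values modifies its input in place
print(enumerate_values(a.copy(), bad_vals, update_vals).tolist())  # -> [[5, 5, 3]]
print(one_shot(a, bad_vals, update_vals).tolist())                 # -> [[2, 5, 3]]
```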
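# provided solution @Divakar
# builds a lookup table indexed by value (assumes non-negative integers)
def map_values(a, bad_vals, update_vals):
    N = max(a.max(), max(bad_vals)) + 1
    mapar = np.empty(N, dtype=int)
    mapar[a] = a                   # values present in `a` map to themselves
    mapar[bad_vals] = update_vals  # then override the values to be replaced
    out = mapar[a]                 # one fancy-indexing lookup for every element
    return out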
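Since `map_values` indexes the table directly by value, it only works for non-negative integers and allocates `max + 1` entries. A possible variant, sketched here under the assumption of signed integer inputs (`map_values_offset` is a hypothetical name), shifts every value by the minimum and starts from an identity table:

```python
import numpy as np

def map_values_offset(a, bad_vals, update_vals):
    # Hypothetical variant of map_values; assumes signed integer inputs.
    a = np.asarray(a)
    bad_vals = np.asarray(bad_vals)
    update_vals = np.asarray(update_vals)

    # Shift all values so the smallest one maps to table index 0.
    lo = min(a.min(), bad_vals.min())
    hi = max(a.max(), bad_vals.max())
    # Identity table: entry i holds the original value i + lo.
    mapar = np.arange(lo, hi + 1, dtype=np.result_type(a, update_vals))
    # Overwrite only the entries for values that must be replaced.
    mapar[bad_vals - lo] = update_vals
    # One fancy-indexing pass does every lookup at once.
    return mapar[a - lo]

a = np.array([[-3, 0, 2], [5, -3, 7]])
print(map_values_offset(a, [-3, 5], [10, -1]).tolist())
# -> [[10, 0, 2], [-1, 10, 7]]
```

The table still has `hi - lo + 1` entries, so this only pays off when the value range is reasonably compact.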
# provided solution @charlysotelo
def vectorize_values(a, bad_vals, update_vals):
    # dict from each bad value to its replacement
    bad_to_good_map = dict(zip(bad_vals, update_vals))
    # look up every element; values not in the dict are returned unchanged
    f = np.vectorize(lambda x: bad_to_good_map.get(x, x))
    a = f(a)
    return a
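The NumPy documentation notes that `np.vectorize` is provided for convenience, not performance, and is essentially a Python `for` loop. One way to keep the dict idea while calling Python far less often is to look up each *distinct* value only once via `np.unique(..., return_inverse=True)`. A minimal sketch; `unique_map_values` is a hypothetical name, and whether it beats the table lookup depends on how many distinct values the data has:

```python
import numpy as np

def unique_map_values(a, bad_vals, update_vals):
    # Hypothetical helper: same dict lookup as vectorize_values, but run
    # once per distinct value instead of once per element.
    a = np.asarray(a)
    mapping = dict(zip(np.asarray(bad_vals).tolist(),
                       np.asarray(update_vals).tolist()))

    # uniq: sorted distinct values; inv: position in uniq of every element
    # of the flattened array (flattening keeps inv 1-D on NumPy 1.x and 2.x).
    uniq, inv = np.unique(a.ravel(), return_inverse=True)
    # Translate each distinct value; unmapped values stay unchanged.
    new_uniq = np.array([mapping.get(u, u) for u in uniq.tolist()])
    # Scatter the translated values back and restore the original shape.
    return new_uniq[inv].reshape(a.shape)

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(unique_map_values(a, [4, 2, 6], [11, 1, 8]).tolist())
# -> [[1, 1, 3], [11, 5, 8], [7, 8, 9]]
```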
# define benchit input functions
import numpy as np
import benchit
funcs = [enumerate_values, map_values, vectorize_values]
# define benchit input variables to bench against
in_ = {
    n: (
        np.random.randint(0, n * 10, (n, int(n * 0.01))),  # array
        np.random.choice(n * 10, n * 3, replace=False),     # bad_vals
        np.random.choice(n * 10, n * 3),                    # update_vals
    )
    for n in [300, 1000, 3000, 10000, 30000]
}
# do the bench
# btw: timing the slow approach (my own function here) takes a while
t = benchit.timings(funcs, in_, multivar=True, input_name='Len')
t.plot(logx=True, grid=False)
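Before comparing timings it can help to confirm that the fast function returns the intended result on the same kind of input. The sketch below copies `map_values` from above and checks it against a slow dict-based reference (a hypothetical helper named `reference`). It swaps in NumPy's `np.random.default_rng` generator instead of the legacy `np.random` functions, with an arbitrary seed, and uses a single size `n = 1000`:

```python
import numpy as np

def map_values(a, bad_vals, update_vals):
    # Copied from @Divakar's answer above (non-negative integers only).
    N = max(a.max(), max(bad_vals)) + 1
    mapar = np.empty(N, dtype=int)
    mapar[a] = a
    mapar[bad_vals] = update_vals
    return mapar[a]

def reference(a, bad_vals, update_vals):
    # Hypothetical slow reference: one dict lookup per ORIGINAL element.
    mapping = dict(zip(bad_vals.tolist(), update_vals.tolist()))
    return np.array([[mapping.get(x, x) for x in row] for row in a.tolist()])

rng = np.random.default_rng(0)  # seed chosen arbitrarily
n = 1000
a = rng.integers(0, n * 10, (n, int(n * 0.01)))      # array, 100:1 rows:columns
bad_vals = rng.choice(n * 10, n * 3, replace=False)  # distinct bad values
update_vals = rng.choice(n * 10, n * 3)              # replacements

assert np.array_equal(map_values(a, bad_vals, update_vals),
                      reference(a, bad_vals, update_vals))
print("map_values agrees with the reference")
```

`enumerate_values` is left out of this check on purpose: it modifies `a` in place and applies the replacements sequentially, so its output is not expected to match a one-shot mapping whenever an update value is also a bad value.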