I have an array of values, say v (e.g. v=[1,2,3,4,5,6,7,8,9,10]), and an array of indexes, say g (e.g. g=[0,0,0,0,1,1,1,1,2,2]).
I know, for instance, how to take the first element of each group, in a very numpythonic way, doing:
import numpy as np
v=np.array([1,2,3,4,74,73,72,71,9,10])
g=np.array([0,0,0,0,1,1,1,1,2,2])
mask=np.concatenate(([True],np.diff(g)!=0))
v[mask]
returns:
array([1, 74, 9])
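To make the masking trick above easier to follow, here is a step-by-step sketch of the same idea. It assumes, as in the example, that g is sorted so that each group is one contiguous run; the names changes, starts and first are only illustrative.

```python
import numpy as np

v = np.array([1, 2, 3, 4, 74, 73, 72, 71, 9, 10])
g = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])

# True wherever the group label differs from the previous element
changes = np.diff(g) != 0

# The very first element always starts a group, so prepend True
mask = np.concatenate(([True], changes))

# Positions where each group begins (illustrative helper, not in the original)
starts = np.flatnonzero(mask)

# First element of each group
first = v[mask]
```

With these inputs the groups begin at positions 0, 4 and 8, which is why v[mask] picks 1, 74 and 9.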
Is there any numpythonic way (avoiding explicit loops) to get the maximum of each subset?
Tests:
Since I received two good answers, one using Python's map and one using a numpy routine, and I was looking for the best-performing one, here are some timing tests:
import numpy as np
import time
N=10000000
v=np.arange(N)
Nelemes_per_group=10
Ngroups=N//Nelemes_per_group  # integer division, so the group labels stay integers in Python 3
s=np.arange(Ngroups)
g=np.repeat(s,Nelemes_per_group)
start1=time.time()
r=np.maximum.reduceat(v, np.unique(g, return_index=True)[1])
end1=time.time()
print('END first method, T=',(end1-start1),'s')
start3=time.time()
np.array(list(map(np.max,np.split(v,np.where(np.diff(g)!=0)[0]+1))))
end3=time.time()
print('END second method, (map returns an iterable) T=',(end3-start3),'s')
As a result I get:
END first method, T= 1.6057236194610596 s
END second method, (map returns an iterable) T= 8.346540689468384 s
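For reference, here is a small sketch of the two timed approaches applied to the toy arrays from the beginning of the question, so the results can be checked by eye. It assumes g is sorted with contiguous groups, which both approaches rely on here; the names starts, max_reduceat, pieces and max_map are only illustrative.

```python
import numpy as np

v = np.array([1, 2, 3, 4, 74, 73, 72, 71, 9, 10])
g = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])

# Method 1: np.unique gives the index of the first occurrence of each label;
# np.maximum.reduceat then takes the maximum over each slice starts[i]:starts[i+1]
starts = np.unique(g, return_index=True)[1]
max_reduceat = np.maximum.reduceat(v, starts)

# Method 2: split v at the positions where the label changes,
# then take the maximum of each piece with map
pieces = np.split(v, np.where(np.diff(g) != 0)[0] + 1)
max_map = np.array(list(map(np.max, pieces)))
```

Both give the per-group maxima 4, 74 and 10 for this input.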
Interestingly, most of the slowdown of the map method is due to the list() call. If I do not try to reconvert my map result to a list (but I have to, because Python 3.x returns an iterator: https://docs.python.org/3/library/functions.html#map )
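One possible reading of the list() observation, offered as a sketch and not as a measurement: in Python 3, map is lazy, so the mapped function only runs when something consumes the iterator. In the timing test that would mean the np.max calls themselves execute inside list(), and their cost shows up there. The function record and the list calls below are hypothetical names used only to make this visible.

```python
calls = []

def record(x):
    # Remember every argument the function is actually called with
    calls.append(x)
    return x * 2

lazy = map(record, [1, 2, 3])   # builds the iterator, calls nothing yet
calls_before = len(calls)

result = list(lazy)             # consuming the iterator triggers the calls
calls_after = len(calls)
```

Here calls_before is 0 and calls_after is 3, so skipping list() would skip the actual work, not only the conversion.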