I am trying to sort a large number of arrays in Python: over 11 million of them at once.
It would also be nice to get the indices that would sort each array directly.
That is why I'm currently using numpy.argsort(), but it's too slow on my machine (it takes over an hour to run).
The same operation in R takes about 15 minutes on the same machine.
Can anyone tell me a faster way to do this in Python?
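For reference, numpy.argsort() returns the indices that would sort an array. If all the arrays had the same length (an assumption; the question does not say so), they could be stacked into one 2-D array and sorted in a single vectorized call instead of one call per array. A minimal sketch with made-up data:

```python
import numpy as np

# Made-up stand-in for the real data: assumes every array has the same
# length, so they can all be stacked into one 2-D array (one array per row).
arrays = np.array([[5, 7, 1],
                   [3, 3, 9],
                   [8, 2, 4]])

# One call sorts every row; row i of `order` holds the indices that
# would sort arrays[i] in ascending order (stable, so ties keep their order).
order = np.argsort(arrays, axis=1, kind="stable")

# Use those indices to gather the sorted values row by row.
sorted_rows = np.take_along_axis(arrays, order, axis=1)
```

Whether this is faster in practice depends on how the data is laid out; it has not been timed here.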
Thanks
EDIT:
Adding an example
If I have the following dataframe:
agg:

    x  y  w  z
    1  2  2  5
    1  2  6  7
    3  4  3  3
    5  4  7  8
    3  4  2  5
    5  9  9  9
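To make the example reproducible, the same frame can be built directly. This is a sketch that assumes pandas is installed and that all four columns are integers:

```python
import pandas as pd

# The example frame above, built column by column.
agg = pd.DataFrame({
    "x": [1, 1, 3, 5, 3, 5],
    "y": [2, 2, 4, 4, 4, 9],
    "w": [2, 6, 3, 7, 2, 9],
    "z": [5, 7, 3, 8, 5, 9],
})
```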
I am running the following function and command on it:
    import numpy as np

    def function(group):
        z = group['z'].values
        w = group['w'].values
        # order rows by z descending and keep w for the top 7 (a group can have many rows)
        func = w[np.argsort(z)[::-1]][:7]
        return ','.join(map(str, func))  # e.g. "6,2"

    output = agg.groupby(['x', 'y'])[['w', 'z']].apply(function).reset_index(name='w')
So my output dataframe will look like this:
output:

    x  y  w
    1  2  6,2
    3  4  2,3
    5  4  7
    5  9  9
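One possible alternative (not the original approach, and not benchmarked here) is to sort the whole frame once with DataFrame.sort_values and then keep at most 7 rows per group with groupby(...).head(7), so np.argsort is not called once per group. The final string join still runs a Python function per group, and rows with tied z values may come out in a different order than with argsort()[::-1]. A minimal sketch on the example data:

```python
import pandas as pd

agg = pd.DataFrame({
    "x": [1, 1, 3, 5, 3, 5],
    "y": [2, 2, 4, 4, 4, 9],
    "w": [2, 6, 3, 7, 2, 9],
    "z": [5, 7, 3, 8, 5, 9],
})

# Sort everything once: group keys ascending, z descending within each group.
ranked = agg.sort_values(["x", "y", "z"], ascending=[True, True, False])

# Keep at most 7 rows per (x, y) group; head() preserves the sorted row order.
top7 = ranked.groupby(["x", "y"]).head(7)

# Join the remaining w values of each group into one comma-separated string.
output = (top7.groupby(["x", "y"])["w"]
              .agg(lambda s: ",".join(map(str, s)))
              .reset_index())
```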