I can find the n largest values in each row of a numpy array (link), but doing so loses the column information, which is what I actually want. Say I have some data:
import pandas as pd
import numpy as np
np.random.seed(42)
data = np.random.rand(5,5)
data = pd.DataFrame(data, columns = list('abcde'))
data
          a         b         c         d         e
0  0.374540  0.950714  0.731994  0.598658  0.156019
1  0.155995  0.058084  0.866176  0.601115  0.708073
2  0.020584  0.969910  0.832443  0.212339  0.181825
3  0.183405  0.304242  0.524756  0.431945  0.291229
4  0.611853  0.139494  0.292145  0.366362  0.456070
I want the names of the columns holding the n largest values in each row. So for n = 2 the output would be:
0 b c
1 c e
2 b c
3 c d
4 a e
I can do it by looping over the rows of the DataFrame, but that would be inefficient. Is there a more Pythonic way?
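For reference, the loop-based version I would like to avoid looks roughly like the sketch below. The names `n`, `top_names` and `result` are only illustrative, and the choice of `Series.nlargest` inside the loop is an assumption about one reasonable way to write it, not the only one.

```python
import numpy as np
import pandas as pd

# Same setup as above
np.random.seed(42)
data = pd.DataFrame(np.random.rand(5, 5), columns=list('abcde'))

n = 2  # how many column names to keep per row (illustrative name)

# Row-by-row baseline: for each row, take the n largest values
# and keep only their column labels (largest value first).
top_names = {}
for idx, row in data.iterrows():
    top_names[idx] = list(row.nlargest(n).index)

# One row per original index, one column per rank (0 = largest)
result = pd.DataFrame.from_dict(top_names, orient='index')
print(result)
```

This gives the table I am after, but it calls `nlargest` once per row, which is why I am looking for something that works on the whole frame at once.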