You could try something like this using df.to_records(), which was the fastest of the approaches I timed, as the measurements below show:
First Option
import pandas as pd
data = {'id':['i1','i2','i3','i4','i5'], 'c1':[3,2,4,1,4], 'c2':[4,2,5,5,5], 'c3':[4,5,3,3,3], 'c4':[5,1,2,2,2]}
df = pd.DataFrame(data)
print(df)
# to_records() puts the index at position 0, so the id is row[1] and the ratings start at row[2]
highest_rated_companies = {
    row[1]: [df.columns[idx] for idx, val in enumerate(list(row)[2:], 1) if val >= 4]
    for row in df.to_records()
}
print(highest_rated_companies)
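To see why the First Option reads the id from position 1 and the ratings from position 2 onward, it can help to look at what to_records() actually yields. This is a minimal sketch on a smaller frame; the names small_df, records, field_names and first_row are only illustrative and are not part of the answer's code.

```python
import pandas as pd

# Hypothetical two-row frame, used only to inspect the record layout
small_df = pd.DataFrame({'id': ['i1', 'i2'], 'c1': [3, 4], 'c2': [4, 2]})

records = small_df.to_records()

# The DataFrame index is exported as the first field, ahead of the columns
field_names = records.dtype.names
first_row = list(records[0])

print(field_names)
print(first_row)
```

Because the index occupies position 0, every column is shifted by one, which is why enumerate starts at 1 when mapping values back to df.columns.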
Second Option
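import pandas as pd
data = {'id':['i1','i2','i3','i4','i5'], 'c1':[3,2,4,1,4], 'c2':[4,2,5,5,5], 'c3':[4,5,3,3,3], 'c4':[5,1,2,2,2]}
df = pd.DataFrame(data)
print(df)
# iterrows() yields (index, Series) pairs; .iloc keeps the access positional,
# since row[0] on a label-indexed Series is deprecated in recent pandas versions
highest_rated_companies = {
    row.iloc[0]: [df.columns[idx] for idx, val in enumerate(row.iloc[1:], 1) if val >= 4]
    for i, row in df.iterrows()
}
print(highest_rated_companies)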
Both options produce the same output:
df:
   id  c1  c2  c3  c4
0  i1   3   4   4   5
1  i2   2   2   5   1
2  i3   4   5   3   2
3  i4   1   5   3   2
4  i5   4   5   3   2
highest_rated_companies:
{'i1': ['c2', 'c3', 'c4'], 'i2': ['c3'], 'i3': ['c1', 'c2'], 'i4': ['c2'], 'i5': ['c1', 'c2']}
Timings:
First Option:
0.0113047 seconds (best case) when the script is executed 100 times
1.2424291999999468 seconds (best case) when the script is executed 10000 times
Second Option:
0.07292359999996734 seconds (best case) when the script is executed 100 times
7.821904700000005 seconds (best case) when the script is executed 10000 times
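The answer does not show how these timings were taken. One way to get comparable best-case numbers is timeit.repeat, taking the minimum over several runs. The sketch below assumes that only the dictionary comprehension is timed; the function names first_option and second_option and the repeat count of 5 are illustrative choices, so the absolute numbers will differ from the ones reported above.

```python
import timeit
import pandas as pd

data = {'id': ['i1', 'i2', 'i3', 'i4', 'i5'], 'c1': [3, 2, 4, 1, 4],
        'c2': [4, 2, 5, 5, 5], 'c3': [4, 5, 3, 3, 3], 'c4': [5, 1, 2, 2, 2]}
df = pd.DataFrame(data)

def first_option():
    # to_records(): index at position 0, id at 1, ratings from 2 onward
    return {row[1]: [df.columns[idx] for idx, val in enumerate(list(row)[2:], 1) if val >= 4]
            for row in df.to_records()}

def second_option():
    # iterrows(): each row is a Series, accessed positionally with .iloc
    return {row.iloc[0]: [df.columns[idx] for idx, val in enumerate(row.iloc[1:], 1) if val >= 4]
            for _, row in df.iterrows()}

for func in (first_option, second_option):
    # Each measurement runs the function 100 times; keep the best of 5 measurements
    best = min(timeit.repeat(func, number=100, repeat=5))
    print(f"{func.__name__}: {best:.4f} seconds best case for 100 executions")
```

Taking the minimum rather than the mean reduces the influence of other processes running on the machine during the measurement.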
Edit:
Using df.to_records() still seems to be the fastest way. I also tested Ehsan's answer and got 50.64001639999992 seconds when the script is executed 10000 times, and 0.5399872999998934 seconds when it is executed 100 times. By these numbers it is slower than both options above, so the First Option remains the fastest.