I have a very large dataframe, a sample of which looks like this:
import pandas as pd
df = pd.DataFrame({'From':['a','b','c','a','d'], 'To':['b', 'c', 'a', 'd', 'e'], 'Rates':[1e-4, 2.3e-2, 1e-2, 100, 70]})
In[121]: df
Out[121]:
  From To     Rates
0    a  b    0.0001
1    b  c    0.0230
2    c  a    0.0100
3    a  d  100.0000
4    d  e   70.0000
The end result I would like is an n x n matrix, model_matrix (where n is the number of distinct labels across From and To, which happens to equal len(df) in this sample), that looks like this:
      a       b      c      d     e
a  0.00  0.0001  0.000  100.0   0.0
b  0.00  0.0000  0.023    0.0   0.0
c  0.01  0.0000  0.000    0.0   0.0
d  0.00  0.0000  0.000    0.0  70.0
e  0.00  0.0000  0.000    0.0   0.0
The following code works by converting df to a dictionary, transfer_rates, and filling in model_matrix with values from the dictionary, but it is very inefficient for a large df because it scans the whole of df once for every (From, To) combination.
from_comps = list(df['From'])
to_comps = list(df['To'])
# Map each (from, to) pair that exists in df to its rate.
transfer_rates = {}
for from_comp in from_comps:
    for to_comp in to_comps:
        try:
            transfer_rates[from_comp, to_comp] = df.loc[(df['From'] == from_comp) & (df['To'] == to_comp), 'Rates'].values[0]
        except IndexError:
            # No row for this pair; leave it out of the dictionary.
            pass
all_comps = sorted(set(from_comps + to_comps))
model_matrix = pd.DataFrame(columns=all_comps, index=all_comps)
for rate in transfer_rates:
    # rate is a (from, to) tuple: rows are 'From', columns are 'To'.
    model_matrix.loc[rate[0], rate[1]] = transfer_rates[rate]
model_matrix.fillna(0, inplace=True)
Is there a more efficient way of doing this?
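One direction that might avoid the nested loops is to pivot df and then reindex both axes against the full label set. This is only a minimal sketch on the sample data, and it assumes each (From, To) pair appears at most once, since pivot raises a ValueError on duplicates:

```python
import pandas as pd

df = pd.DataFrame({
    'From': ['a', 'b', 'c', 'a', 'd'],
    'To': ['b', 'c', 'a', 'd', 'e'],
    'Rates': [1e-4, 2.3e-2, 1e-2, 100, 70],
})

# Every label that appears in either column defines both axes of the matrix.
all_comps = sorted(set(df['From']) | set(df['To']))

# Assumption: (From, To) pairs are unique; pivot() raises ValueError otherwise.
model_matrix = (
    df.pivot(index='From', columns='To', values='Rates')
      # Add the rows and columns that pivot() leaves out (e.g. row 'e', column 'a' ordering).
      .reindex(index=all_comps, columns=all_comps)
      # Pairs with no rate become 0.
      .fillna(0.0)
)

print(model_matrix)
```

If the real data can contain repeated pairs, calling df.drop_duplicates(subset=['From', 'To']) before the pivot would keep the first occurrence, which is what the loop above does via .values[0].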