2

I have a dataframe which is in this format

  from to weight
0    A  D 3
1    B  A 5
2    C  E 6
3    A  C 2

I wish to convert this to a correlation-type dataframe which would look like this -

  A B C D E
A 0 0 2 0 3
B 5 0 0 0 0
C 0 0 0 0 6
D 0 0 0 0 0
E 0 0 0 0 0

I thought a possible (read naïve) solution would be to loop over the dataframe and then assign the values to the correct cells of another dataframe by comparing the rows and columns.

Something similar to this:

new_df = pd.DataFrame(columns = sorted(set(df["from"])), index =sorted(set(df["from"])))

for i in range(len(df)):
    cor.loc[df.iloc[i,0], df.iloc[i,1]] = df.iloc[i,2]

And that worked. However, I've read about not looping over Pandas dataframes here.

The primary issue is that my dataframe is larger than this - a couple thousand rows. So I wish to know if there's another solution to this since this method doesn't sit well with me in terms of being Pythonic. Possibly faster as well, since speed is a concern.

Abhishek
  • 553
  • 2
  • 9
  • 26

1 Answers1

1

IIUC, this is a pivot with reindex:

(df.pivot(index='from', columns='to', values='weight')
 .reindex(all_vals)
 .reindex(all_vals,axis=1)
 .fillna(0)
)

Output:

to      A    B    C    D    E
from                         
A     0.0  0.0  2.0  3.0  0.0
B     5.0  0.0  0.0  0.0  0.0
C     0.0  0.0  0.0  0.0  6.0
D     0.0  0.0  0.0  0.0  0.0
E     0.0  0.0  0.0  0.0  0.0
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74