How to change a simple network dataframe to a correlation table?

Question

I have a dataframe which is in this format

  from to weight
0    A  D 3
1    B  A 5
2    C  E 6
3    A  C 2

I wish to convert this to a correlation-type dataframe which would look like this -

  A B C D E
A 0 0 2 0 3
B 5 0 0 0 0
C 0 0 0 0 6
D 0 0 0 0 0
E 0 0 0 0 0

I thought a possible (read naïve) solution would be to loop over the dataframe and then assign the values to the correct cells of another dataframe by comparing the rows and columns.

Something similar to this:

new_df = pd.DataFrame(columns = sorted(set(df["from"])), index =sorted(set(df["from"])))

for i in range(len(df)):
    cor.loc[df.iloc[i,0], df.iloc[i,1]] = df.iloc[i,2]

And that worked. However, I've read about not looping over Pandas dataframes here.

The primary issue is that my dataframe is larger than this - a couple thousand rows. So I wish to know if there's another solution to this since this method doesn't sit well with me in terms of being Pythonic. Possibly faster as well, since speed is a concern.

score 1 · Accepted Answer · answered Jul 22 '19 at 17:05

1

IIUC, this is a pivot with reindex:

(df.pivot(index='from', columns='to', values='weight')
 .reindex(all_vals)
 .reindex(all_vals,axis=1)
 .fillna(0)
)

Output:

to      A    B    C    D    E
from                         
A     0.0  0.0  2.0  3.0  0.0
B     5.0  0.0  0.0  0.0  0.0
C     0.0  0.0  0.0  0.0  6.0
D     0.0  0.0  0.0  0.0  0.0
E     0.0  0.0  0.0  0.0  0.0

answered Jul 22 '19 at 17:05

Quang Hoang

146,074
10
56
74

Thank you for your answer. What is `all_vals` here? – Abhishek Jul 22 '19 at 17:08
1

`df.pivot('from','to','weight').fillna(0)` should do the trick too to keep only the final values. – realr Jul 22 '19 at 17:10
1

@Abhishek `all_vals = ['A','B','C','D','E']` so as you can have all the columns/rows. – Quang Hoang Jul 22 '19 at 17:14
2

@filippo `df[['from', 'to']].stack().unique()` or just use `np.unique(df[['from', 'to']])` for performance – user3483203 Jul 22 '19 at 17:17

How to change a simple network dataframe to a correlation table?

1 Answers1