Create adjacency matrix for two columns in pandas dataframe

Question

I have a dataframe of the form:

index  Name_A  Name_B
  0    Adam    Ben
  1    Chris   David
  2    Adam    Chris
  3    Ben     Chris

I'd like to obtain the adjacency matrix for Name_A and Name_B, i.e.:

      Adam Ben Chris David
Adam   0    1    1     0
Ben    0    0    1     0
Chris  0    0    0     1
David  0    0    0     0

What is the most pythonic/scalable way of tackling this?

EDIT: Also, I know that if the row Adam, Ben is in the dataset, then at some other point, Ben, Adam will also be in the dataset.

jezrael · Accepted Answer · 2017-03-15T10:10:46.527

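You can use crosstab, then reindex both axes by the union of the column and index values:

import pandas as pd

# Count how often each (Name_A, Name_B) pair occurs
ct = pd.crosstab(df.Name_A, df.Name_B)
print(ct)
Name_B  Ben  Chris  David
Name_A                   
Adam      1      1      0
Ben       0      1      0
Chris     0      0      1

The crosstab only has rows for names seen in Name_A and columns for names seen in Name_B, so reindex it to make it square:

# Use the same sorted set of names on both axes; missing pairs become 0
idx = ct.columns.union(ct.index)
adj = ct.reindex(index=idx, columns=idx, fill_value=0)
print(adj)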
       Adam  Ben  Chris  David
Adam      0    1      1      0
Ben       0    0      1      0
Chris     0    0      0      1
David     0    0      0      0

edited Mar 15 '17 at 10:10

answered Mar 15 '17 at 10:03

Hi @jezrael, I'm wondering why, in your answer, the third row, second column has a `0` rather than a `1`, i.e. the matrix should be symmetric. How could your working example be adapted to do that? I'm thinking of taking the upper triangle, transposing, and replacing, but that is not very elegant – Sos Aug 21 '19 at 16:07
I believe it's because it uses ordered matching, not unordered. – jxramos Apr 20 '20 at 21:50
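
Following up on the symmetry question in the comments: one possible way to treat the pairs as unordered is to combine the reindexed crosstab with its transpose. This is only a sketch, not part of the accepted answer. The names `directed` and `symmetric` are made up for illustration, and the DataFrame is rebuilt from the sample in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name_A": ["Adam", "Chris", "Adam", "Ben"],
    "Name_B": ["Ben", "David", "Chris", "Chris"],
})

# Ordered (directed) counts, built the same way as in the answer.
directed = pd.crosstab(df.Name_A, df.Name_B)
idx = directed.columns.union(directed.index)
directed = directed.reindex(index=idx, columns=idx, fill_value=0)

# Unordered view: mark an edge if the pair appears in either direction.
# Comparing with 0 avoids double counting when both (A, B) and (B, A) occur.
symmetric = ((directed + directed.T) > 0).astype(int)
print(symmetric)
```

If the data already contains every pair in both directions, as the question's EDIT suggests, the plain crosstab should already come out symmetric and this step only guards against gaps. Use `directed + directed.T` without the comparison if you want counts rather than 0/1 flags.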
