I have a dataframe of the form:

index  Name_A  Name_B
  0    Adam    Ben
  1    Chris   David
  2    Adam    Chris
  3    Ben     Chris

I'd like to obtain the adjacency matrix for Name_A and Name_B, i.e.:

      Adam Ben Chris David
Adam   0    1    1     0
Ben    0    0    1     0
Chris  0    0    0     1
David  0    0    0     0

What is the most Pythonic/scalable way of tackling this?

EDIT: Also, I know that if the row (Adam, Ben) is in the dataset, then the row (Ben, Adam) will also appear somewhere else in the dataset.

The Ref

1 Answer

You can use crosstab and then reindex by the union of the column and index values:

# Store the crosstab under its own name so the original df is not overwritten
ct = pd.crosstab(df.Name_A, df.Name_B)
print(ct)
Name_B  Ben  Chris  David
Name_A                   
Adam      1      1      0
Ben       0      1      0
Chris     0      0      1

df = pd.crosstab(df.Name_A, df.Name_B)
idx = df.columns.union(df.index)
df = df.reindex(index = idx, columns=idx, fill_value=0)
print (df)
       Adam  Ben  Chris  David
Adam      0    1      1      0
Ben       0    0      1      0
Chris     0    0      0      1
David     0    0      0      0
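
For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and keeps the intermediate table under separate names; `ct`, `names`, and `adj` are illustrative names chosen here and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name_A": ["Adam", "Chris", "Adam", "Ben"],
    "Name_B": ["Ben", "David", "Chris", "Chris"],
})

# Count how often each ordered (Name_A, Name_B) pair occurs.
ct = pd.crosstab(df["Name_A"], df["Name_B"])

# Use every name as both a row label and a column label;
# pairs that never occur are filled with 0.
names = ct.index.union(ct.columns)
adj = ct.reindex(index=names, columns=names, fill_value=0)

print(adj)
```

Note that the cells hold pair counts, so a pair that appears twice in the data would show up as 2 rather than 1.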
jezrael
  • Hi @jezrael, I'm wondering why, in your answer, the 3rd row, 2nd column has a `0` rather than a `1`, i.e. the matrix should be symmetric. How could your working example be adapted to do that? I'm thinking about taking the upper triangle, transposing it, and replacing, but this is not very elegant. – Sos Aug 21 '19 at 16:07
  • I believe it's because it uses ordered matching, not unordered. – jxramos Apr 20 '20 at 21:50
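
On the symmetry question raised in the comments: one possible way to treat the pairs as unordered is to combine the matrix with its transpose. This is only a sketch and not part of the original answer; the names `adj` and `sym` are illustrative. If the real data already contains both (Adam, Ben) and (Ben, Adam), as the question's EDIT says, the crosstab result should already be symmetric and this step is unnecessary.

```python
import pandas as pd

# Sample data copied from the question (each pair listed in one direction only).
df = pd.DataFrame({
    "Name_A": ["Adam", "Chris", "Adam", "Ben"],
    "Name_B": ["Ben", "David", "Chris", "Chris"],
})

# Directed adjacency matrix, built as in the answer.
ct = pd.crosstab(df["Name_A"], df["Name_B"])
names = ct.index.union(ct.columns)
adj = ct.reindex(index=names, columns=names, fill_value=0)

# Mark an edge if the pair occurs in either direction,
# then convert the boolean result back to 0/1.
sym = ((adj + adj.T) > 0).astype(int)

print(sym)
```

Comparing against 0 instead of just adding keeps the entries at 0 or 1 even when a pair is present in both directions.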