Adjacency matrix with numpy outputting some incorrect values

Asked Sep 25 '18 at 23:47

Active Sep 26 '18 at 02:42

Viewed 31 times

I have a 6000 row dataframe that looks like this:

    index name  title       appearance
    0     John  Article 1   1.0
    1     John  Article 3   1.0
    2     Jane  Article 1   1.0
    3     Jane  Article 2   1.0
    4     Sarah Article 2   1.0

I've created an adjacency matrix by building a title-by-name count matrix and multiplying its transpose by itself:

    import numpy as np
    import pandas as pd

    # Title-by-name count matrix, initialised to zero
    covar_df = pd.DataFrame(columns = df.name.unique(), index = df.title.unique())
    covar_df = covar_df.fillna(0)

    # Add each row's appearance to its (title, name) cell
    for index, row in df.iterrows():
        covar_df.loc[row['title'], row['name']] += row['appearance']

    # Name-by-name matrix: transpose of the counts times the counts
    adjacency_df = pd.DataFrame(np.dot(covar_df.T, covar_df), index = df.name.unique(), columns = df.name.unique())
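For reference, here is a minimal sketch of the same computation on the five sample rows above only, not the real data. It swaps the loop for `groupby`/`unstack`, which is an alternative and not the code from the question; the row and column order therefore comes out sorted rather than in order of first appearance.

```python
import pandas as pd

# The five sample rows from the question (not the real 6000-row data).
df = pd.DataFrame({
    "name": ["John", "John", "Jane", "Jane", "Sarah"],
    "title": ["Article 1", "Article 3", "Article 1", "Article 2", "Article 2"],
    "appearance": [1.0, 1.0, 1.0, 1.0, 1.0],
})

# Title-by-name count matrix: sum of appearance per (title, name) pair,
# with 0 where a person does not appear under a title.
covar_df = df.groupby(["title", "name"])["appearance"].sum().unstack(fill_value=0)

# Name-by-name matrix: entry (i, j) is the sum over titles of count_i * count_j.
adjacency_df = covar_df.T.dot(covar_df)

print(covar_df)
print(adjacency_df)
```

On this sample every (title, name) pair occurs once, so each diagonal entry equals the corresponding column sum of `covar_df`.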

Most of the entries in the adjacency matrix are correct, but some are not. For instance, using the real data, if I input:

    [In]: covar_df['John'].sum()
    [Out]: 626

But the diagonal entry of the adjacency matrix, where John intersects with John, is 630.

I'm hesitant to share the dataset itself, so I'm wondering whether there is something about my code in general that could be throwing this off.
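One possible explanation, sketched here under the assumption that the real data contains repeated (name, title) rows (the sample shown does not): the diagonal of `np.dot(covar_df.T, covar_df)` is the sum of squared counts, which equals the column sum only when every count is 0 or 1. The data below is hypothetical and exists only to show the effect.

```python
import numpy as np
import pandas as pd

# Hypothetical data: John is listed twice under "Article 1".
dup_df = pd.DataFrame({
    "name": ["John", "John", "John", "Jane"],
    "title": ["Article 1", "Article 1", "Article 3", "Article 1"],
    "appearance": [1.0, 1.0, 1.0, 1.0],
})

# Title-by-name counts; John's cell for "Article 1" becomes 2.
counts = dup_df.groupby(["title", "name"])["appearance"].sum().unstack(fill_value=0)

adjacency = pd.DataFrame(
    np.dot(counts.T, counts), index=counts.columns, columns=counts.columns
)

column_sum = counts["John"].sum()         # 2 + 1
diagonal = adjacency.loc["John", "John"]  # 2**2 + 1**2

# Number of rows that repeat an earlier (name, title) pair.
n_repeats = dup_df.duplicated(subset=["name", "title"]).sum()

print(column_sum, diagonal, n_repeats)
```

Each cell with count c adds c² − c to the gap between the diagonal and the column sum, so a gap of 4 (630 versus 626) would be consistent with, for example, two pairs that each appear twice. Running `df.duplicated(subset=['name', 'title']).sum()` on the real data would show whether this applies.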

edited Sep 26 '18 at 02:42

snapcrack

Do this: `df.pivot_table(columns='name', index='title', values='appearance', fill_value=0)` – cs95 Sep 25 '18 at 23:56
It's still giving me incorrect values – snapcrack Sep 26 '18 at 00:20
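A note on the suggestion above, as a sketch on hypothetical data: `pivot_table` aggregates with the mean by default, so a repeated (name, title) pair collapses to 1.0 rather than adding up as it does in the loop. Passing `aggfunc='sum'` reproduces the loop's counts. Whether repeats exist in the real data is an assumption.

```python
import pandas as pd

# Hypothetical data: John is listed twice under "Article 1".
dup_df = pd.DataFrame({
    "name": ["John", "John", "John", "Jane"],
    "title": ["Article 1", "Article 1", "Article 3", "Article 1"],
    "appearance": [1.0, 1.0, 1.0, 1.0],
})

# Default aggregation is the mean: the repeated pair yields 1.
mean_table = dup_df.pivot_table(
    columns="name", index="title", values="appearance", fill_value=0
)

# Summing instead matches the += accumulation in the loop: the pair yields 2.
sum_table = dup_df.pivot_table(
    columns="name", index="title", values="appearance",
    aggfunc="sum", fill_value=0,
)

print(mean_table)
print(sum_table)
```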

0 Answers