Transform a dataframe for network graphing

Question

I have a dataframe like so:

ID  | Node 1 | Node 2 | Node 3
a   |   1    |    0   |   1
b   |   0    |    1   |   1
c   |   1    |    0   |   0
d   |   1    |    1   |   1
e   |   0    |    1   |   1

I want to reshape it so that I can turn it into a network chart, where the weight of the connection between two nodes is the number of IDs that are flagged for both of them:

Node A | Node B | Weight |
Node 1 | Node 2 |    1   |
Node 1 | Node 3 |    2   |
Node 2 | Node 3 |    3   |

Is this only for three columns? Or are you intending to use it for more? — Alex, Mar 03 '18 at 01:06
For more - ideally it would work with a dynamic number of columns (up to 60) — NBC, Mar 03 '18 at 01:08

unutbu · Accepted Answer · 2018-03-03T02:49:59.610

Building on Tai's solution, you could obtain the desired DataFrame using

import numpy as np
import pandas as pd

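def get_weights(df):
    # Keep only the node indicator columns (this drops 'ID')
    df2 = df.filter(regex='Node')
    nodes = df2.columns
    arr = df2.values
    # m[i, j] counts the rows where both node i and node j are 1
    m = np.dot(arr.T, arr).astype(float)
    # Mask the diagonal and lower triangle so each pair appears only once
    idx = np.tril_indices(m.shape[0])
    m[idx] = np.nan
    result = pd.DataFrame(m, columns=nodes, index=nodes)
    # Reshape to long form; dropna() removes the masked entries explicitly
    result = result.stack().dropna()
    result = result.astype(int)
    result = result.reset_index()
    result.columns = ['Node A', 'Node B', 'Weight']
    return result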

df = pd.DataFrame({'ID': ['a', 'b', 'c', 'd', 'e'],
 'Node 1': [1, 0, 1, 1, 0],
 'Node 2': [0, 1, 0, 1, 1],
 'Node 3': [1, 1, 0, 1, 1]})
result = get_weights(df)
print(result)

which yields

   Node A  Node B  Weight
0  Node 1  Node 2       1
1  Node 1  Node 3       2
2  Node 2  Node 3       3
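To get from an edge list like this to an actual graph, one option is NetworkX's from_pandas_edgelist. The following is a minimal sketch, assuming networkx is installed; the edges frame is typed in by hand (an illustrative name, matching the printed output above) so that the snippet runs on its own.

```python
import networkx as nx
import pandas as pd

# Edge list copied from the printed output; `edges` is an illustrative name.
edges = pd.DataFrame({'Node A': ['Node 1', 'Node 1', 'Node 2'],
                      'Node B': ['Node 2', 'Node 3', 'Node 3'],
                      'Weight': [1, 2, 3]})

# Pairs that never co-occur would still become edges, so drop them first.
edges = edges[edges['Weight'] > 0]

# Build an undirected graph and keep the weight as an edge attribute.
G = nx.from_pandas_edgelist(edges, source='Node A', target='Node B',
                            edge_attr='Weight')
print(G.number_of_nodes(), G.number_of_edges())
```

With up to 60 node columns, filtering out zero-weight pairs before building the graph will probably matter, since otherwise every pair of nodes ends up connected.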

Tai · Answer 2 · 2018-03-03T02:19:09.737

Instead of using an edge-list form

Node A | Node B | Weight |
Node 1 | Node 2 |    1   |
Node 1 | Node 3 |    2   |
Node 2 | Node 3 |    3   |

you can also compute a co-occurrence (adjacency) matrix to represent the relationship you are interested in. It can be constructed using a dot product. alko already gave an answer using pandas in Constructing a co-occurrence matrix in python pandas.

I modified alko's answer to use numpy:

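# Use only the node columns; the string 'ID' column would break the dot product
arr = df.filter(regex='Node').values
m = arr.T.dot(arr)
np.fill_diagonal(m, 0)

# array([[0, 1, 2],
#        [1, 0, 3],
#        [2, 3, 0]])
# You can use nx.from_numpy_array (nx.from_numpy_matrix before NetworkX 3.0) to construct a graph.
# m[i, j] is the number of co-occurrences of node i and node j.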
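As a sanity check on the dot-product idea, the sketch below compares each entry of the product against a direct count of the rows in which both columns are 1. It assumes only the three node columns from the question; arr and direct are illustrative names.

```python
import numpy as np

# Node columns from the question, rows a..e; `arr` is an illustrative name.
arr = np.array([[1, 0, 1],
                [0, 1, 1],
                [1, 0, 0],
                [1, 1, 1],
                [0, 1, 1]])

# Entry (i, j) sums arr[k, i] * arr[k, j] over rows k; each term is 1 only
# when row k has a 1 in both column i and column j.
m = arr.T @ arr

# The same counts, computed pair by pair without the dot product.
n = arr.shape[1]
direct = np.array([[int(np.sum((arr[:, i] == 1) & (arr[:, j] == 1)))
                    for j in range(n)] for i in range(n)])
same = np.array_equal(m, direct)
print(same)

# m is a freshly allocated array, so zeroing its diagonal leaves arr untouched.
np.fill_diagonal(m, 0)
print(m)
```

Before it is zeroed, the diagonal holds the number of IDs flagged for each node on its own, which is why it is cleared before the matrix is used as an adjacency matrix.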

One thing I don't like about alko's answer is that it changes the diagonal of a dataframe, say df, by modifying df.values. Modifying df.values in order to change df should not be encouraged, because df.values sometimes returns a copy and sometimes a view. See my previous question, Will changes in DataFrame.values always modify the values in the data frame?, for more information.

If one wants to follow alko's pandas method (where df is the square co-occurrence dataframe), one can replace np.fill_diagonal(df.values, 0) with

df = df - np.eye(len(df)) * np.diagonal(df)
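A small sketch of that replacement on a hand-typed co-occurrence dataframe (cooc and zeroed are illustrative names, and the counts are the ones for the question's data). It builds a new dataframe instead of writing into .values, so the original stays as it was.

```python
import numpy as np
import pandas as pd

cols = ['Node 1', 'Node 2', 'Node 3']
# Co-occurrence counts for the question's data, diagonal still included.
cooc = pd.DataFrame([[3, 1, 2],
                     [1, 3, 3],
                     [2, 3, 4]], index=cols, columns=cols)

# np.eye(n) * diagonal is a matrix holding only the diagonal entries;
# subtracting it zeroes the diagonal without mutating cooc.
zeroed = cooc - np.eye(len(cooc)) * np.diagonal(cooc)

print(zeroed.astype(int))
print(cooc.loc['Node 3', 'Node 3'])  # the original diagonal is unchanged
```

Note that the subtraction yields floats, hence the astype(int) when printing.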

score 1 · Answer 3 · answered Mar 05 '18 at 16:30

Dataframe to an adjacency matrix

You can iterate through your dataframe to create a numpy array:

import pandas as pd
import numpy as np
from itertools import combinations
import networkx as nx

df = pd.DataFrame({'node_1': [1,0,1,1,0], 
                   'node_2':[0,1,0,1,1], 
                   'node_3':[1,1,0,1,1]})

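# Matrix dimension: one row and one column per node
l = len(df.columns)
# Zero-filled matrix that accumulates the co-occurrence counts
mat = np.zeros((l, l))

for _, row in df.iterrows():
    # Column positions of the nodes flagged in this row
    positions = np.where(row)[0]
    if len(positions) > 1:
        # Count every pair of flagged nodes once, in both directions
        for i, j in combinations(positions, 2):
            mat[i, j] += 1
            mat[j, i] += 1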

mat

array([[ 0.,  1.,  2.],
       [ 1.,  0.,  3.],
       [ 2.,  3.,  0.]])

NetworkX graph from a numpy adjacency matrix

G = nx.Graph(mat)
G.edges(data=True)

[out]: EdgeDataView([(0, 1, {'weight': 1.0}), (0, 2, {'weight': 2.0}), (1, 2, {'weight': 3.0})])
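The graph built this way labels its nodes 0, 1, 2 rather than using the column names. One way to restore the names is nx.relabel_nodes, sketched here under the assumption that the matrix rows follow the column order of the dataframe; names is an illustrative list, typed in by hand so the snippet runs on its own.

```python
import networkx as nx
import numpy as np

# Adjacency matrix from the output above.
mat = np.array([[0., 1., 2.],
                [1., 0., 3.],
                [2., 3., 0.]])
# Assumed to match the order of df.columns (illustrative, hand-typed).
names = ['node_1', 'node_2', 'node_3']

# Map integer positions to column names; relabel_nodes returns a new graph.
G = nx.relabel_nodes(nx.Graph(mat), dict(enumerate(names)))
print(list(G.nodes))
print(G['node_2']['node_3']['weight'])
```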

score 0 · Answer 4 · answered Mar 03 '18 at 02:23

You can first use itertools to find all the combinations, then find the weight for each pair.

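import itertools
import pandas as pd

# df is the question's dataframe, including the 'ID' column
(
     pd.DataFrame(list(itertools.combinations(df.set_index('ID').columns, 2)),
                  columns=['Node A', 'Node B'])
     # For each pair, count the rows where both node columns are 1
     .assign(Weight=lambda y: y.apply(lambda x: df[[x['Node A'], x['Node B']]]
                                                .all(axis=1).sum(), axis=1))
)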

Out[39]: 
   Node A  Node B  Weight
0  Node 1  Node 2       1
1  Node 1  Node 3       2
2  Node 2  Node 3       3
