
I have a CSV file that represents the adjacency matrix of a graph. However, both the first row and the first column of the file contain the node labels. How can I read this file into a networkx graph object? Is there a neat, Pythonic way to do it without hacking around?

My attempt so far:

x = np.loadtxt('file.mtx', delimiter='\t', dtype=str)  # np.str was removed in NumPy 1.24
row_headers = x[0, 1:]  # skip the corner cell ("Attribute")
col_headers = x[1:, 0]
A = x[1:, 1:]
A = np.array(A, dtype='int')

But of course this doesn't solve the problem since I need the labels for the nodes in the graph creation.

Example of the data:

Attribute,A,B,C
A,0,1,1
B,1,0,0
C,1,0,0

Note that the actual delimiter is a tab, not a comma.

Jack Twain
  • So these labels are duplicated in the first row and column so are redundant? You could just use pandas which will use the labels as column names and then build the graph – EdChum Jul 15 '14 at 10:46
  • Can you post some data also – EdChum Jul 15 '14 at 10:47
  • does this help? https://stackoverflow.com/questions/15009615/extract-column-from-csv-file-to-use-as-nodelist-in-networkx – Back2Basics Jul 15 '14 at 10:58

2 Answers


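You could read the data into a structured array. The labels can then be obtained from x.dtype.names, and the networkx graph can be generated with nx.from_numpy_matrix (in current NetworkX the equivalent function is nx.from_numpy_array, since from_numpy_matrix was removed in version 3.0):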

import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

# read the first line to determine the number of columns
with open('file.mtx', 'r') as f:  # text mode, so the line can be split on a str tab
    ncols = len(next(f).split('\t'))

x = np.genfromtxt('file.mtx', delimiter='\t', dtype=None, names=True,
                  usecols=range(1,ncols) # skip the first column
                  )
labels = x.dtype.names

# y is a view of x, so it will not require much additional memory
y = x.view(dtype=('int', len(x.dtype)))

# from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array is its replacement
G = nx.from_numpy_array(y)
G = nx.relabel_nodes(G, dict(zip(range(ncols-1), labels)))

print(G.edges(data=True))
# [('A', 'C', {'weight': 1}), ('A', 'B', {'weight': 1})]

nx.from_numpy_array (like the older nx.from_numpy_matrix) has a create_using parameter that specifies the type of networkx graph to create. For example,

G = nx.from_numpy_array(y, create_using=nx.DiGraph())

makes G a DiGraph.
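
To illustrate what create_using changes, here is a minimal sketch on a deliberately asymmetric matrix. The matrix and labels are made-up sample data (not the asker's file), and it assumes a NetworkX version that provides nx.from_numpy_array:

```python
import numpy as np
import networkx as nx

# Hypothetical asymmetric adjacency matrix: only row A has nonzero entries
A = np.array([[0, 1, 1],
              [0, 0, 0],
              [0, 0, 0]])
labels = ["A", "B", "C"]
mapping = dict(enumerate(labels))  # 0 -> 'A', 1 -> 'B', 2 -> 'C'

# Undirected graph: an entry at (row, col) connects the two nodes both ways
U = nx.relabel_nodes(nx.from_numpy_array(A), mapping)

# Directed graph: an entry at (row, col) is an edge from row to col only
D = nx.relabel_nodes(nx.from_numpy_array(A, create_using=nx.DiGraph), mapping)

print(U.has_edge("B", "A"), D.has_edge("A", "B"), D.has_edge("B", "A"))
```

For a symmetric matrix like the one in the question the two graphs carry the same connections; the difference only shows up when the matrix is not symmetric.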

unutbu

This works, though I'm not sure it is the best way:

In [23]:

import pandas as pd
import io
import networkx as nx
temp = """Attribute,A,B,C
A,0,1,1
B,1,0,0
C,1,0,0"""
# for your file, pass its path to read_csv instead and use sep='\t'
df = pd.read_csv(io.StringIO(temp))
df
Out[23]:
  Attribute  A  B  C
0         A  0  1  1
1         B  1  0  0
2         C  1  0  0

In [39]:

G = nx.DiGraph()
for col in df:
    for x in list(df.loc[df[col] == 1,'Attribute']):
        G.add_edge(col,x)

G.edges()
Out[39]:
[('C', 'A'), ('B', 'A'), ('A', 'C'), ('A', 'B')]

In [40]:

nx.draw(G)

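
The same pandas idea can probably be shortened. This is a minimal sketch, assuming a NetworkX release that provides nx.from_pandas_adjacency and that the row labels and column labels are the same set of names, as in the question's sample. The inline tab-separated string is only a stand-in for the real file:

```python
import io

import networkx as nx
import pandas as pd

# Tab-separated sample in the question's layout (stand-in for 'file.mtx')
sample = "Attribute\tA\tB\tC\nA\t0\t1\t1\nB\t1\t0\t0\nC\t1\t0\t0\n"

# index_col=0 uses the first column as row labels, leaving a labelled square matrix
df = pd.read_csv(io.StringIO(sample), sep="\t", index_col=0)

# Nonzero entries become edges; node names come from the DataFrame labels.
# Pass create_using=nx.DiGraph if the matrix should be read as directed.
G = nx.from_pandas_adjacency(df)

print(sorted(G.nodes()))
print(G.has_edge("A", "B"), G.has_edge("B", "C"))
```

With a real file, the io.StringIO(sample) argument would be replaced by the file path.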

EdChum