26

I have been battling with this problem for a while. I know it is very simple, but I have little experience with Python or NetworkX. I am trying to plot a large matrix (about 200 rows/columns) that looks like the one below. The labels in the first row and the first column are identical.

  A,B,C,D,E,F,G,H,I,J,K
A,0,1,1,0,1,1,1,1,0,1,0
B,1,0,0,0,1,1,1,1,0,1,0
C,1,0,0,0,1,1,1,1,0,1,0

It is just a matrix showing how people are connected, and all I want is to import and plot this CSV file, with its corresponding labels, in NetworkX.

I have this file (people.csv), and judging by previous answers here, the best way to do this seems to be to load the data into a NumPy array.

There seems to be a problem with this:

import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from numpy import genfromtxt

mydata = genfromtxt('mouse.csv', delimiter=',')

I get the following output:

File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/lib/npyio.py", line 1272, in genfromtxt
  fhd = iter(np.lib._datasource.open(fname, 'rbU'))
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/lib/_datasource.py", line 145, in open
  return ds.open(path, mode)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/lib/_datasource.py", line 472, in open
  found = self._findfile(path)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/lib/_datasource.py", line 323, in _findfile
  if self.exists(name):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/lib/_datasource.py", line 417, in exists
  from urllib2 import urlopen
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 94, in <module>
  import httplib
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 69, in <module>
  from array import array
File "/Users/Plosslab/Documents/PythonStuff/array.py", line 4, in <module>
NameError: name 'array' is not defined
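
The last frames of the traceback point at a local file named array.py rather than at the standard library's `array` module, which suggests that a script in the working directory is shadowing it. A minimal sketch for spotting such name collisions follows; `find_shadowing_files` and its parameters are hypothetical names chosen for illustration, and the default name list assumes Python 3.10 or later.

```python
import sys
from pathlib import Path


def find_shadowing_files(directory, module_names=None):
    """Return the .py files in `directory` named after standard-library modules."""
    if module_names is None:
        # sys.stdlib_module_names exists on Python 3.10+; older versions
        # fall back to an empty set, so pass module_names explicitly there.
        module_names = getattr(sys, "stdlib_module_names", frozenset())
    return sorted(
        path.name
        for path in Path(directory).glob("*.py")
        if path.stem in module_names
    )

# Example call (the directory is whatever folder holds your scripts):
# find_shadowing_files(".")
```

Renaming any file this reports (and deleting its stale .pyc, if one exists) is the usual remedy.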
Workhorse

3 Answers

30

I made a small csv called mycsv.csv that has the following:

,a,b,c,d
a,0,1,0,1
b,1,0,1,0
c,0,1,0,1
d,1,0,1,0

Your first row starts with a space rather than a ','; let me know if I have misread your file. Either way, the general idea is the same. Read in the CSV like this:

from numpy import genfromtxt
import numpy as np
mydata = genfromtxt('mycsv.csv', delimiter=',')
print(mydata)
print(type(mydata))

This prints:

[[ nan  nan  nan  nan  nan]
 [ nan   0.   1.   0.   1.]
 [ nan   1.   0.   1.   0.]
 [ nan   0.   1.   0.   1.]
 [ nan   1.   0.   1.   0.]]
<type 'numpy.ndarray'>

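The labels in the first row and first column cannot be parsed as numbers, so genfromtxt returns them as nan. Now that the CSV is read in as a NumPy array, we need to extract just the adjacency matrix: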

adjacency = mydata[1:,1:]
print(adjacency)

This prints:

[[ 0.  1.  0.  1.]
 [ 1.  0.  1.  0.]
 [ 0.  1.  0.  1.]
 [ 1.  0.  1.  0.]]

If your file is laid out differently from my small example, slice the NumPy array accordingly.

To plot the graph you will need to import matplotlib and networkx:

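import matplotlib.pyplot as plt
import networkx as nx

def show_graph_with_labels(adjacency_matrix, mylabels):
    # Each (row, col) position holding a 1 is an edge between those two nodes.
    rows, cols = np.where(adjacency_matrix == 1)
    edges = zip(rows.tolist(), cols.tolist())
    gr = nx.Graph()
    gr.add_edges_from(edges)
    # mylabels maps node index -> label text.
    nx.draw(gr, node_size=500, labels=mylabels, with_labels=True)
    plt.show()

# Note: get_labels and make_label_dict are not defined anywhere in this answer.
show_graph_with_labels(adjacency, make_label_dict(get_labels('mycsv.csv')))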
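
The call above relies on two helpers, `get_labels` and `make_label_dict`, that this answer never defines. What follows is only a guess at what they might look like, not the author's original code: it assumes the labels sit in the header row of the CSV and that the label dictionary maps each node index to its label.

```python
import csv


def get_labels(filename):
    """Read the labels from the header row of the CSV (assumed layout)."""
    with open(filename, newline="") as handle:
        header = next(csv.reader(handle))
    cells = [cell.strip() for cell in header]
    # Drop the empty corner cell when the header starts with a comma.
    if cells and cells[0] == "":
        cells = cells[1:]
    return cells


def make_label_dict(labels):
    """Map node index (row/column position) to its label."""
    return dict(enumerate(labels))
```

For mycsv.csv this would give {0: 'a', 1: 'b', 2: 'c', 3: 'd'}, which matches the integer node ids produced by np.where.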

Here's a short tutorial on graphs with python.

[Image: graph from csv]

Scott
  • This is very helpful, but it is imperative that the nodes are labelled, whereas genfromtxt seems to remove that part. – Workhorse Apr 11 '15 at 04:49
  • I think I misunderstood. Are the letters your labels? If you want to use something other than the row / col number as your labels you can add custom labels: https://networkx.github.io/documentation/latest/examples/drawing/labels_and_colors.html – Scott Apr 11 '15 at 05:02
  • I finally ran this code, I get a slew of errors: `File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/lib/npyio.py", line 1272, in genfromtxt fhd = iter(np.lib._datasource.open(fname, 'rbU')) NameError: name 'array' is not defined` – Workhorse Apr 11 '15 at 17:19
  • Saying that you got errors is not a helpful comment. What errors did you get? And can you address my previous question regarding what labels you are expecting? – Scott Apr 11 '15 at 17:24
  • Ok I will post them into the question. – Workhorse Apr 11 '15 at 17:36
  • Yes, the letters are my labels. – Workhorse Apr 11 '15 at 17:39
  • Ok I fixed it, no network errors. I am going to take your recommendation to plot the actual labels (A,B,C,D etc...) into the graph. – Workhorse Apr 11 '15 at 18:08
  • I'll edit so you can add your own labels (let's assume they are the letters A, B, C..., from the csv) – Scott Apr 11 '15 at 18:53
  • @Workhorse Also if you feel this answers your question would you please upvote and accept my answer. – Scott Apr 11 '15 at 19:46
  • It seems that if there are no edges to some node, this node will not appear in the graph using this method. – Martin Becker Feb 04 '16 at 07:55
  • Where do `make_label_dict()` and `get_labels()` come from? Are they evident to everyone except myself? – Apostolos Apr 05 '23 at 12:03
19

This can be done easily by using pandas and networkx.

For example, I have created a small CSV file called test.csv containing:

A,B,C,D,E,F,G,H,I,J,K
A,0,1,1,0,1,1,1,1,0,1,0
B,1,0,0,0,1,1,1,1,0,1,0
C,1,0,0,0,1,1,1,1,0,1,0
D,0,0,0,0,1,0,1,1,0,1,0
E,1,0,0,0,1,1,1,1,0,1,0
F,0,0,1,0,1,0,0,0,0,1,0
G,1,0,0,0,0,0,0,1,0,0,0
H,1,0,0,0,1,1,1,0,0,1,0
I,0,0,0,1,0,0,0,0,0,0,0
J,1,0,0,0,1,1,1,1,0,1,0
K,1,0,0,0,1,0,1,0,0,1,0

You can read this CSV file and create the graph as follows:

import pandas as pd
import networkx as nx
input_data = pd.read_csv('test.csv', index_col=0)
G = nx.DiGraph(input_data.values)

To plot this graph, use:

nx.draw(G)

You should get a plot similar to this:

[Image: output of nx.draw(G)]
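
Building the graph from input_data.values keeps only integer node ids, so the letters from the CSV are lost. One possible way to keep them, sketched here on the assumption that the row and column labels match exactly, is `nx.from_pandas_adjacency`. The function name `graph_from_labeled_csv` and the inline sample data are made up for illustration.

```python
import io

import networkx as nx
import pandas as pd


def graph_from_labeled_csv(source):
    """Build a directed graph whose node names are the CSV row/column labels."""
    frame = pd.read_csv(source, index_col=0)
    # from_pandas_adjacency requires the row labels and column labels to match.
    return nx.from_pandas_adjacency(frame, create_using=nx.DiGraph)


# Hypothetical sample data; a path such as 'test.csv' can be passed instead.
sample = io.StringIO(",a,b,c\na,0,1,0\nb,1,0,1\nc,0,0,0\n")
G = graph_from_labeled_csv(sample)
print(sorted(G.nodes()))
print(sorted(G.edges()))
```

Passing with_labels=True to nx.draw(G) should then show these names on the plot.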

Abinash Panda
  • Is it not necessary to start the 1st row with comma to indicate that the 1st cell is empty? – Sigur Aug 10 '17 at 16:14
2

This is the same as Scott's excellent answer, but it correctly handles nodes without edges.

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np

def show_graph_with_labels(adjacency_matrix, mylabels):
    rows, cols = np.where(adjacency_matrix == 1)
    edges = zip(rows.tolist(), cols.tolist())
    gr = nx.Graph()
    # Add every node up front so that nodes without edges still appear.
    gr.add_nodes_from(range(adjacency_matrix.shape[0]))
    gr.add_edges_from(edges)
    nx.draw(gr, node_size=900, labels=mylabels, with_labels=True)
    plt.show()
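
To see which nodes an edge-only construction would leave out, it can help to list the nodes whose row and column are both all zeros. This is a small pure-Python sketch; `isolated_nodes` is a hypothetical helper, and it assumes a square matrix given as nested lists (a 2-D NumPy array should work as well).

```python
def isolated_nodes(adjacency_matrix):
    """Return indices of nodes with no incoming or outgoing edges."""
    size = len(adjacency_matrix)
    return [
        node
        for node in range(size)
        if not any(
            adjacency_matrix[node][other] or adjacency_matrix[other][node]
            for other in range(size)
        )
    ]


# Node 2 has an all-zero row and column, so it has no edges at all.
example = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(isolated_nodes(example))
```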
dimid