Is there a standard structure for adding edges from a csv/txt file into NetworkX? I've read the docs and tried read_edgelist('path.csv') and add_edges_from('path.csv'), but I get errors saying my data cannot be converted into dictionaries, and also "Edge tuple C must be a 2-tuple or a 3-tuple". I've reformatted a sample of my data several ways to test different structures, including lists of lists and lists of tuples, removing white space, and creating a single list of numbers in each row, but no luck. Below is some of my sample data:

user_id,cluster_moves
11011,"[[86, 110], [110, 110]]"
2139671,"[[89, 125]]"
3945641,"[[36, 73], [73, 110], [110, 110]]"
10024312,"[[123, 27], [27, 97], [97, 97], [97, 97], [97,110]]"
14270422,"[[0, 110], [110, 174]]"
14283758,"[[110, 184]]"
14373703,"[[35, 97], [97, 97], [97, 97], [97, 17], [17,58]]"

The purpose is to create a network graph of trajectories moving between (or within) clusters. Each inner list is a move either within a cluster or between clusters, e.g., [[0, 110], [110, 174]] is a move through clusters 0->110->174. Is there a way to format my data so that networkx can read it?

Quick sample code I was testing data with:

import networkx as nx
import matplotlib.pyplot as plt

g = nx.Graph()
edges = g.add_edges_from('path.csv')

nx.draw(g)
plt.draw
plt.show()

Edit

Is it possible to add edge weights to this data structure when reading in networkx, and then adjust the weight based on the count/frequency of an edge? I would like to do this so I can visualize edges that have a higher frequency/count as another color/line weight. Using the answer below, I have tried using g.add_weighted_edges_from() and using weight=1 as an attribute instead of using g.add_edges_from(), but this did not work properly. I also tried using this with no luck:

for u,v,d in g.edges():
    d['weight'] = 1
g.edges(data=True)
edges = g.edges()
weights = [g[u][v]['weight'] for u,v in edges]
andrewr

1 Answer

First of all, your data is not a valid CSV file. From the definition of comma-separated values:

Fields with embedded commas or double-quote characters must be quoted.

This means you should wrap each list in double quotes:

user_id,cluster_moves
11011,"[[86, 110], [110, 110]]"
2139671,"[[89, 125]]"
3945641,"[[36, 73], [73, 110], [110, 110]]"
10024312,"[[123, 27], [27, 97], [97, 97], [97, 97], [97,110]]"
14270422,"[[0, 110], [110, 174]]"
14283758,"[[110, 184]]"
14373703,"[[35, 97], [97, 97], [97, 97], [97, 17], [17,58]]"

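You can then read the file with the csv module, convert each string to a list with ast.literal_eval() (safer than eval(), since it only parses literals), and build the graph with add_edges_from:

import ast
import csv
import networkx as nx
import matplotlib.pyplot as plt

g = nx.Graph()
with open('ooo.csv', 'r', newline='') as f:
    for row in csv.reader(f):
        # Skip the header row; only data rows contain a bracketed list
        if '[' in row[1]:
            # Parse "[[86, 110], [110, 110]]" into a list of [u, v] pairs
            g.add_edges_from(ast.literal_eval(row[1]))

nx.draw(g)
plt.show()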
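
For the follow-up in the question's edit about weighting edges by how often they occur, here is a minimal sketch of one possible approach rather than a definitive one. It assumes the moves should be treated as undirected (so [86, 110] and [110, 86] count as the same edge) and uses an inline copy of a few rows from the question in place of the real file. The names sample, counts and weights are illustrative only.

```python
import ast
import csv
import io
from collections import Counter

import networkx as nx

# Inline copy of a few rows from the question so the sketch runs on its own;
# in practice, open the real CSV file instead of using io.StringIO.
sample = '''user_id,cluster_moves
11011,"[[86, 110], [110, 110]]"
3945641,"[[36, 73], [73, 110], [110, 110]]"
14270422,"[[0, 110], [110, 174]]"
'''

counts = Counter()
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
for user_id, moves in reader:
    for u, v in ast.literal_eval(moves):
        # Assumption: undirected edges, so sort the endpoints before counting
        counts[tuple(sorted((u, v)))] += 1

g = nx.Graph()
# Each (u, v, n) triple becomes an edge whose 'weight' attribute is n
g.add_weighted_edges_from((u, v, n) for (u, v), n in counts.items())

# Weights in the same order as g.edges(), e.g. for use as line widths
weights = [d['weight'] for u, v, d in g.edges(data=True)]
```

The loop in the question fails because g.edges() yields only (u, v) pairs; the attribute dictionary is included only with g.edges(data=True). Passing the list to the drawing call, for example nx.draw(g, width=weights), should make frequent edges thicker.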

McGrady
  • Ah, I see what the issue was. I had just quick copied over from a df and it didn't have double quotes. Just tested on my csv, and it worked perfectly, thanks! – andrewr Apr 10 '17 at 15:29
  • McGrady, I added a quick edit to this to ask about adding weights to these edges. I had asked about this in a second question but have not received any answers as of yet (http://stackoverflow.com/questions/43529800/python-networkx-calculate-edge-weights-between-nodes-on-the-fly). I tried using `g.add_weighted_edges_from()` and using `weight=1` as an attribute but I have not had any luck with that. – andrewr Apr 25 '17 at 22:54