When I use the write_adjlist function from Python's NetworkX library (source code), I run into the following problem.
The output looks like this:
164021756 15579697
836289488
268525305
527465237 1514162604
460419343
317218275
397533608
37880000
39066509
1146692844
When it should look like this:
164021756 15579697 836289488 268525305
527465237 1514162604
460419343 317218275
397533608 37880000
39066509 1146692844
I can't really share the data because it has millions of nodes (which might be a factor here, although I don't think so), but this is basically how I build the graph:
    import networkx as nx

    G = nx.DiGraph()
    with open(filename, 'r') as graph_file:
        for line in graph_file:
            try:
                x, y = line.replace('\n', '').split(',')
            except ValueError:
                print("didn't work")
                continue
            # The graph is undirected, but I need each relationship
            # to be listed under both nodes, so I add both directions.
            G.add_edge(x, y)
            G.add_edge(y, x)
    nx.write_adjlist(G, outfilename)
Each line of graph_file has the form userid1,userid2\n
This code worked fine for a 2k-node graph and a 16k-node graph.
The error might be in the generate_adjlist function in the source code, but I'm not really sure. I would appreciate any help, as well as recommendations for other ways to create an adjacency list.
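For comparison, here is a minimal sketch of one other way to build the adjacency list, using only the standard library instead of NetworkX. The names build_adjacency, format_adjacency and write_adjacency are hypothetical helpers made up for this example. Note two assumptions that differ from the code above: it uses strip() rather than replace('\n', ''), so any surrounding whitespace is dropped from the IDs, and it sorts each neighbour list as strings, which is not the order write_adjlist would produce.

```python
from collections import defaultdict


def build_adjacency(lines):
    """Return {node: set(neighbours)} from lines of the form 'a,b'."""
    adjacency = defaultdict(set)
    for line in lines:
        # strip() removes all surrounding whitespace, not only '\n'.
        parts = line.strip().split(',')
        if len(parts) != 2 or not all(parts):
            continue  # skip malformed or empty lines
        x, y = parts
        # Undirected edge: record it under both endpoints.
        adjacency[x].add(y)
        adjacency[y].add(x)
    return adjacency


def format_adjacency(adjacency):
    """Yield one 'node neighbour neighbour ...' string per node."""
    for node, neighbours in adjacency.items():
        yield ' '.join([node] + sorted(neighbours))


def write_adjacency(adjacency, path):
    """Write the adjacency list to path, one node per line."""
    with open(path, 'w') as out:
        for text in format_adjacency(adjacency):
            out.write(text + '\n')
```

Usage would be along the lines of opening the edge file, passing the file object to build_adjacency, and then calling write_adjacency with the result and outfilename. Everything is held in memory, so for millions of nodes the memory footprint is something to watch.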
Specs: Ubuntu 14.04 64bit, 32GB of RAM, SSD, AMD FX(tm)-8350 Eight-Core Processor
EDIT: This is what graph_file looks like:
212127041,218628098
840686875,2278293507
1854227586,2278293507
2266167497,2278293507
2254676097,2278293507
2240955304,2278293507
2226709709,2278293507
1859242609,2278293507
341722764,2278293507
1270686055,2278293507
1049821634,2278293507
1003015644,2278293507
616403983,2278293507
556471190,2278293507
27260086,2278293507
714928003,2278293507
1270696736,2278293507
586671909,2278293507
34507480,2278293507
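One check that could be run against the full file is whether any line leaves something other than digits in the IDs after the same replace('\n', '') step used above, for example a carriage return or a space. I have not confirmed that this is the cause; it is only a way to rule hidden characters in or out. A minimal sketch, assuming the IDs are meant to be purely numeric (find_suspicious_lines is a hypothetical helper, not part of NetworkX):

```python
def find_suspicious_lines(lines, limit=10):
    """Return up to `limit` (line_number, repr(line)) pairs for lines
    whose two comma-separated fields are not made of digits only."""
    suspicious = []
    for number, line in enumerate(lines, start=1):
        # Same parsing as in the graph-building code above.
        fields = line.replace('\n', '').split(',')
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            # repr() makes characters such as '\r' or '\t' visible.
            suspicious.append((number, repr(line)))
            if len(suspicious) >= limit:
                break
    return suspicious
```

The function takes any iterable of lines, so an open file object can be passed directly. On Python 3 the file would need to be opened with newline='' so that line endings reach the function untranslated.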