
I am trying to create a graph using networkx. So far I have created nodes from the following text files. File 1 (user_id.txt) sample data:

user_000001
user_000002
user_000003
user_000004
user_000005
user_000006
user_000007

File 2 (user_country.txt) sample data; it also contains a few blank lines for users who didn't enter their country:

 Japan
 Peru
 United States

 Bulgaria
 Russian Federation
 United States

File 3(user_agegroup.txt) data : contains four age groups

 [12-18],[19-25],[26-32],[33-39]

I have two other files, with the following sample data, for adding edges to the graph:

File 4(id,agegroup.txt)

user_000001,[19-25]
user_000002,[19-25]
user_000003,[33-39]
user_000004,[19-25]
user_000005,[19-25]
user_000006,[19-25]
user_000007,[26-32]

File 5(id,country.txt)

(user_000001,Japan)
(user_000002,Peru)
(user_000003,United States)
(user_000004,)
(user_000005,Bulgaria)
(user_000006,Russian Federation)
(user_000007,United States)

So far I have written the following code to draw a graph with only nodes. (Please check the code: print g.number_of_nodes() never prints the correct number of nodes, even though print g.nodes() shows the correct nodes.)

import csv
import networkx as nx
import matplotlib.pyplot as plt
g=nx.Graph()

#extract and add AGE_GROUP nodes in graph
f1 = csv.reader(open("user_agegroup.txt","rb"))
for row in f1: 
    g.add_nodes_from(row)
    nx.draw_circular(g,node_color='blue')

#extract and add COUNTRY nodes in graph
f2 = csv.reader(open('user_country.txt','rb'))
for row in f2:
    g.add_nodes_from(row) 
    nx.draw_circular(g,node_color='red')

#extract and add USER_ID nodes in graph
f3 = csv.reader(open('user_id.txt','rb'))
for row in f3:
    g.add_nodes_from(row)
    nx.draw_random(g,node_color='yellow')

print g.nodes()
plt.savefig("path.png")
print g.number_of_nodes()
plt.show()

Besides this, I can't figure out how to add edges from File 4 and File 5. Any help with code for that is appreciated. Thanks.

VivekP20
  • What are the values that appear for g.nodes and g.number_of_nodes and what did you expect ? – Abdallah Sobehy Oct 30 '15 at 09:39
  • I get 160 as no. of nodes rather than 259 which is the actual no.of nodes in three files(file 1, 2 and 3) and g.node is printing nodes which when I counted turned out to be correct i.e 259. Again, any suggestions regarding code for creating edges? – VivekP20 Oct 30 '15 at 12:35
  • So, for the sample you provided, you expect to obtain 18 nodes ? – Abdallah Sobehy Oct 30 '15 at 15:26
  • As for adding edges, you can read each row and then use G.add_edge(row[0],row[1]) – Abdallah Sobehy Oct 30 '15 at 15:27
  • `g.number_of_nodes()` just returns the length of `g.node` (internally, a dictionary), and `g.nodes()` also just returns `g.node`. So unless you are modifying the graph between checking `len(g.nodes())` and `g.number_of_nodes()`, it is hard to see how the two could differ. Are all of the entries in these three files unique? Any duplicates will correspond to the same node. ([dict docs](https://docs.python.org/2/tutorial/datastructures.html#dictionaries)) – Bonlenfum Oct 30 '15 at 17:08
  • @Bonlenfum Out of Files 1, 2 and 3, File 2 surely has duplicates, as the sample data in my question shows. Still, your point is valid. However, @Abdallah's solution produces the correct result for `g.number_of_nodes` and `g.number_of_edges` as well. – VivekP20 Oct 31 '15 at 15:23
  • Fair point, I should have noticed the duplicate countries. The issue that you have with weird country names might just be to do with special characters. See http://stackoverflow.com/a/844443 for how to read in utf8, for instance. Anyway, the solution provided is nice to see – Bonlenfum Nov 03 '15 at 11:55
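
The point about duplicates made in the comments above can be checked with a small sketch. It uses the sample country values from the question (the list name `countries` is just for illustration) and shows that a repeated value maps to a single node, so the node count matches the number of unique entries rather than the number of lines read:

```python
import networkx as nx

# Sample country values from the question, including a repeat and a blank entry.
countries = ["Japan", "Peru", "United States", "", "Bulgaria",
             "Russian Federation", "United States"]

g = nx.Graph()
# Skip blanks; adding "United States" a second time does not create a new node.
g.add_nodes_from(c for c in countries if c)

node_count = g.number_of_nodes()
listed_count = len(list(g.nodes()))
unique_count = len(set(c for c in countries if c))
print(node_count, listed_count, unique_count)
```

All three numbers agree, so a total that is lower than the combined line count of the input files is what you would expect whenever the files contain repeated values.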

2 Answers


To simplify things, I changed the user IDs to [1,2,3,4,5,6,7] in the user_id.txt and id,country.txt files. There are some problems in your code:

1- You add some nodes to the graph (for instance from the user_id.txt file) and draw it, then add more nodes from another file and redraw the whole graph on the same figure. So in the end you have several graphs in one figure.

2- You call draw_circular twice, which is why the blue nodes never appear: they are overwritten by the red nodes.

I changed your code to draw only once, at the end. To draw nodes with the right colors, I added an attribute called color when adding nodes, then used this attribute to build a color map that I pass to the draw_networkx function. Finally, adding edges was a bit tricky because of the empty field in id,country.txt, so I had to remove the empty node before drawing the graph. Here is the code and the figure that appears afterwards.

import csv
import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()

# "rb" is the Python 2 mode for csv files; on Python 3 use open(name, newline="")

# extract and add AGE_GROUP nodes in graph
f1 = csv.reader(open("user_agegroup.txt", "rb"))
for row in f1:
    G.add_nodes_from(row, color='blue')

# extract and add COUNTRY nodes in graph
f2 = csv.reader(open('user_country.txt', 'rb'))
for row in f2:
    G.add_nodes_from(row, color='red')

# extract and add USER_ID nodes in graph
f3 = csv.reader(open('user_id.txt', 'rb'))
for row in f3:
    G.add_nodes_from(row, color='yellow')

# add user -> age group edges
f4 = csv.reader(open('id,agegroup.txt', 'rb'))
for row in f4:
    if len(row) == 2:  # add an edge only if the row has two fields
        G.add_edge(row[0], row[1])

# add user -> country edges
f5 = csv.reader(open('id,country.txt', 'rb'))
for row in f5:
    if len(row) == 2:  # add an edge only if the row has two fields
        G.add_edge(row[0], row[1])

# Remove the empty node created by rows with a missing country
# (iterate over a copy so the graph can be modified inside the loop)
for n in list(G.nodes()):
    if n == '':
        G.remove_node(n)

# color nodes according to their color attribute
# (G.node is the networkx 1.x spelling; newer releases use G.nodes[n])
color_map = []
for n in G.nodes():
    color_map.append(G.node[n]['color'])
nx.draw_networkx(G, node_color=color_map, with_labels=True, node_size=500)

plt.savefig("path.png")

plt.show()
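
For readers on Python 3 with networkx 2.0 or later, the same idea can be sketched as follows. This is not the code from the answer: the helper `read_pairs` and the inline strings standing in for id,agegroup.txt and id,country.txt are made up for illustration, and the sketch assumes the files look like the samples in the question (including the parentheses in id,country.txt). Each node gets its color when it is added, and rows with a missing value are skipped so no empty node is ever created:

```python
import io
import networkx as nx

# Inline stand-ins for the two edge files (assumed to match the question's samples).
ID_AGEGROUP = "user_000001,[19-25]\nuser_000003,[33-39]\nuser_000004,[19-25]\n"
ID_COUNTRY = "(user_000001,Japan)\n(user_000003,United States)\n(user_000004,)\n"


def read_pairs(handle):
    """Yield (left, right) pairs, skipping lines where either value is missing."""
    for line in handle:
        line = line.strip()
        # id,country.txt wraps each line in parentheses; drop one outer pair.
        if line.startswith("(") and line.endswith(")"):
            line = line[1:-1]
        parts = [part.strip() for part in line.split(",", 1)]
        if len(parts) == 2 and all(parts):
            yield parts[0], parts[1]


G = nx.Graph()
for user, group in read_pairs(io.StringIO(ID_AGEGROUP)):
    G.add_node(user, color="yellow")
    G.add_node(group, color="blue")
    G.add_edge(user, group)
for user, country in read_pairs(io.StringIO(ID_COUNTRY)):
    G.add_node(user, color="yellow")
    G.add_node(country, color="red")
    G.add_edge(user, country)

# Fall back to gray for any node that somehow has no color attribute.
color_map = [G.nodes[n].get("color", "gray") for n in G.nodes()]
print(G.number_of_nodes(), G.number_of_edges())

# To draw, pass the map as in the answer:
# nx.draw_networkx(G, node_color=color_map, with_labels=True, node_size=500)
```

With real files, `io.StringIO(...)` would be replaced by `open("id,country.txt", encoding="utf-8")` or similar; the encoding is a guess and depends on how the files were written.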

[Figure: the resulting graph]

Abdallah Sobehy
  • Thanks a lot @Abdallah Sobehy. This is very helpful. By adapting this code I am now able to add edges read from the file into graph. But I think there is a little syntax error in line : `color_map.append(G.node[n]['color']) `which is why control doesn't go past that `for` loop so no graphical output there. `print g.nodes` and `print g.edges` prints correct list of nodes and edges so that means code is correct until that error line. Can you please confirm if there wasn't any error in your code so I can go ahead and accept your answer? – VivekP20 Oct 31 '15 at 15:08
  • `draw_networkx` doesn't work either. I changed that to `draw_random` and it worked. Similarly, `draw_circular` works too. – VivekP20 Oct 31 '15 at 15:37
  • EDIT : `nx.draw_networkx(G, node_color = color_map, with_labels = True, node_size = 500)` doesn't work either. I changed that to `nx.draw_random(g)` and it worked. Similarly, `nx.draw_circular(g)` works too. But it draws default red colored nodes without labels...I am guessing something is wrong with `node_color = color_map, with_labels = True`. Please edit your answer if you happen to fix these issues. – VivekP20 Oct 31 '15 at 15:49
  • Before I posted the answer, I ran the code and it was fine. I cannot verify it now because I do not have my machine with me. Anyway, just a small check: I named the graph **G** instead of **g**. Check that you use the graph name correctly and let me know – Abdallah Sobehy Oct 31 '15 at 17:31
  • Yeah, I took care of **G** in my code. I think I have fixed the problem. I dug into the whole dataset I ran the code on and found a few 'weird' country names. Also, a few rows were completely messed up. I deleted them. That fixed the issue of labels not showing up for nodes in the graph. But I still have no idea why it throws an error when I add the `with_labels = true` parameter to `nx.draw_networkx`. When I removed this parameter and ran the code again, surprisingly it worked anyway and the labels also showed up. Ah, I'll just let it be like that. – VivekP20 Oct 31 '15 at 19:44
  • It is a bit weird yes, but if it is working for the moment you can continue with it. And if something happens you can always comment here or ask another question. – Abdallah Sobehy Oct 31 '15 at 20:41
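
One plausible explanation for the failure at the `color_map` line discussed in these comments (an assumption, since the full dataset isn't shown): `add_edge` silently creates any node that doesn't exist yet, and nodes created that way have no `color` attribute, so looking it up raises a `KeyError`. A name that differs only by a leading space, as in the user_country.txt sample, is enough to trigger this. A minimal sketch for finding such nodes:

```python
import networkx as nx

G = nx.Graph()
G.add_node("Japan", color="red")      # node loaded from the country file
G.add_edge("user_000001", " Japan")   # leading space: this is a different node

# Nodes created implicitly by add_edge carry no attributes.
missing = sorted(n for n, data in G.nodes(data=True) if "color" not in data)
print(missing)
```

Separately, Python spells the boolean as `True`; lowercase `true` is an undefined name and raises a `NameError`, which would explain the error from `with_labels = true`. Since `draw_networkx` shows labels by default, dropping the parameter still gives labelled nodes.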

You can use a for loop like this (assuming df_edges is a pandas DataFrame with source and target columns):

for a,b in df_edges.iterrows():
    G.add_edges_from([(b['source'], b['target'])])
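
If the edges are already in a DataFrame, networkx 2.0 and later can also build the graph in one call with `nx.from_pandas_edgelist`, without looping over rows. A minimal sketch, using made-up rows based on the question's sample data and the same `source` and `target` column names as the loop above:

```python
import networkx as nx
import pandas as pd

# Hypothetical edge table; the column names match the loop above.
df_edges = pd.DataFrame({
    "source": ["user_000001", "user_000002", "user_000003"],
    "target": ["Japan", "Peru", "United States"],
})

G = nx.from_pandas_edgelist(df_edges, source="source", target="target")
print(G.number_of_nodes(), G.number_of_edges())
```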