I'm trying to build a network graph with NetworkX from a set of nodes and their attributes. Each node is unique, but it can share attributes with other nodes. A shared attribute should act as an edge between all the nodes that have it.

An example of the input (each node followed by its attributes):

Name1   2-s2.0-84905590088, 2-s2.0-84901477890
Name2   2-s2.0-84941169876
Name3   2-s2.0-84958012773
Name4   2-s2.0-84960796474
Name5   2-s2.0-84945302996, 2-s2.0-84953281823, 2-s2.0-84944268402, 2-s2.0-84949478621, 2-s2.0-84947281259, 2-s2.0-84947759580, 2-s2.0-84945265895, 2-s2.0-84945247800, 2-s2.0-84946541351, 2-s2.0-84946051072, 2-s2.0-84942573284, 2-s2.0-84942280140, 2-s2.0-84937715425, 2-s2.0-84943751990, 2-s2.0-84957729558, 2-s2.0-84938844501, 2-s2.0-84934761065
Name6   2-s2.0-84908333808
Name7   2-s2.0-84925879816
Name8   2-s2.0-84940447040, 2-s2.0-84949534001
Name9   2-s2.0-84899915556, 2-s2.0-84922392381, 2-s2.0-84905079505, 2-s2.0-84940931972, 2-s2.0-84893682063, 2-s2.0-84954285577, 2-s2.0-84934934228, 2-s2.0-84926624187
Name10  2-s2.0-84907065810

So Name5 would have a lot of edges connecting it to the other names that share one of its identifiers.

I'm not sure if this is the right idea behind NetworkX, or if this kind of input can even be turned into a graph. If it can't be done this way, how should I format the input to make this graph? I haven't been able to find any documentation or videos on using NetworkX this way.

Sharw
  • I don't think there is any built-in way to do that with NetworkX, but you can certainly do it by creating all the nodes and then iterating over the "attributes" and adding the appropriate edges. – BrenBarn Dec 20 '16 at 07:21

1 Answer

What you ask is possible. I stored your data in a CSV file. Note that I added a comma after each node name and removed all whitespace.

Name1,2-s2.0-84905590088,2-s2.0-84901477890
Name2,2-s2.0-84941169876
Name3,2-s2.0-84958012773
Name4,2-s2.0-84960796474
Name5,2-s2.0-84945302996,2-s2.0-84953281823,2-s2.0-84944268402,2-s2.0-84949478621,2-s2.0-84947281259,2-s2.0-84947759580,2-s2.0-84945265895,2-s2.0-84945247800,2-s2.0-84946541351,2-s2.0-84946051072,2-s2.0-84942573284,2-s2.0-84942280140,2-s2.0-84937715425,2-s2.0-84943751990,2-s2.0-84957729558,2-s2.0-84938844501,2-s2.0-84934761065
Name6,2-s2.0-84908333808
Name7,2-s2.0-84925879816
Name8,2-s2.0-84940447040,2-s2.0-84949534001
Name9,2-s2.0-84899915556,2-s2.0-84922392381,2-s2.0-84905079505,2-s2.0-84940931972,2-s2.0-84893682063,2-s2.0-84954285577,2-s2.0-84934934228,2-s2.0-84926624187
Name10,2-s2.0-84907065810

One observation: you say that Name5 would have a lot of edges, but its attributes are unique. Moreover, when I run my code with your data, it turns out that all of the attributes are unique, so there are no edges in the graph.

I tweaked your data so that I use only the first 12 characters of each attribute (I do that with the line new_attributes = [x[:12] for x in new_attributes]). That way I get some matching attributes.

Now the code:

import networkx as nx
import csv

G = nx.Graph()

with open('data.csv') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=',')
    for row in csv_reader:

        new_node = row[0]  # first element in row
        new_attributes = row[1:]  # whole row except the first element
        new_attributes = [x[:12] for x in new_attributes]  # remove this for your data!
        # add the node and its attributes to the graph
        G.add_node(new_node, my_attributes=new_attributes)  # attributes are stored as a list

        # add edges based on existing nodes
        for node, attrs in G.nodes(data=True):
            # skip node we just added
            if node != new_node:
                for attr in attrs['my_attributes']:
                    # check if any of the attributes for `node` are also in the `new_attributes` list
                    if attr in new_attributes:
                        G.add_edge(node, new_node)

# G.nodes[...] replaces the older G.node[...], which was removed in NetworkX 2.4
for u, v in G.edges():
    common = set(G.nodes[u]['my_attributes']) & set(G.nodes[v]['my_attributes'])
    print('EDGE:', (u, v), '| COMMON ATTRIBUTES:', common)

For each CSV row I add a node (with its attributes) to the graph, and then add edges by comparing its attributes with those of the nodes already in the graph. Note that the node attributes are stored in a list and can be accessed with the my_attributes key. At the end I also print each edge together with the attributes its two nodes have in common (I use set and & to get the intersection of the two attribute lists).
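An alternative to comparing every new node against all existing nodes is to index the nodes by attribute first and then connect every pair that appears under the same attribute. The following is only a sketch of that idea, not the code used above: `build_graph`, `nodes_by_attr`, and the `SAMPLE` rows are names and data made up for illustration.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

import networkx as nx

# Hypothetical sample in the same layout as data.csv: a name, then its attributes.
SAMPLE = """\
A,x1,x2
B,x2,x3
C,x3
D,x9
"""


def build_graph(lines):
    """Build a graph whose edges link nodes that share at least one attribute."""
    G = nx.Graph()
    nodes_by_attr = defaultdict(set)  # attribute -> names of the nodes carrying it

    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        name, attrs = row[0], [a.strip() for a in row[1:]]
        G.add_node(name, my_attributes=attrs)
        for attr in attrs:
            nodes_by_attr[attr].add(name)

    # Every pair of nodes listed under the same attribute gets an edge.
    for names in nodes_by_attr.values():
        for u, v in combinations(sorted(names), 2):
            G.add_edge(u, v)
    return G


G = build_graph(io.StringIO(SAMPLE))
edges = sorted(tuple(sorted(e)) for e in G.edges())
print(edges)
```

With this sample, A and B share x2 and B and C share x3, while D stays isolated. For a real file, passing the open file object to `build_graph` should work the same way, since `csv.reader` accepts any iterable of lines.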

Output for the tweaked data:

EDGE: ('Name5', 'Name9') | COMMON ATTRIBUTES: {'2-s2.0-84934'}
EDGE: ('Name5', 'Name8') | COMMON ATTRIBUTES: {'2-s2.0-84949'}
EDGE: ('Name8', 'Name9') | COMMON ATTRIBUTES: {'2-s2.0-84940'}
EDGE: ('Name1', 'Name9') | COMMON ATTRIBUTES: {'2-s2.0-84905'}

One final note: if you need multiple edges between two nodes, use a MultiGraph.
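As a rough illustration of the MultiGraph option, the sketch below adds one parallel edge per shared attribute and uses the attribute itself as the edge key. The `attributes` dictionary is hypothetical data, not the data from the question.

```python
from itertools import combinations

import networkx as nx

# Hypothetical nodes and attributes, chosen so that A and B share two attributes.
attributes = {
    "A": ["x1", "x2"],
    "B": ["x1", "x2"],
    "C": ["x2"],
}

M = nx.MultiGraph()
M.add_nodes_from(attributes)

for u, v in combinations(attributes, 2):
    for attr in sorted(set(attributes[u]) & set(attributes[v])):
        # key=attr identifies each parallel edge by the attribute that caused it
        M.add_edge(u, v, key=attr)

print(M.number_of_edges("A", "B"))
print(sorted(M.edges(keys=True)))
```

Here A and B end up with two edges between them, one for x1 and one for x2, which a plain Graph would collapse into a single edge.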

edo
  • Thank you, Just a question, so the real data has last name, first initial e.g. `Smith, J`. Would this cause a problem with the separating of nodes and 'attributes' – Sharw Dec 20 '16 at 20:48
  • You can choose another delimiter for the csv file or just put the node name in double quotes (e.g. `"Smith, J"`). – edo Dec 20 '16 at 21:34
  • Ok sweet, how easy would it be to input this result into gephi to get a visualisation? – Sharw Dec 20 '16 at 23:00
  • You could save your graph locally in some standard format and then open it in Gephi. [Here](http://stackoverflow.com/a/15455966/6696049) is an example with code. – edo Dec 21 '16 at 09:52
  • Why did you trim the data to just the first part? Is it just for convenience in the output? You could do that in the print statement rather than lose precision in the data underneath. (e.g. also splitting on '-') I can envisage this to come back and bite... – Bonlenfum Dec 21 '16 at 16:35
  • @Bonlenfum I trimmed the data to get some matching attributes, i.e. to get some output. If I don't trim the data I get a graph with no edges (all attributes are unique in the csv file). OP should of course use the full attributes (I state that in a comment in the code). – edo Dec 21 '16 at 16:43
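Two of the points from the comments can be sketched together: reading node names that contain a comma when they are wrapped in double quotes, and saving the graph to a file for Gephi. This is a minimal sketch under assumptions: the `SAMPLE` rows are invented, the edge is added by hand to keep the example short, and GEXF is just one format that Gephi can open (the linked answer may use a different one).

```python
import csv
import io
import os
import tempfile

import networkx as nx

# Hypothetical rows: names containing a comma are wrapped in double quotes.
SAMPLE = '"Smith, J",x1,x2\n"Jones, A",x2\n'

rows = list(csv.reader(io.StringIO(SAMPLE)))
names = [row[0] for row in rows]
print(names)  # ['Smith, J', 'Jones, A']

G = nx.Graph()
for row in rows:
    G.add_node(row[0])
# Edge added by hand here; in practice it would come from the matching step.
G.add_edge("Smith, J", "Jones, A", shared="x2")

# Write the graph as GEXF; the directory and file name are arbitrary.
path = os.path.join(tempfile.mkdtemp(), "graph.gexf")
nx.write_gexf(G, path)

# Read it back to check that the quoted names survived the round trip.
H = nx.read_gexf(path)
print(H.has_edge("Smith, J", "Jones, A"))
```

The node attributes are left out on purpose. I would expect list-valued attributes such as my_attributes to need joining into a single string before export, but that is an assumption worth checking against your NetworkX version.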