
I am getting a KeyError when I try to create a network.

My dataset is

Node     Neighbors  Colour  Weight
Luke     Alte       orange  3
Luke     John       orange  3
Michael  Laura      red     43
Ludo     Stella     orange  21
Alte     Ludo       blue    24
Alte     Luke       blue    24

The table above shows the links by nodes:

  • node Luke is linked with Alte and John. It has edge weight 3 and colour orange
  • node Michael is linked with Laura. It has weight 43 and colour red
  • node Ludo is linked with Stella. It has weight 21 and colour orange
  • node Alte is linked with Luke and Ludo. It has colour blue and weight 24

Running the following code:

NROWS = None
def get_graph_from_pandas(df):
    
    G = nx.DiGraph() # assuming the graph is directed since e.g node 1 has 
                     # 3 as neighbour but 3 doesnt have 1 as neighbour
    
    
    for row in df.itertuples(): # row is the row of the dataframe
        n = row.Node
        w = row.Weight
        c = row.Colour
        neighbors = row.Neighbors
        
        G.add_node(n, weight = w, colour = c)
        
        for neigh in neighbors:
            #add edge weights here, attribute of G.add_edge
            G.add_edge(n,neigh)  
            
    return G
        
        
        
G = get_graph_from_pandas(df)

print("Done.")
print("Total number of nodes: ", G.number_of_nodes())
print("Total number of edges: ", G.number_of_edges())

pos = nx.draw(G, with_labels=True, 
              node_color=[node[1]['colour'] for node in G.nodes(data=True)], 
              node_size=200)

gives me a KeyError: 'colour'.

When I print

for node in G.nodes(data=True):     
     try:         
        node[1]['colour']     
     except KeyError:         
        print(node)

I get

('A', {}) 
('l', {}) 
('t', {}) 
('e', {})

Can you please explain what is causing the error? Thanks

Update: I think the error is from here

        for neigh in neighbors:
            #add edge weights here, attribute of G.add_edge
            G.add_edge(n,neigh)
wwii
V_sqrt
  • It's not just a case sensitivity issue is it? i.e. you have `Colour` in your example but your code uses `colour`. – Random Davis Dec 08 '20 at 19:25
  • I already tried with Colour, but it gives the same error :( I updated the question to show what the nodes look like (as you can see, it is completely wrong: it should be Luke, not L-u-k-e). – V_sqrt Dec 08 '20 at 19:26
  • When you [catch the error](https://docs.python.org/3/tutorial/errors.html#handling-exceptions) and inspect/print relevant data in the except suite, do you see anything unusual? Is everything as expected? If you are using an IDE, **now** is a good time to learn its debugging features, or the built-in [Python debugger](https://docs.python.org/3/library/pdb.html). ... [What is a debugger and how can it help me diagnose problems?](https://stackoverflow.com/questions/25385173/what-is-a-debugger-and-how-can-it-help-me-diagnose-problems) – wwii Dec 08 '20 at 19:35
  • I can see that the other letters that print are not in order. For example, `('L', {}) ('u', {}) ('k', {}) ('e', {}) ('M',{}), ('h',{})`. I think the problem is that it is splitting the words into single letters. – V_sqrt Dec 08 '20 at 19:37
  • Your for loop over Neighbors loops over the characters in the "Neighbor" item and adds it as an end-node in the edge, instead of the actual Neighbor. And since there isn't a node with that name, networkx adds that as a node in your graph. You should remove that for loop and replace it with `G.add_edge(n,neighbors)` – cookesd Dec 08 '20 at 20:20

2 Answers


Each item in df.Neighbors is a string. When you iterate over it with `for neigh in neighbors:`, you loop over its characters, so every character of the neighbor's name is added to the graph as a separate node. For example, after the first row the graph's nodes look like

>>> G.nodes
NodeView(('Luke', 'A', 'l', 't', 'e'))
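This is plain Python behaviour rather than anything specific to networkx: looping over a string yields its characters. A minimal sketch, where `as_neighbor_list` is a hypothetical helper (not part of the question's code) that lets the same loop accept either a single name or a list of names:

```python
def as_neighbor_list(value):
    """Return value as a list of names; a bare string becomes a one-item list."""
    if isinstance(value, str):
        return [value]
    return list(value)


# Iterating over the string directly splits it into characters.
chars = [ch for ch in "Alte"]

# Wrapping it first keeps the name intact.
names = [name for name in as_neighbor_list("Alte")]

print(chars)
print(names)
```

If every row is guaranteed to hold exactly one neighbor, the helper is unnecessary and the loop can simply be dropped.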

As long as each row only has a single Neighbor, replace the for loop with

    # for neigh in neighbors:
    #     #add edge weights here, attribute of G.add_edge
    #     G.add_edge(n,neigh)  
    G.add_edge(n,neighbors)

This change alone does not eliminate the KeyError, though.

'John', 'Laura', and 'Stella' appear only in the Neighbors column. They are still nodes in the graph, but they were created implicitly by .add_edge and never had a colour assigned to them.

>>> for thing in G.nodes.items():
...     print(thing)
('Luke', {'weight': 3, 'colour': 'orange'})
('Alte', {'weight': 24, 'colour': 'blue'})
('John', {})
('Michael', {'weight': 43, 'colour': 'red'})
('Laura', {})
('Ludo', {'weight': 21, 'colour': 'orange'})
('Stella', {})

You can add those nodes first with default attributes before iterating:

...
    G.add_nodes_from(df.Neighbors,colour='white',weight=0)
    for row in df.itertuples(): # row is the row of the dataframe
        ...
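Another way to avoid the KeyError, sketched here as an alternative to pre-seeding defaults, is to look the attribute up with `dict.get` when building the colour list. The fallback colour `DEFAULT_COLOUR` is an assumption made for illustration; pick whatever suits the plot.

```python
import networkx as nx

G = nx.DiGraph()
G.add_node("Luke", colour="orange", weight=3)
G.add_edge("Luke", "John")  # "John" is created implicitly, with no attributes

DEFAULT_COLOUR = "white"  # assumed fallback for nodes without a colour

# .get() returns the fallback instead of raising KeyError for "John".
node_colours = [
    data.get("colour", DEFAULT_COLOUR) for _, data in G.nodes(data=True)
]

print(list(G.nodes(data=True)))
print(node_colours)
```

The resulting list has one entry per node, in the graph's node order, so it can be passed as `node_color` to `nx.draw`.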

If your node attributes can begin with capitals the graph construction could be written:

def get_graph_from_pandas(df):
    
    G = nx.DiGraph() # assuming the graph is directed since e.g node 1 has 
                     # 3 as neighbour but 3 doesnt have 1 as neighbour
    
    
    G.add_nodes_from(df.Neighbors,Colour='white',Weight=0)
    G.add_edges_from(df[['Node','Neighbors']].itertuples(index=False))
    dg = df.set_index('Node')
    G.add_nodes_from(dg[['Colour','Weight']].T.to_dict().items())
        
    return G

>>> for thing in G.nodes(data=True):
...     print(thing)
('Alte', {'Colour': 'blue', 'Weight': 24})
('John', {'Colour': 'white', 'Weight': 0})
('Laura', {'Colour': 'white', 'Weight': 0})
('Stella', {'Colour': 'white', 'Weight': 0})
('Ludo', {'Colour': 'orange', 'Weight': 21})
('Luke', {'Colour': 'orange', 'Weight': 3})
('Michael', {'Colour': 'red', 'Weight': 43})
>>> for thing in G.edges(data=True):
...     print(thing)
('Alte', 'Ludo', {})
('Alte', 'Luke', {})
('Ludo', 'Stella', {})
('Luke', 'Alte', {})
('Luke', 'John', {})
('Michael', 'Laura', {})
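For comparison, a similar graph can be built with `nx.from_pandas_edgelist`, which also lets the weight be stored on the edges. This is only a sketch under assumptions: the DataFrame is retyped from the question's table, 'white' is an arbitrary default colour, and each node is assumed to have a single colour across its rows.

```python
import networkx as nx
import pandas as pd

# Data retyped from the table in the question.
df = pd.DataFrame({
    "Node": ["Luke", "Luke", "Michael", "Ludo", "Alte", "Alte"],
    "Neighbors": ["Alte", "John", "Laura", "Stella", "Ludo", "Luke"],
    "Colour": ["orange", "orange", "red", "orange", "blue", "blue"],
    "Weight": [3, 3, 43, 21, 24, 24],
})

# One directed edge per row, with the Weight column stored as an edge attribute.
G = nx.from_pandas_edgelist(
    df, source="Node", target="Neighbors",
    edge_attr="Weight", create_using=nx.DiGraph,
)

# Give every node the default colour, then overwrite the nodes that have one.
nx.set_node_attributes(G, "white", "Colour")
colours = df.drop_duplicates("Node").set_index("Node")["Colour"].to_dict()
nx.set_node_attributes(G, colours, "Colour")

print(list(G.nodes(data=True)))
print(list(G.edges(data=True)))
```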

You can get the node colors directly from G.nodes.items

pos = nx.draw(G, with_labels=True, 
              node_color=[d['Colour'] for n,d in G.nodes.items()], 
              node_size=200)

or nx.get_node_attributes

pos = nx.draw(G, with_labels=True, 
              node_color=list(nx.get_node_attributes(G,'Colour').values()),
              node_size=200)
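The different node views are easy to confuse, so here is a small sketch of what each one yields. The two-node graph is made up for illustration and is not the question's data.

```python
import networkx as nx

G = nx.DiGraph()
G.add_node("Luke", Colour="orange")
G.add_node("Alte", Colour="blue")

names_only = list(G.nodes)               # node names only
with_data = list(G.nodes(data=True))     # (name, attribute dict) pairs
items = list(G.nodes.items())            # the same pairs as with_data
one_attr = list(G.nodes(data="Colour"))  # (name, value) pairs for one attribute

print(names_only)
print(with_data)
print(one_attr)
```

Printing `G.nodes` therefore shows names but not attributes; to check whether a colour is missing, inspect `G.nodes(data=True)` or `G.nodes.items()`.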
wwii
  • Thanks wwii. Unfortunately I am still getting the same error. G.nodes prints the nodes correctly, but I get the same error for colour/Colour. May I ask whether you get a similar error (if you can check it with the data I provided)? – V_sqrt Dec 08 '20 at 20:19
  • `G.nodes(data=True)` is different from `G.nodes`. – willcrack Dec 08 '20 at 20:55
  • thnx @willcrack. – wwii Dec 08 '20 at 21:01

wwii's answer solves one problem.

However there are a number of problems that need to be fixed:

  1. Only nodes in the Node column will have a colour. Nodes that appear only in the Neighbors column are created implicitly by G.add_edge(n,neighbor) and won't have a colour assigned. You need to decide which colour to set for these nodes.

  2. The weight you want to attribute to the edges is being attributed to the nodes.

import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

df = pd.DataFrame(  data = {"Node": ["Luke", "Luke", "Michael", "Ludo", "Alte", "Alte"],
                            "Neighbors": ["Ludo", "John", "Laura", "Stella", "Ludo", "Luke"],
                            "Colour": ["orange", "orange", "red", "orange", "blue", "blue"], 
                            "Weight": [3, 3 ,43, 21, 24, 24] 
                        }
              )
   

NROWS = None
def get_graph_from_pandas(df, v = False):
    
    G = nx.DiGraph() # assuming the graph is directed since e.g node 1 has 
                     # 3 as neighbour but 3 doesnt have 1 as neighbour
    
    for row in df.itertuples():
        print(row)
        n = row.Node
        w = row.Weight
        c = row.Colour
        neighbor = row.Neighbors
        
        G.add_node(n, weight = w, colour = c) # only nodes in column Node get their colour here
                                              # nodes that only appear in the Neighbors column won't have a colour
        if neighbor not in G.nodes:
            G.add_node(neighbor, weight = w, colour = "yellow") # this will set the default color to yellow
        G.add_edge(n,neighbor, weight = w) # weight of edge
            
    return G
        
G = get_graph_from_pandas(df, v = False)

print("Done.")
print("Total number of nodes: ", G.number_of_nodes())
print("Total number of edges: ", G.number_of_edges())

fig = plt.figure(figsize=(2,2))

pos = nx.draw(G, with_labels=True, 
              node_color=[node[1]['colour'] for node in G.nodes(data=True)], 
              node_size=200)

for node in G.nodes(data=True):
    try:
        node[1]['colour']
    except KeyError:
        print(node)
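To confirm that a weight ended up on an edge rather than on a node, the edge attributes can be read back. A minimal sketch on a made-up two-edge graph, not the full dataset:

```python
import networkx as nx

G = nx.DiGraph()
G.add_edge("Luke", "John", weight=3)
G.add_edge("Michael", "Laura", weight=43)

# Dictionary keyed by (source, target) pairs.
edge_weights = nx.get_edge_attributes(G, "weight")

# A single edge's attribute can also be read directly.
single = G.edges["Luke", "John"]["weight"]

# The nodes themselves carry no attributes here.
node_attrs = dict(G.nodes["Luke"])

print(edge_weights)
print(single)
print(node_attrs)
```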

willcrack
  • thanks willcrack. So the colour in the Colour column is the colour of nodes in Node column. All of them should have a colour. If they do not have because, for example, they are not in the Nodes column, then I would assign a default colour (e.g. yellow), just to avoid this potential issue – V_sqrt Dec 08 '20 at 20:23
  • Thanks a lot willcrack. Your code has fixed the issue. – V_sqrt Dec 08 '20 at 20:26
  • This was happening because your Neighbors column used to be a column of lists – willcrack Dec 08 '20 at 20:29
  • Also, I added an edit. You should only change a node to yellow if the node is not already in the graph – willcrack Dec 08 '20 at 20:30