
After a helpful answer to my original question, I decided to use internal data (which I can't share here). The internal data follows the same format as the mock data. I simply copied the data into the same working directory, making sure that the new data has the same format, i.e. the same column headers, etc. I used DiffChecker to make sure that app.py (from my original post) matches the proof of concept (appPOC.py). The internal data has more than 600 nodes and more than 3000 edges.

The code to make the interactive dashboard is the same as the one I used for my original post. However, this time I get the following KeyError:

Traceback (most recent call last):
  File "appPOC.py", line 75, in <module>
    hovertext = "Document Description: " + str(G.nodes[node]['Description']) + "<br>" + "Document Name: " + str(G.nodes[node]['DocName']) + "<br>" + "Document ID: " + str(G.nodes[node]['DocumentID'])
KeyError: 'Description'

The data itself should be fine, as I can plot a network without the hovering text next to the node.

To summarize: app.py can plot the mock data, appPOC.py (which is identical, but has a different file name) can't plot the internal data. This leads me to believe that there is something wrong with the internal data in the CSV file.

Edit: I figured out that if a target is not listed in the elements, the graph fails to be drawn. Is there any way to create a node automatically (like in Gephi) if the (target) node is not defined in the elements?

rothstem

1 Answer


NetworkX creates a node for both the source and the target of every edge. Hence, with

G = nx.from_pandas_edgelist(edges, 'Source', 'Target')

your graph contains every node, including those that appear only in the edge list. However, with

nx.set_node_attributes(G, nodes.set_index('Doc')['Description'].to_dict(), 'Description')
nx.set_node_attributes(G, nodes.set_index('Doc')['DocumentID'].to_dict(), 'DocumentID')

you only fill the node attributes 'Description' and 'DocumentID' for those in your nodes data frame. A simple workaround is to replace the

str(G.nodes[node]['Description'])

with

str(G.nodes[node].get('Description', ''))

and similarly for 'DocName' and 'DocumentID'. You can find more information on the get method at: Why dict.get(key) instead of dict[key]? Basically, we exploit the fact that networkx stores node attributes in a dict and use that dict's get method, which lets you supply a default value.
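Applied to all three attributes, the replacement could be wrapped in a small helper. This is only a sketch: the helper name make_hovertext and the sample attribute values are hypothetical, and in appPOC.py you would pass G.nodes[node] (the node's attribute dict) as attrs.

```python
def make_hovertext(attrs):
    """Build the hover label from a node's attribute dict (e.g. G.nodes[node]).

    dict.get returns '' for any attribute that was never set, so a node that
    only appears in the edge list yields empty fields instead of a KeyError.
    """
    return (
        "Document Description: " + str(attrs.get("Description", "")) + "<br>"
        + "Document Name: " + str(attrs.get("DocName", "")) + "<br>"
        + "Document ID: " + str(attrs.get("DocumentID", ""))
    )


# Hypothetical attribute dicts: a fully described node and one created only from an edge.
full = {"Description": "Spec", "DocName": "spec.pdf", "DocumentID": 1}
empty = {}
print(make_hovertext(full))   # Document Description: Spec<br>Document Name: spec.pdf<br>Document ID: 1
print(make_hovertext(empty))  # every field is left blank
```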

A simple reproducible and minimal example

import networkx as nx
g = nx.karate_club_graph()
# all nodes in this graph have the node attribute 'club' filled
# we add a node without this node attribute
g.add_node("Test")
print(g.nodes[0]["club"])
# 'Mr. Hi'
# print(g.nodes["Test"]["club"])
# results in KeyError: 'club'
print(g.nodes["Test"].get("club", ""))
# ''
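Regarding the edit in the question: since from_pandas_edgelist already creates the missing nodes, the remaining task is to find which of them lack a row in the nodes table. A minimal, library-free sketch of that check; the sample data and the helper name missing_nodes are hypothetical stand-ins for the two CSV tables.

```python
# Hypothetical stand-ins for the two CSV tables: node rows keyed by 'Doc', and edges.
nodes = [
    {"Doc": "A", "Description": "Spec", "DocName": "spec.pdf", "DocumentID": 1},
    {"Doc": "B", "Description": "Plan", "DocName": "plan.pdf", "DocumentID": 2},
]
edges = [("A", "B"), ("B", "C")]  # 'C' occurs only as an edge target


def missing_nodes(nodes, edges, key="Doc"):
    """Return edge endpoints (sources or targets) that have no row in the nodes table."""
    described = {row[key] for row in nodes}
    endpoints = {n for edge in edges for n in edge}
    return endpoints - described


print(sorted(missing_nodes(nodes, edges)))  # ['C']
```

With the real data frames, the same check would be (set(edges['Source']) | set(edges['Target'])) - set(nodes['Doc']); the parentheses matter because - binds more tightly than | for Python sets.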
Sparky05