I'm trying to create a Flask web app that can query a NetworkX graph, to return the shortest paths (using Dijkstra's algorithm, plus Yen's algorithm for the k-shortest paths).
Here is my sample code (stripped of error handling et c., and only showing the Dijkstra implementation), which returns the path if you submit a GET request to localhost:3000/d?source=a&target=b
:
import csv
import json
import networkx as nx
from flask import Flask, request
app = Flask(__name__)
G = nx.Graph()
@app.route('/d')
def dijkstra():
global G
source = request.args['source'].lower().strip()
target = request.args['target'].lower().strip()
path = nx.dijkstra_path(G, source, target, weight='weight')
return json.dumps(path)
if __name__ == '__main__':
global G
with open('nodes.csv', 'rb') as f:
reader = csv.reader(f)
node_list = list(reader)
for i in nodes:
G.add_node(i[0])
with open('edges.csv', 'rb') as f:
reader = csv.reader(f)
edge_list = list(reader)
for i in edges:
G.add_edge(i[0], i[1], weight=i[2])
app.run(host='0.0.0.0', port=3000, use_reloader=False)
The graph is very large (one million nodes, twenty million edges), so I am currently populating from CSVs it as the app loads, rather than building it for every request. This takes about five minutes when run locally on my MacBook Pro (3GHz, 16GB RAM). Lookups take different amounts of time, but generally around fifteen seconds, and performance generally degrades after a certain amount of use, meaning I have to restart the app.
Question one: as this is slow and pretty memory intensive, is there a way to store this graph so that I don't have to generate it every time, then hold it in memory?
Question two: am I approaching it the right way, by using G
as a global variable?
I haven't worked with graphs this large before, so am aware my implementation is probably far from ideal, and would be grateful for any thoughts about how to make this perform better/more reliably!