1

I have as input an SQLite database of a directed acyclic graph (not necessarily connected) where the nodes are stored as

node_id | name
1       | a
2       | b
3       | c
4       | d
5       | e
6       | f

and the edges are stored as:

edge_id | source | target
1       | 1      | 2
2       | 1      | 3
3       | 1      | 4
4       | 2      | 5

This is a toy example, my database has about 10^4 to 10^5 nodes (small but not trivially so). I need to query all of the "children" nodes from a given node. As an example:

input | expected output
1     | [2,3,4,5]
2     | [5]
3     | []
6     | []

I am using python's SQLite3 module to do this now, but I have to recursively make a select call for each node -- this can get prohibitively expensive. Is there a smart way to do this, perhaps with a single SQLite call?

Hooked
  • 84,485
  • 43
  • 192
  • 261
  • Could the downvoter please explain what is missing in this question? I provided a minimal example with expected output and described what I've done so far. I'm always looking to improve the question for others. – Hooked Apr 08 '15 at 18:43
  • 1
    To be honest I'd probably just query the whole set in one go and build an adjacency list. – Daniel Roseman Apr 08 '15 at 18:44
  • Is your Python new enough to have SQLite 3.8.3? – CL. Apr 08 '15 at 18:45
  • @CL. I'm using Python 2.7.8, stock from Ubuntu 14.10 which gives my SQLite version as 2.6.0. Is there a specific solution that would work for the version you mention? (I would love to hear it even it doesn't immediately help!) – Hooked Apr 08 '15 at 18:47
  • @DanielRoseman You mean pull the entire dataset into memory (in python) and then parse it from there? – Hooked Apr 08 '15 at 18:48
  • Yeah. Assuming the graph is reasonably sparse, that shouldn't be too much data for a modern computer. Though it depends on what you're doing and how often, of course. – Daniel Roseman Apr 08 '15 at 18:50
  • @DanielRoseman The problem is the DAG is sparse (with the size I mentioned), but the database itself is much, much larger with 10^8 to 10^9 nodes (there are many independent DAGs contained in the full database). – Hooked Apr 08 '15 at 18:51
  • 3.8.3 would have [recursive common table expressions](http://www.sqlite.org/lang_with.html#recursivecte), doing the recursion from your code is not necessarily any more expensive. – CL. Apr 08 '15 at 18:56
  • 1
    When specifying "All children nodes from a given node", do you mean just one level or the whole tree? Please also see: http://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database and http://karwin.blogspot.co.uk/2010/03/rendering-trees-with-closure-tables.html . Also, you might want to consider something like http://neo4j.com/ (http://neo4j-rest-client.readthedocs.org/en/latest/) which will give you a query language to express exactly the type of queries you are talking about. – A_A Apr 10 '15 at 08:36
  • @A_A Those are some greats references and links, thank you very much! I do need the whole tree, all levels of recursion from the root to the final leaves. – Hooked Apr 10 '15 at 13:56
  • Glad to be of help, as an intermediate solution, you might also want to have a look at something like Networkx and its traversal algorithms (http://networkx.github.io/). You can export the graph as an adjacency list, read it from within Networkx and then work with your graph in any way you like. – A_A Apr 10 '15 at 14:05

0 Answers0