I am working with a Neo4j database, version 3.0.3, inside a Java application, using Neo4j JDBC driver version 3.0.1 (yes, I know the versions don't match, but I figured that is OK for now). The queries I run are fairly specific: I make sure to use labels on my nodes and types on my relationships when writing Cypher queries with the JDBC library.
My data set is a network of Person nodes connected to other Person nodes by KNOWS relationships. Each KNOWS relationship carries a date that records when the connection was made. I want to do some data mining on the pathways between two distinct Person nodes, as illustrated below. As people get to know more and more people, I would like to see whether there are unknown relationships to my end nodes. This requires examining the Person nodes on the pathways between the start and end Person, and potentially the dates on which those relationships were created.
Today I ran what I thought was a fairly specific query, both in the Neo4j browser and in my Java code:
`MATCH path = (p:Person {name: "garret"})-[:KNOWS*1..6]->(p1:Person {name: "adam"}) return path`
The above query returned a total of 30 paths between (garret) and (adam). The PROFILE of that query in the Neo4j browser shows that it returns in 38 ms, so it certainly looks lightning fast.
After wiring that query into my Java code and executing it through a StatementResult object, I found that the call to the list() method, shown below, takes 42.7 seconds!

    List<Record> records;
    StatementResult r = session.run("MATCH path = (p:Person {name: 'garret'})-[:KNOWS*1..6]->(p1:Person {name: 'adam'}) return path");
    records = r.list();

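To narrow down where the 42.7 seconds go, one thing I could try is timing the first record separately from draining the rest. The sketch below is only a rough idea, not something from the driver documentation: `DrainTimer`, `Timing`, and `drain` are hypothetical names I made up, and it assumes the StatementResult can be passed in as an `Iterator<Record>` (it exposes `hasNext()` and `next()`). It is written against plain `java.util.Iterator` so that it runs without a database.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical helper (not part of any Neo4j library): walks an iterator
// one element at a time and records how long the first element took to
// arrive versus how long the whole result took to drain.
final class DrainTimer {

    static final class Timing {
        final long firstRecordNanos; // -1 if the iterator was empty
        final long totalNanos;
        final int count;

        Timing(long firstRecordNanos, long totalNanos, int count) {
            this.firstRecordNanos = firstRecordNanos;
            this.totalNanos = totalNanos;
            this.count = count;
        }
    }

    static <T> Timing drain(Iterator<T> results, Consumer<? super T> handler) {
        long start = System.nanoTime();
        long first = -1;
        int count = 0;
        while (results.hasNext()) {
            T record = results.next();
            if (count == 0) {
                // Elapsed time until the first element became available.
                first = System.nanoTime() - start;
            }
            handler.accept(record); // process each record as it arrives
            count++;
        }
        long total = System.nanoTime() - start;
        return new Timing(first, total, count);
    }

    public static void main(String[] args) {
        // Stand-in data; in my application this would be the query result.
        Iterator<String> fake = Arrays.asList("path1", "path2", "path3").iterator();
        Timing t = drain(fake, r -> { /* inspect the record here */ });
        System.out.println("records=" + t.count
                + " firstMs=" + t.firstRecordNanos / 1_000_000
                + " totalMs=" + t.totalNanos / 1_000_000);
    }
}
```

If the first record already takes most of the time, the cost is probably in executing the query rather than in collecting the records into a list; if the first record is fast and the total is slow, the cost is more likely in streaming the paths back. I have not verified either case yet.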
My questions are:
- Why does the list() call take that long?
- What is the best way to deal with Neo4j result sets?
- Should I be looking at other things from the PROFILE run that would aid me in determining whether that .list() call will end up taking a long time?
I am in the early stages of this project, but as my data set grows, the 42.7 seconds it currently takes to fetch the results will certainly grow dramatically. I would like some advice from the community on the best way to minimize this delay when retrieving data from the StatementResult.
I appreciate all the advice you folks can provide.