2

I have a huge graphdatabase with authors, which are connected to papers and papers a connected to nodes which contains meta information of the paper. I tried to select authors which match a specific pattern and therefore I executed the following cypher statement in java.

String query = "MATCH (n:AUTHOR) WHERE n.name =~ '(?i).*jim.*' RETURN n";
db.execute(query);

I get a resultSet with all "authors" back. But the execution is very slow. Is it, because Neo4j writes the result into the memory?

If I try to find nodes with the Java API, it is much faster. Of course, I am only able to search for the exact name like the following code example, but it is about 4 seconds faster as the query above. I tested it on a small database with about 50 nodes, whereby only 6 of the nodes are authors. The six author are also in the index.

db.findNodes(NodeLabel.AUTHOR, NodeProperties.NAME, "jim knopf" );

Is there a chance to speed up the cypher? Or a possiblity to get all nodes via Java API and the findNodes() method, which match a given pattern?

Just for information, I created the index for the name of the author in java with graph.schema().indexFor(NodeLabel.AUTHOR).on("name").create();

Perhaps somebody could help. Thanks in advance.

EDIT:

I run some tests today. If I execute the query PROFILE MATCH (n:AUTHOR) WHERE n.name = 'jim seroka' RETURN n; in the browser interface, I have only the operator NodeByLabelScan. It seems to me, that Neo4j does not automatic use the index (Index for name is online). If I use a the specific index, and execute the query PROFILE MATCH (n:AUTHOR) USING INDEX n:AUTHOR(name) WHERE n.name = 'jim seroka' RETURN n; the index will be used. Normally Neo4j should use automatically the correct index. Is there any configuration to set?

I also did some testing in the embedded mode again, to check the performance of the query in the embedded mode. I tried to select the author "jim seroka" with db.findNode(NodeLabel.AUTHOR, "name", "jim seroka");. It works, and it seems to me that the index is used, because of a execution time of ~0,05 seconds.

But if I run the same query, as I executed in the interface and mentioned before, using a specific index, it takes ~4,9 seconds. Why? I'm a little bit helpless. The database is local and there are only 6 authors. Is the connector slow or is the creation of connection wrong? OK, findNode() does return just a node and execute a whole Result, but four seconds difference?

The following source code should show how the database will be created and the query is executed.

public static GraphDatabaseService getNeo4jDB() {
    ....
    return new GraphDatabaseFactory().newEmbeddedDatabase(STORE_DIR);
}

private Result findAuthorNode(String searchValue) {
    db = getNeo4jDB();

    String query = "MATCH (n:AUTHOR) USING INDEX n:AUTHOR(name) WHERE n.name = 'jim seroka' RETURN n";

    return db.execute(query);
}
h.bisch
  • 51
  • 6

1 Answers1

2

Your query uses a regular expression and therefore is not able to use an index:

MATCH (n:AUTHOR) WHERE n.name =~ '(?i).*jim.*' RETURN n

Neo4j 2.3 introduced index supported STARTS WITH string operator so this query would be very performant:

MATCH (n:Author) WHERE n.name STARTS WITH 'jim' RETURN n

Not quite the same as the regular expression, but will have better performance.

William Lyon
  • 8,371
  • 1
  • 17
  • 22
  • So normally `CONTAINS` should also be able to use an index? But if I use `CONTAINS` as described in the manual, I get always the error: Invalid input 'O': expected 'r/R' (line 1, column 32 (offset: 31)) "MATCH (n:AUTHOR) WHERE n.name CONTAINS 'jim' RETURN n;" – h.bisch Dec 21 '15 at 19:29
  • Currently I believe only `STARTS WITH` uses an index (and not `CONTAINS` or `ENDS WITH`). You can verify this by prepending "PROFILE" or "EXPLAIN" to your query and running in the Neo4j Browser to see the execution plan. – William Lyon Dec 21 '15 at 19:32
  • What version of Neo4j are you using? `CONTAINS`, `STARTS WITH` and `ENDS WITH` were added in 2.3 – William Lyon Dec 21 '15 at 19:34
  • OK, it is working similair to SQL and OrientDB. Actually I'm working on 2.2.3. I thought I'm working on 2.3. I will update Neo4j and try it again. Thanks – h.bisch Dec 21 '15 at 19:57
  • Where is it exactly shown if the index is used or not? I execute the query with explain or profile. If I check the option _plan_ or _code_ in the browser, I'm not able to see if the index is used or not, neither with `STARTS WITH` nor with `CONTAINS`. – h.bisch Dec 21 '15 at 20:39
  • I tried the query `MATCH (n:AUTHOR) WHERE n.name STARTS WITH "jim" RETURN n;` in the browser and in my java code. But I have the same performance as with the regex. In the browser 457ms, in my Java Code 4,3s – h.bisch Dec 21 '15 at 20:50
  • Did you create an index: `CREATE INDEX ON :AUTHOR(name)` before? – Michael Hunger Dec 21 '15 at 21:11
  • 2
    Using `PROFILE`, you should see a "NodeIndexSeekByRange" (as opposed to NodeByLabelScan and a Filter operation) to confirm `STARTS WITH` is using the index in the query execution plan. Verify the index the index exists with `:schema` – William Lyon Dec 21 '15 at 21:11
  • Note that the labels and property-names are case-sensitive! – Michael Hunger Dec 21 '15 at 21:11
  • I created, as mentioned in my question, the index in java. I also checked if the index exists with `:schema`. The index on name exists and is online ` ON :AUTHOR(name) ONLINE `. But if I use profile, I can't see "NodeIndexSeekByRange". I have only "NodeyByLabelScan", "Filter", "ProduceResults" and "Result" on my execution plan. I know that label and names are case-sensitive. But they should match – h.bisch Dec 21 '15 at 22:02
  • @William Lyon, @Michael Hunger. I run some tests today and added the information to my initial question. It seems that I have to use `USING INDEX` in my query, that the index is used. But if I try to execute the query in Java it is also very, very slow in comparision to the `findNode()` method of the Java API – h.bisch Dec 22 '15 at 11:52