0

According to streaming example at http://orientdb.com/docs/3.0.x/java/Java-Query-API.html, we can use the Orient result set streaming API as follows

ODatabaseDocument db;
...
String statement = "SELECT FROM V WHERE name = ? and surnanme = ?";
OResultSet rs = db.query(statement, "John", "Smith");
rs.stream().forEach(x -> System.out.println(x.getProperty("age")));
rs.close();

This is fine but too trivial - what if we need to keep the rs/stream around? We can't very well close the resultset because we want to reuse the stream on a subsequent user request in a web application, say (in scenarios such as paging).

But to keep the streams "alive" the Orient user guide says that:

OResultSet is implemented as a paginated structure, that holds some iterators open during the iteration. This is true both in remote and in embedded usage.

You should always invoke OResultSet.close() at the end of the execution, to free resources.

OResultSet instances are automatically closed when you close the ODatabase that returned them.

It is important to always close result sets, even when they are converted to streams (after the stream is consumed).

Are there any best practices around this. As far as I can tell, we would need to:

1) Keep the Orient database connection open until the user "paging" session is done (which could be say 5-10 minutes). Only when the user says "done" can we close the result set & close the database connection. The Orient database connection (and whatever stream it generated) thus becomes "private" to a single application user. Moreover, since every user request can be activated on a different thread, the said database connection would need to be made active on the current thread before using it.

2) Use the Java Stream API to navigate through arbitrary subsets of the "arbitrarily" large resultset. How would memory usage be handled by the underlying Orient db stream implementation? What determines the memory usage for using a "single rs/stream" and keeping it around for a while? What happens when we have thousands of open rs/streams especially if each user has their own "private" rs/stream they're looking at?

3) If a given Orient database connection can only be used on a single thread at a time (an Orient requirement), how do we handle multiple users with their own custom long-lived rs/streams/connections? Does this mean that if we have a 1000 clients using their own private rs/stream (that they hang on to for say 5 minutes), then we have to keep 1000 database connections open (i.e. one for each user/rs?) What are the limits around this? This style is obviously quite different from the more typical execute query/close rs pattern for quick user interaction that is stateless from one request to the next (naive paging that re-executes queries every time for a given range and this can get expensive)

P.S. I realize that once we get a Java stream, then we pretty much start just using the Java API itself - so I suppose that JOOQ streaming usage (for example) would be pretty similar to Orient streaming usage once you start getting into the Stream interfaces - I'm not familiar with the Java Streams API, but I suppose How to paginate a list of objects in Java 8? is a good place to start?

1 Answers1

0

My conclusion is that streaming works well when scrolling through a large result set without consuming a large amount of memory or having to keep re-executing offset/limit queries (similar to forward only scrolling over JDBC resultsets). A typical use case is an export scenario.

For forward and backward paging, in Orient at least, you likely need an indexed property/properties and perform range queries - you'll need to make sure the index is SB-tree so that it supports range queries.

FYI, Solr has a cursor mechanism which works pretty well for forward pagination on sorted results - but if you keep some simple state markers on the client you can also go back to results already encountered. "go to" random pages is not supported in Solr cursors but you can always re-sort/filter on some other criteria in order to move "useful" results to the top of the resultset instead of deep paging (https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html)