
Is it correct to say that a com.google.appengine.api.datastore.Cursor simply stores an index position into a GAE Datastore index?

Are cursors durable? That is, can I store a cursor permanently and reuse it again and again knowing for sure that if it was pointing to 5000th position in the index, that's where it'll point forever?

What if the index shrinks to less than 5000 entries? Will using this cursor cause an error or simply return nothing?

For larger indexes (say 100,000 or more entries), can I first acquire cursors for every multiple-of-5000th position (say), store them and then use this set of cursors to do some work in a Map/Reduce manner?

I am actually using Objectify and not the DS directly, but AFAIK this will not affect the properties of Cursors vis-a-vis Indexes.

Dan McGrath
markvgti

1 Answer


Cursors only make sense in the context of the original query that was made. They are not exactly index positions/offsets. From Cursors and data updates:

The cursor's position is defined as the location in the result list after the last result returned. A cursor is not a relative position in the list (it's not an offset); it's a marker to which Cloud Datastore can jump when starting an index scan for results. If the results for a query change between uses of a cursor, the query notices only changes that occur in results after the cursor. If a new result appears before the cursor's position for the query, it will not be returned when the results after the cursor are fetched. Similarly, if an entity is no longer a result for a query but had appeared before the cursor, the results that appear after the cursor do not change. If the last result returned is removed from the result set, the cursor still knows how to locate the next result.
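To make the "marker, not offset" distinction concrete, here is a minimal sketch that does not touch the Datastore API at all. It assumes an index can be modelled as a sorted `TreeSet<Integer>` and a cursor as the last key returned; the class and method names (`CursorMarkerSketch`, `pageAfterMarker`, `pageAtOffset`) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model only: a sorted set stands in for a Datastore index and the
// "cursor" is simply the last key that was returned. Not the real API.
class CursorMarkerSketch {

    // Marker-style paging: resume strictly after the marker key.
    // A null marker means "start from the beginning".
    static List<Integer> pageAfterMarker(TreeSet<Integer> index, Integer marker, int limit) {
        NavigableSet<Integer> tail = (marker == null) ? index : index.tailSet(marker, false);
        List<Integer> page = new ArrayList<>();
        for (Integer key : tail) {
            if (page.size() == limit) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Offset-style paging: skip a fixed number of entries, then read.
    static List<Integer> pageAtOffset(TreeSet<Integer> index, int offset, int limit) {
        List<Integer> all = new ArrayList<>(index);
        int from = Math.min(offset, all.size());
        int to = Math.min(from + limit, all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    public static void main(String[] args) {
        TreeSet<Integer> index = new TreeSet<>(Arrays.asList(10, 20, 30, 40, 50, 60));
        List<Integer> page1 = pageAfterMarker(index, null, 2);
        Integer marker = page1.get(page1.size() - 1); // "cursor" saved after page 1

        index.add(5);    // a new result appears before the marker
        System.out.println("marker: " + pageAfterMarker(index, marker, 2));
        System.out.println("offset: " + pageAtOffset(index, 2, 2));

        index.remove(marker); // the last returned result disappears
        System.out.println("marker after removal: " + pageAfterMarker(index, marker, 2));
    }
}
```

In this model the marker-based page keeps starting at the same entry after the insert and after the removal, while the offset-based page shifts and repeats an entry from page 1. That mirrors the behaviour the quoted documentation describes, under the stated simplification.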

Also from Limitations of cursors:

Cursors are subject to the following limitations:

  • A cursor can be used only by the same application that performed the original query, and only to continue the same query. To use the cursor in a subsequent retrieval operation, you must reconstitute the original query exactly, including the same entity kind, ancestor filter, property filters, and sort orders. It is not possible to retrieve results using a cursor without setting up the same query from which it was originally generated.
  • Because the NOT_EQUAL and IN operators are implemented with multiple queries, queries that use them do not support cursors, nor do composite queries constructed with the CompositeFilterOperator.or method.
  • Cursors don't always work as expected with a query that uses an inequality filter or a sort order on a property with multiple values. The de-duplication logic for such multiple-valued properties does not persist between retrievals, possibly causing the same result to be returned more than once.
  • New App Engine releases might change internal implementation details, invalidating cursors that depend on them. If an application attempts to use a cursor that is no longer valid, Cloud Datastore raises an IllegalArgumentException (low-level API), JDOFatalUserException (JDO), or PersistenceException (JPA).

If your data doesn't change you're probably OK using cursors in a map/reduce manner (by restoring the original query), including pre-acquiring them.
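As a rough illustration of why the "data doesn't change" caveat matters for pre-acquired cursors, the sketch below again models the index as a sorted `TreeSet<Integer>` and each stored cursor as a marker key. It assumes every shard is read as "start after the marker, fetch at most `shardSize` results"; `CursorShardSketch`, `acquireMarkers`, `readShard` and `readAllShards` are hypothetical names, not Datastore or Objectify calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model only: pre-acquired "cursors" are marker keys taken at every
// shardSize-th entry of a sorted set. Not the real Datastore API.
class CursorShardSketch {

    // Walk the index once and remember a marker after every shardSize-th
    // entry. The first marker is null, meaning "start of the index".
    static List<Integer> acquireMarkers(TreeSet<Integer> index, int shardSize) {
        List<Integer> markers = new ArrayList<>();
        markers.add(null);
        int seen = 0;
        for (Integer key : index) {
            seen++;
            if (seen % shardSize == 0 && seen < index.size()) {
                markers.add(key);
            }
        }
        return markers;
    }

    // Read one shard: resume strictly after the marker, at most limit entries.
    static List<Integer> readShard(TreeSet<Integer> index, Integer marker, int limit) {
        NavigableSet<Integer> tail = (marker == null) ? index : index.tailSet(marker, false);
        List<Integer> shard = new ArrayList<>();
        for (Integer key : tail) {
            if (shard.size() == limit) {
                break;
            }
            shard.add(key);
        }
        return shard;
    }

    // Read every shard from the stored markers and concatenate the results.
    static List<Integer> readAllShards(TreeSet<Integer> index, List<Integer> markers, int shardSize) {
        List<Integer> all = new ArrayList<>();
        for (Integer marker : markers) {
            all.addAll(readShard(index, marker, shardSize));
        }
        return all;
    }

    public static void main(String[] args) {
        TreeSet<Integer> index = new TreeSet<>();
        for (int key = 10; key <= 90; key += 10) {
            index.add(key);
        }
        List<Integer> markers = acquireMarkers(index, 3);
        System.out.println("unchanged data: " + readAllShards(index, markers, 3));

        index.add(25); // an insert lands inside the first shard
        System.out.println("after insert:   " + readAllShards(index, markers, 3));
    }
}
```

With unchanged data the shards cover every entry exactly once. After the insert, the first shard fills up before it reaches its old last entry, so in this model that entry is never processed. Whether a real job hits the same gap depends on how each shard is bounded, which this sketch does not try to settle.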

Dan Cornilescu
  • How about storing Cursors to page through a result set? E.g., I want to show all the latest uploads (reverse chronological order) in paged manner. Using Cursors simplistically, I can always only load Page 1, then go Next, Next, Next etc. I can't jump from P1 to P10 w/o using offsets. If I could store cursors (let's say page size is fixed at 20 results/page), then I could retrieve the appropriate cursor for that page, then get the results from there... would this work in an expected manner? – markvgti Feb 12 '17 at 06:08
  • If there are no new uploads (i.e. data updates) it should. Example here: https://cloud.google.com/appengine/docs/python/datastore/query-cursors#cursors_and_data_updates – Dan Cornilescu Feb 12 '17 at 06:19
  • Huh? Seems a little counter-intuitive: if the Cursor points to the 100th entry in the index, shouldn't the addition of a new upload simply mean that the previously-99th result would be returned as the first result? So would it be correct to say a Cursor isn't an offset into an index, but a pointer/reference to a specific entry in the index? – markvgti Feb 12 '17 at 06:24
  • If there are data updates, things may or may not work as expected; see the 1st quote. You'd see the latest upload if the cursor is at the 1st page, but not if it is at some subsequent page. – Dan Cornilescu Feb 12 '17 at 06:31
  • "You'd see the latest upload if the cursor is at the 1st page, but not if it is at some subsequent page." Understood. But what if I am on the 6th page, 20 results/page, a new upload happens (counting from the start, the 100th result becomes the 101st result), and the 6th page is refreshed: will I see the 100th upload as the first result on this page or the 101st upload? Put another way: does the last result of page 5 (pre new upload) become the first result of page 6 (post new upload) upon refresh (using a saved Cursor)? – markvgti Feb 12 '17 at 06:35
  • No, the last result of page 5 doesn't become the 1st on page 6. Upon refresh, page 6 will keep starting with the *same* item (the one that was `next` at the time that cursor was saved). This is why new elements preceding it in the updated list won't be seen. – Dan Cornilescu Feb 12 '17 at 15:00
  • Thanks for the explanations and your patience. So this means cursors are neither very suitable for paging (unless one only offers Previous & Next controls) nor for Map/Reduce. Great! – markvgti Feb 12 '17 at 15:31