1

I've spent the past couple of days spending my time trying to figure out how to do proper paging in Google App Engine. Once I have a workable solution, I'll open the stuff out on google code as I see lots of people struggling with this.

The goal: create a simple but powerful paging solution for GAE, with the following features:

  • forward and backward paging, support for first page and last page
  • sorting by (a subset) of fields (one order only at a time)
  • having a filtering criteria in a like fashion (e.g. typing 'Tom' will filter and only display persons with name starting 'Tom')
  • keep track of total number of pages

Design decissions:

  • do NOT use limit/offset as it doesn't scale. Instead rely on cursors
  • use sharding to keep the total number of pages updated
  • accept compromises, but aim for the best performance/scalability

The obstacle (i.e. last bastion): I feel the only problem left is when I have a filtering criteria (e.g. name begins with 'Tom') and a sorting criteria on a different property.

e.g. Person [name, age]

  • Filter by name 'Tom*'
  • Sort by age

Reading through the documentation, I thought I've found the solution:

Query q = new Query("Person");
q.addFilter("name", FilterOperator.GREATER_THAN_OR_EQUAL, nameFilter);
q.addFilter("name", FilterOperator.LESS_THEN, nameFilter + "\uFFFD");
q.addSort("name", SortDirection.ASCENDING);
q.addSort("age", SortDirection.ASCENDING);

I thought this would return:

  • Tom2 18
  • Tom1 20

Unfortunately, this returns

  • Tom1 20
  • Tom2 18

as the query is first filtered by name, then by age as a secondary key.

The only solution I can think of is to put the whole filter result into a Java structure, sort by using a comparator and then pick the records I want to display. But this has an additional problem that my cursor logic disappears. Which then kinda means I have 2 logical paths for solving paging. Which might be the ultimate solution, but I wonder if anyone smarter has a better idea.

Any ideas welcome.

Thanks, Matyas

Matyas
  • 311
  • 1
  • 4
  • 14
  • You are describing two orthogonal issues: efficient paging and efficient querying. You should solve them separately. – Adam Crossland Jun 20 '12 at 20:48
  • I'm trying to find a solution to paging with a certain functionality on Google App Engine. I can see where you're coming from, however efficient paging relies on efficient querying withing the bounds on GAE. I can think of a very efficient way to page... it's just that I want to page using GAE. – Matyas Jun 20 '12 at 20:56

2 Answers2

2

I suspect you may want to look at full text search. http://googleappengine.blogspot.jp/2012/05/looking-for-search-find-it-on-google.html

If you create an index for name, you should be able to use the (undocumented) '~' operator to do stem matching. It was mentioned it'll be documented in the next release by someone from Google (I believe) here GAE Full Text Search API phrase matching

If you put the entity id into the text index, you'd have a list of entities containing that name stem and then couldn't you filter by a list of keys and sort by age.

Shawn

UPDATE: Now that the docs are released I see the "~" operator is NOT stem matching but rather merely plurals

To search for plural variants of an exact query, use the ~ operator: ~"car" # searches for "car" and "cars"

Community
  • 1
  • 1
user1258245
  • 3,639
  • 2
  • 18
  • 23
1

First recommendation would be to not use inequality filters on the name. If you must use inequality filters, then you're right, you must sort by name before sorting by age. Which, yes, really sucks, because if you want to page, you need the entire result to then sort and figure out how to page.

I'd approach this in trying to figure out how to avoid the inequality filter, either through your UI or by denormalizing your name field in the datastore.

I don't really know why you'll have Tom1 and Tom2, but if you know you'll have a bunch of Toms, you might have a name_base field that just stores "Tom", then you can fetch all the toms without an inequality.

dragonx
  • 14,963
  • 27
  • 44
  • You can't. When using the inequality filters, you have to sort on the property. See https://developers.google.com/appengine/docs/java/datastore/queries#Restrictions_on_Queries – Matyas Jun 20 '12 at 20:01
  • I edited my comment immediately after I posted it, I didn't read your query carefully enough. – dragonx Jun 20 '12 at 20:06
  • Ok, Tom might have been a stupid example. Consider this instead. Filtering on just 'T', you'd get: Theodor, Thomas, Tom. But you could filter on 'Matyas' which would give you: Matyas, Matyasello, etc – Matyas Jun 20 '12 at 20:11
  • I got a funny idea. Limit names to say 15 characters and store 15 fields for name, each field having one additional character in length. Can't imagine the index penalty though. What really makes this hard is that the filtering can be anything really. Fortunatelly, I've at least only have a 'name*' inequality, not a '*name*' one. – Matyas Jun 20 '12 at 20:14
  • What's the scenario from the end user's perspective where he'll need a name* filter? You might want to rethink that into a design that works well with GAE. For example if you have a page per letter, you can search by only the first letter (have a denormalized field with the first letter). Or if you want to give them some flexibility of searching by name, maybe have an autocomplete UI that suggests various names, but then do the actually sorted search only on a unique name, etc. – dragonx Jun 20 '12 at 20:44
  • The application enables to filter on the name of persons in a free text field. It is not used to have page by letter. Actually the concept is really simple. The user may want to see every person whose name begins with 'T', or 'To' or 'Tom' or 'Toma' or 'Tomas'. The auto-suggest option would certainly work, and it's a nice suggestion. I am at a point where I think that this design/requirement does not fit well with GAE. Still it would be interesting to see if there's a better solution then what I outlined in my question. – Matyas Jun 20 '12 at 21:00