
Just curious what the recommended best practice is for the 'million follower' Twitter-style fanout problem on App Engine.

Is it still Brett Slatkin's proposed solution (see http://www.scribd.com/doc/16952419/Building-scalable-complex-apps-on-App-Engine)?

Or could the new Search API be used here? Could you do a search with a large number of author filters, e.g. 'author:bob OR author:alice OR author:mike ...' ad nauseam? Or are there limits on Search API query complexity? Or would it be horrible performance-wise? Might it be a reasonable solution if there is a limit on the number of people one can follow?

Thanks for any feedback!

RLH
peterk

1 Answer


In general, OR queries aren't efficient in any database, and that includes the Search API: they all have to run multiple independent queries and glue the results together.
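
To make the cost concrete, here is a minimal sketch of what "multiple independent queries, glued together" looks like. It is plain Python over an in-memory dictionary, not the Search API or the datastore; `POSTS_BY_AUTHOR`, `query_author`, and `or_query` are hypothetical names used only for illustration.

```python
import heapq
from itertools import islice

# Hypothetical in-memory stand-in for a per-author index.
# Each list holds (timestamp, text) tuples, sorted newest first.
POSTS_BY_AUTHOR = {
    "bob": [(105, "bob: deploy done"), (101, "bob: hello")],
    "alice": [(104, "alice: lunch?"), (102, "alice: hi")],
    "mike": [(103, "mike: new blog post")],
}


def query_author(author):
    """One independent single-author query, newest first."""
    return POSTS_BY_AUTHOR.get(author, [])


def or_query(authors, limit):
    """Emulate 'author:a OR author:b ...' as N queries plus a merge."""
    per_author = [query_author(a) for a in authors]  # one query per author
    merged = heapq.merge(*per_author, key=lambda post: post[0], reverse=True)
    return list(islice(merged, limit))


feed = or_query(["bob", "alice", "mike"], limit=3)
for timestamp, text in feed:
    print(timestamp, text)
```

The work grows with the number of authors in the filter, which is why a following list of hundreds of people makes this approach unattractive at read time.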

The fanout problem can be handled much better by the prospective search API.
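
As a rough illustration of the inversion involved (subscriptions are stored up front, and each new post is matched against them), here is a toy in-memory matcher. It does not use the real prospective search API; `SubscriptionMatcher` and its methods are hypothetical, and registering one small subscription per (follower, author) pair is an assumption of this sketch rather than something the API requires.

```python
from collections import defaultdict


class SubscriptionMatcher:
    """Toy stand-in for a prospective-search style matcher (not the GAE API)."""

    def __init__(self):
        # author -> set of followers who subscribed to that author
        self._followers_by_author = defaultdict(set)

    def subscribe(self, follower, author):
        """Register a stored query: 'notify follower when author posts'."""
        self._followers_by_author[author].add(follower)

    def match(self, post):
        """Return every follower whose subscription matches this new post."""
        return sorted(self._followers_by_author.get(post["author"], ()))


matcher = SubscriptionMatcher()
matcher.subscribe("peterk", "bob")
matcher.subscribe("jane", "bob")
matcher.subscribe("peterk", "alice")

notified = matcher.match({"author": "bob", "text": "hello"})
print(notified)
```

The matching cost is paid once per post at write time, instead of once per follower at read time.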

Nick Johnson
  • Hi Nick, thanks for the reply. Would the query in prospective search not also contain lots of ORs, e.g. 'author:mike OR author:jane'? Let's say I needed to check against a couple of hundred authors a user is following. Is it going to be efficient to match each doc against many different queries, one for each user's following list? There could be thousands of users, each with potentially hundreds of users in their following list. Or am I thinking about it the wrong way? Thanks again for any help... – peterk May 17 '12 at 10:01
  • @peterk Prospective search inverts the operation, and sends notifications to every account that's following someone when they post a new update. – Nick Johnson May 17 '12 at 12:04
  • Reading up, I think prospective search can solve the 'who's subscribed to this comment' problem, but efficiently retrieving a feed thereafter is still an open issue (vs. firing matches off on a channel and forgetting them, for example). Maybe just store batches of matches in datastore entities and query for entities where match_list = user? Is there a limit on how long subscribed queries can be? If one user's subscription follows hundreds or thousands of people, it could be a long list of ORs... Thank you again :) – peterk May 22 '12 at 11:04
  • The idea of using prospective search is good. However, at the time of writing you can only register a maximum of 10k subscriptions with the GAE prospective search API, so I am still following the technique Brett mentions in his talk. – dynamokaj Oct 18 '14 at 19:15
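
The "store batches of matches and query where match_list = user" idea from the comments can be sketched as follows. This is plain Python over in-memory lists, not datastore code; `BATCH_SIZE`, `post_message`, and `feed_for` are hypothetical, and the batch size is kept tiny only so the batching is visible.

```python
BATCH_SIZE = 2  # hypothetical; a real list property could hold far more receivers

messages = {}        # message_id -> text
index_entities = []  # each entry: {"message_id": ..., "receivers": set of users}


def post_message(message_id, text, followers):
    """Fan out on write: store the message once, plus batched receiver lists."""
    messages[message_id] = text
    followers = list(followers)
    for start in range(0, len(followers), BATCH_SIZE):
        batch = followers[start:start + BATCH_SIZE]
        index_entities.append({"message_id": message_id, "receivers": set(batch)})


def feed_for(user):
    """Equivalent of 'query index entities where receivers = user'."""
    ids = {e["message_id"] for e in index_entities if user in e["receivers"]}
    return [messages[i] for i in sorted(ids, reverse=True)]  # newest id first


post_message(1, "hello", ["a", "b", "c"])
post_message(2, "world", ["b"])
print(feed_for("b"))
```

In this sketch the linear scan in `feed_for` stands in for an indexed equality filter on the receiver list; a post with a million followers becomes many index entities written at post time, while each reader's feed is a single membership query.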