I have one last query remaining in my application to complete transferring it from Parse and it seems to be the one that's going to cause the most trouble.

Here's the Parse query:

let potential_query = UserAccounts.query()!
var excluded_objects = [String]()

for (...) excluded_objects.append(...)

potential_query.whereKey("objectId", notContainedIn: excluded_objects)
potential_query.whereKey("question_count", greaterThan: 2)
potential_query.whereKey("deactivated", equalTo: false)
potential_query.whereKey("discovery_enabled", equalTo: true)
potential_query.whereKey("gender", equalTo: "MALE")
potential_query.whereKey("age", greaterThan: 18)
potential_query.whereKey("age", lessThan: 23)
potential_query.whereKey("location", nearGeoPoint: ..., withinMiles: 15) // Users within 15 miles, based on location data.
potential_query.limit = 1

Please note that the values shown are not static; they change based on the authenticated user. Going through the Firebase documentation, it seems there aren't really any options for advanced querying. objectId would be the uid in Firebase.

Is this even possible?

Hobbyist
  • That is indeed going to be challenging. See http://stackoverflow.com/questions/26700924/query-based-on-multiple-where-clauses-in-firebase for some options. – Frank van Puffelen Jun 27 '16 at 14:38
  • @FrankvanPuffelen Thanks, I've already looked at that link and gone over the solutions. 1. isn't viable because I'd be downloading private user data to the client for every user registered to my application. 2. seems slightly more viable, but the issue is that location and age vary and cannot be stored in the database statically. QueryBase may have a way around this (distance checking); I'll check it out. 3. looks like it would be a mess. Do you have any good reads on indexing in Firebase? The docs aren't quite cutting it for me. – Hobbyist Jun 27 '16 at 17:43
  • Indexing is not Firebase-specific. I highly recommend this article on [NoSQL data modeling](https://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/). – Frank van Puffelen Jun 27 '16 at 17:45
  • Thank you, I'll read it over. – Hobbyist Jun 27 '16 at 17:47

1 Answer

[Background: I'm currently a backend engineer on the Firebase team. I used to be a Parse backend engineer and spent a lot of time on app performance]

The Parse API has a more expressive query language, which is a double-edged sword. You're blissfully free to write almost any query, and the Parse backend will try its best to guess which indexes will help the most. I co-presented a relevant talk at F8 last year, "Running at Scale on Parse"; the part on query performance starts at 16:30.

TL;DR: only a few of those query operators are going to help narrow down the database scan at all. The most valuable two in your case are probably the geo query and the age query. Standard database indexes simply cannot index more than one inequality (less than/greater than), or an inequality and a sort on different keys. We can talk about how to index on multiple equality fields + one inequality, but equality conditions work best when:

  • You're (almost) always going to be using these keys in your query. If not, you need separate indexes for when the key is part of your query and when it's not.
  • Your equality term greatly narrows down results. E.g., ignoring non-uniform demographics, age equality should cut your results down roughly 80x, whereas gender equality should only cut results down 2x.

For your particular case, I would use GeoFire to build a geographical index and use the standard range operators on the age query:

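myRef.orderByChild('age').startAt(18).endAt(23).limitToFirst(someGuess)

You need 'someGuess' because you're going to want to run the rest of the query operators client-side, so some of what you download will be false positives. You can (but don't necessarily have to) use a limit if there are going to be a lot of false positives with your query. Newer SDK versions use limitToFirst/limitToLast in place of the older limit. Also note that startAt/endAt are inclusive, so ages 18 and 23 come back too and need to be dropped client-side to match your greaterThan/lessThan.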
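As a rough illustration of the client-side step, here is a minimal sketch that applies the remaining predicates to whatever the age query returned. The helper name `filterCandidates`, the shape of `criteria`, and the assumption that candidates arrive as an object keyed by uid (as `snapshot.val()` would give you) are all hypothetical; the field names are taken from the question. The distance check is left out because it would come from the geographical index.

```javascript
// Hypothetical helper: applies the predicates the age index cannot cover.
// `candidates` is assumed to be an object keyed by uid (e.g. snapshot.val()).
function filterCandidates(candidates, criteria) {
  const excluded = new Set(criteria.excludedUids);
  return Object.keys(candidates || {})
    .filter((uid) => !excluded.has(uid))
    .map((uid) => Object.assign({ uid: uid }, candidates[uid]))
    .filter((user) =>
      user.question_count > criteria.minQuestionCount &&
      user.deactivated === false &&
      user.discovery_enabled === true &&
      user.gender === criteria.gender &&
      // Strict bounds, because startAt/endAt on the server side are inclusive.
      user.age > criteria.minAge &&
      user.age < criteria.maxAge
    );
}

// Made-up sample data standing in for the downloaded candidates.
const sampleCandidates = {
  u1: { age: 18, gender: 'MALE', question_count: 5, deactivated: false, discovery_enabled: true },
  u2: { age: 20, gender: 'MALE', question_count: 3, deactivated: false, discovery_enabled: true },
  u3: { age: 21, gender: 'MALE', question_count: 9, deactivated: true, discovery_enabled: true },
  u4: { age: 22, gender: 'MALE', question_count: 4, deactivated: false, discovery_enabled: true },
};

const matches = filterCandidates(sampleCandidates, {
  excludedUids: ['u4'],
  minQuestionCount: 2,
  gender: 'MALE',
  minAge: 18,
  maxAge: 23,
});

// Only u2 survives: u1 fails the strict age bound, u3 is deactivated, u4 is excluded.
console.log(matches.map((user) => user.uid));
```

If too few candidates survive the filter, the guess passed to the limit was too small and the next batch has to be fetched.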

If you don't know whether the age scan or the geolocation scan is quicker, you can start both and take whichever finishes first with Promise.race.
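A minimal sketch of that racing idea follows. The two scans are simulated with timers (`fakeScan` and `fastestScan` are hypothetical names, and the delays and results are made up); in a real app each promise would wrap the age query or the geographical query.

```javascript
// Stand-in for a real scan: resolves with its label and results after a delay.
function fakeScan(source, delayMs, results) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ source: source, results: results }), delayMs);
  });
}

// Settles with whichever scan settles first. Note that this includes a
// rejection: if the first scan to settle fails, the race fails as well.
function fastestScan(scans) {
  return Promise.race(scans);
}

fastestScan([
  fakeScan('age', 50, ['uid1', 'uid2']),
  fakeScan('geo', 10, ['uid2']),
]).then((winner) => {
  // The 'geo' stand-in has the shorter delay, so it wins here.
  console.log('fastest scan:', winner.source);
});
```

Promise.race does not cancel the slower scan; it simply ignores its result, so both queries still cost bandwidth.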

Thomas Bouldin
  • Thanks for the information, and `Promise.race` is a neat little tool I'd never heard about. However, with the massive amounts of query data our application has (mobile dating) and the fact that we have hundreds of thousands of accounts (nearing 300,000), this just seems fairly hard to decide on. Active users will already be matched with other nearby users, and while searching based on age is fine, we also want to search by distance at minimum. For example, if there are 200,000 users between 18 and 23 who aren't within 15 miles of you, what's the point in downloading them? – Hobbyist Jun 28 '16 at 05:55
  • The same goes for downloading all the users around you and then filtering on the client: if the application is active in your area, you would be downloading a massive amount of information. Is it possible to query by location and then say `Download 50, scan, if not exists, download next 50, etc`? Almost like pagination based on whether a matching value was found. Does this sound realistic? – Hobbyist Jun 28 '16 at 05:56
  • I've been playing with https://github.com/davideast/Querybase and its query functionality is really, really nice. But it seems like data overload. Also thanks for the `Promise.race`; haven't seen that before, nice tool. – Hobbyist Jun 28 '16 at 05:59
  • You're absolutely right. This is what databases actually do, but typically it's local to the machine. This is a really bad idea on a RESTful system, but isn't *always* bad on Firebase because it has delta sync. It's totally possible that your needs don't map well to the Firebase database. I find many things are possible but might need a deeper understanding of your data. Age is an interesting thing because it's whole numbers. You technically can build a compound index if you do equality matching and union the results of 18yr olds in range + 19yr olds in range +.... – Thomas Bouldin Jun 28 '16 at 06:03
  • FWIW, dating is a really hard space from the data perspective because the queries are so dynamic. I spent a month at Parse helping one dating app do query analysis and optimizations. I actually recommend the F8 talk to you even more; it was largely inspired by that dating app case study. E.g. I suddenly know what your "excludedItems" query is and know that the naive solution won't scale (my case study suggested their app was visited by bots with tens of thousands of excluded items. Those bots' performance tanked the whole app). – Thomas Bouldin Jun 28 '16 at 06:07
  • 300K users is no small feat; you're clearly serious about this. The Firebase DB offers sync, but complex queries can require data that would have been in an index in other DBs. I'd make sure you brush up on common indexing strategies for tree-based indexes and talk to a data analyst to understand the entropy in your dataset. If the Firebase DB just isn't the right thing for you, check out Google Cloud Datastore. Datastore is a document database like Parse and is massively scalable. Like Parse Server and Firebase, you'll need to understand your indexes, so talk to that data analyst. – Thomas Bouldin Jun 28 '16 at 06:13
  • Thanks for all the information, I'll watch the F8 talk tonight. – Hobbyist Jun 28 '16 at 19:14
  • What I've decided to do is run Heroku beside Firebase. Heroku will listen to Firebase requests and keep an updated Mongo database reference with all of the query data required for our advanced queries. Nice and simple, with a little bit of data duplication, but nonetheless required. Thanks for all of your comments. – Hobbyist Jun 29 '16 at 02:46