
I have a small Android app which uses Firestore. I have a collection of documents (~5,000) where each document has a "client name". One of the requirements of the application is to search documents by client name with a "contains" query (i.e. the search term may appear anywhere in the name).

I opted to ignore the advice to use a full-text search solution because of cost, and decided to roll my own by storing prefixes of the name in an array field on each document and running an array-contains query on that field.

For example, for the name John Smith, the following array is generated: [J, JO, JOH, JOHN, JOHN S, JOHN SM, JOHN SMI, JOHN SMIT, JOHN SMITH, S, SM, SMI, SMIT, SMITH]. These prefixes are enough to emulate a "starts with" query on any word in the name, which is good enough for my use case.
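A minimal sketch of how such an array could be generated, in plain Kotlin. The function name `buildSearchTerms` is hypothetical, and the rule used here (every prefix of the name starting from each word, skipping prefixes that end in a space) is an assumption inferred from the John Smith example rather than the exact code used in the app:

```kotlin
// Hypothetical helper: builds the prefix list described above.
// Assumption: terms are upper-cased and start at each word boundary.
fun buildSearchTerms(name: String): List<String> {
    // Normalise case and split on runs of whitespace.
    val words = name.trim().uppercase()
        .split(Regex("\\s+"))
        .filter { it.isNotEmpty() }

    // LinkedHashSet keeps insertion order and drops duplicate prefixes.
    val terms = LinkedHashSet<String>()
    for (start in words.indices) {
        // The rest of the name, beginning at the current word.
        val tail = words.subList(start, words.size).joinToString(" ")
        for (end in 1..tail.length) {
            val prefix = tail.substring(0, end)
            // Skip prefixes ending in a space, e.g. "JOHN ".
            if (!prefix.endsWith(" ")) terms.add(prefix)
        }
    }
    return terms.toList()
}
```

With this sketch, `buildSearchTerms("John Smith")` yields the 14 entries listed above. The search term would need the same normalisation (trimmed and upper-cased) before being passed to the query.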

I created an index on this array field, called searchTerms, and run queries like this:

query.whereArrayContains("searchTerms", term)
    .orderBy("date", Query.Direction.DESCENDING)
    .limit(20)

This ran fine initially, but my client has started to complain about searches being slow. I ran some tests on Android with query.get(Source.SERVER) and regularly get query times of up to 2 seconds. Searching by short terms (e.g. "on" or "al") produced the slowest results.

My question is: how inefficient is this exactly? Some research led me to believe it's not as inefficient as it looks on the surface. It sounds like each element in the array is added to the index, so when I run an array-contains query, am I just querying a huge index whose size is my collection size * the average array size? Is it the index size that is slowing down the query?

And finally, is there any native alternative to this approach?

rosghub
  • Firestore queries on the server scale by the size of the result set (so the number of documents you retrieve), and do not depend on the number of documents that exist in the collection. The only exception to this is when the data exists in the client-side cache, as that is not indexed in the same way. It's hard to be certain whether you are retrieving too much data, or the query is being run locally - although your use of `query.get(Source.SERVER)` seems to suggest it may be the former. – Frank van Puffelen Jul 04 '21 at 00:23
  • @FrankvanPuffelen I forgot to add these queries are limited to 20. Interestingly I did get better performance when I specified `Source.SERVER` so that makes sense. – rosghub Jul 04 '21 at 00:33
  • Yeah, if you get better performance that way it sounds like the client cache may have a large number of documents in it for this collection. The client isn't indexed in the same way, so the performance guarantees don't apply there. I'd recommend always using an `onSnapshot` listener, so that you get both the local data (if that is faster) and the data from the server. – Frank van Puffelen Jul 04 '21 at 00:58
  • @FrankvanPuffelen so does 1-2 seconds sound like a normal time for these queries given the collection size? I also ran some tests in a node script with the admin sdk and was getting around 800ms-1000ms which still sounds kind of slow – rosghub Jul 04 '21 at 01:06
  • There are too many unknown variables to say anything meaningful there. Things like document size, network latency, bandwidth, device speed, etc, all have an impact. Using `onSnapshot` listeners instead of `get` can help to hide such latency. – Frank van Puffelen Jul 04 '21 at 01:33

1 Answer

Firestore queries on the server scale by the size of the result set (so the number of documents you retrieve), and do not depend on the number of documents that exist in the collection. Queries against the client-side cache don't have this performance guarantee, as that is not indexed in the same way.

From our discussion in the comments, it sounds like the client cache may have a large number of documents in it for this collection, which might explain the slow queries. When caching is enabled, I'd recommend always using an onSnapshot listener, so that you get both the local data (if that is faster) and the data from the server.

I recently described how Firestore logically uses its indexes on the server, which may also provide some insight: "How is the order of the fields in firestore composite indexes decided?"

Frank van Puffelen