1

I have an android app using firebase Firestore, and I need to run a calculation on a large (>1000) number of documents. Normally in a query response a database would specify some metadata to indicate if the query had returned all the results, or just a portion with pagination token used to retrieve the result with through subsequent requests.

It appears from the Firestore documentation that there is none of that, only that that the query results are limited to 10mb for a single query. How is this enforced? Will it return a client error? Will it just send back a truncated result? Do I need to apply my own limit, and then how can I know what a reasonable limit is when I don't know the exact size of these documents?

My question is basically how to run a query for a large set of documents that is likely to exceed the single query limit?

rosghub
  • 8,924
  • 4
  • 24
  • 37
  • "the query results are limited to 10mb for a single query." Can you show precisely what limit you are referring to here? A link to the relevant document, and an exact quote from there are probably the best way to ensure we're all working from the same information. – Frank van Puffelen Jun 28 '21 at 23:45
  • Yes [this is from the usage and limits page](https://firebase.google.com/docs/firestore/quotas) "Maximum API request size". Although not sure if that means incoming or outgoing, I took it to mean both, or that there is no limit on the number of documents. – rosghub Jun 29 '21 at 00:02
  • @FrankvanPuffelen According to the page [Order and limit data](https://firebase.google.com/docs/firestore/query-data/order-limit-data) it says there is no limit to queries by default. All this leads me to believe there is no limit enforced. So 1) say I accidentally request too many documents in the android client. What happens? or 2) assuming we should be applying a limit manually and paginating each query, how do I figure out an efficient limit when I don't know the exact document size? – rosghub Jun 29 '21 at 01:12
  • If you execute a query, Firestore returns the documents that you query for. If you want to limit how many documents are returns, specify a `limit()` clause on the query. The `limit()` is in number of documents. – Frank van Puffelen Jun 29 '21 at 01:20
  • What is the exact calculation you need to perform? – Alex Mamo Jun 29 '21 at 08:41
  • @AlexMamo The collection contains documents where each document is an invoice with a date. The user selects a custom date range (start and end) and the app calculates revenue for that time period using a property on each invoice. – rosghub Jun 29 '21 at 16:09
  • To understand better, basically, you need to sum the value of a specific property that exists within each document, between some dates, right? – Alex Mamo Jun 29 '21 at 16:13
  • @AlexMamo Yes exactly. [The page on aggregate query's](https://firebase.google.com/docs/firestore/solutions/aggregation) doesn't apply to this instance, since range of dates is input by the user and can be changed – rosghub Jun 29 '21 at 16:15

2 Answers2

2

How to run a query for a large set of documents that is likely to exceed the single query limit?

The best solution I can think of is to create an alternative structure and duplicate some data. Since a query of 1000 results might exceed the limit, you should consider creating another collection that looks like this:

Firebase-root
  |
  --- revenues (collection)
       |
       --- May 2021 (document)
       |    |
       |    --- [{May 17, 2021 at 7:51:42 PM UTC+3: 123.33}, 
       |         {May 18, 2021 at 3:36:11 PM UTC+3: 444.74}]
       |
       --- June 2021 (document)
            |
            --- [{June 11, 2021 at 3:12:22 PM UTC+3: 523.18}, 
                 {June 23, 2021 at 2:39:54 PM UTC+3: 253.14}]

As you can see, I have created a new collection called "revenues" that contains a document for each month of the year. Each document contains an array, that contains in terms key values pairs, where the key is the date and the value is the total amount of the invoice. As you can see, in my example above I have added in each month only two invoices. However, if you'll only store the above data (date/amount), you'll be able to store even much more than 1000 invoices within a single array, meaning that you'll be able to stay below the 1MiB max document size limit.

Since the user selects the start and the end date, you'll always know which document to start with and which to document to end with. So in this solution, you'll calculate the revenue on the client. For example, if you want to calculate the entire revenue from May 17th, 2021 to June 23th 2021, you have to read both documents and sum only the elements that exist between those dates. Instead of reading 1000 documents to get the total revenue, which in my opinion is a little costly, you'll only need to read two documents, assuming that a document might hold 1000 invoices.

This practice is also called denormalization and is a common practice when it comes to Firebase. If you are new to NoSQL databases, I recommend you see this video, Denormalization is normal with the Firebase Database for a better understanding. It is for Firebase Realtime Database but the same rules apply to Cloud Firestore.

Also, when you are duplicating data, there is one thing that you need to keep in mind. In the same way, you are adding data, you need to maintain it. In other words, if you want to update/delete an invoice, you need to do it in every place that it exists.

For more information please also see my answer from the following post:

If you want to directly map an array of objects from Cloud to a List of custom objects, please check my article from the following URL:

If you also want to implement pagination, please check my answer from the following post:

In which you can paginate queries by combining query cursors with the limit() method. It's some kind of old but I also recommend you take a look at this video for a better understanding.

If you need pagination on button click, please my answer below:

Alex Mamo
  • 130,605
  • 17
  • 163
  • 193
  • That will work perfectly thank you for this! Is there a reason you used an array within the document opposed to a map? or opposed to just making the dates fields on the document itself? – rosghub Jun 30 '21 at 15:59
  • (wanted to add to above) it seems like if the dates were fields in the document itself, you could update an amount without having to read the document with a set & merge or update call, or am I missing something – rosghub Jun 30 '21 at 20:32
  • Not at all. You can use whether an array or a Map, it doesn't matter, as long as it is easier for you to read and maintain and the documents stay below the max limit. If you need to update the data, yes, it might be easier to have them as fields. But without knowing all info about your use case, it's hard to say if one is better than the other. However, this approach will much cheaper than reading all 1000 documents, for sure. – Alex Mamo Jul 01 '21 at 06:52
1

You'll typically want to retrieve at most the number of document that an average user is likely to see. If typically measure this in "screenfulls", so I determine how many documents fit on the screen, and then retrieve 2 or 3 times that number of documents initially.

Then you can implement either load-on-demand/endless-scroll where you automatically load more documents as the user approaches the end of the data you already have, you can add an explicit "load more" button for them to trigger the loading, or you can tell them there there may be more documents, and that they should specify more specific criteria for what they're looking for.

Frank van Puffelen
  • 565,676
  • 79
  • 828
  • 807
  • So for aggregate calculations [like in this page](https://firebase.google.com/docs/firestore/solutions/aggregation), the reason the strategy in that document wont work is because the range of documents is variable (consider a collection of financial reports, the user selects a specific range by date, to see the totals for the selected range). So it sounds like the best way is to fetch all the documents I need and limit queries by "screenfulls"? – rosghub Jun 29 '21 at 01:45