0

In firebase database I created a database call "orders". The data to be stored has following format:

{
    cost: orderCost,
    createdAt: createdAt,
}

In firebase, the tree is like below:

enter image description here

What I want to achieve is to quickly get the sum of cost in recent 7 days, or 30 days based on createAt timestamp.

What I think of is to create the index for createdAt, and run query based upon it.

But what if the dataset is huge, say several million entries, is it still efficient to run the above query?

Please kindly give your advise. Thanks!

damingzi
  • 676
  • 10
  • 28

2 Answers2

2

Essentially what you want is a counter/aggregation. Depending on how granular you need your summary to be and the rate of data ingestion/document writes, you would:

  1. Use a daily aggregation for the sum of cost using Cloud Function triggers
  2. If data volume is high, use a distributed counter to update the sum.

With the daily aggregated stats available, you can do any ad-hoc N days summary by querying only N documents.

If you are keen to explore other options, you can also push the data in Firebase to analytic tools such as BigQuery.

Joshua Chan
  • 1,797
  • 8
  • 16
0

Firebase Realtime Database has no built in aggregation functions.

You have a few options:

  • Load all data, and calculate the aggregates on demand.
  • Keep running aggregates in the database, and update the on every write operation.
  • Export the data from Firebase to a system that has better querying capabilities (like BigQuery or Data Flow), and then write the results back to Firebase.

This topic has been covered before, so I recommend checking out:

Frank van Puffelen
  • 565,676
  • 79
  • 828
  • 807
  • Thanks Frank. I also thought of 2nd approach. Let's say for 7 days aggregation, I think I can create a separate ref("7days") where it stores most recent 7 days data. If I want the sum of 7 days, I just need to go to ref("7days") to get sum them up. I heard we can do schedule functions in cloud functions. In this way, the cloud function can behave like a cron job which can push new data and pop those older than 7 days. Does it sound doable for you? Thanks! – damingzi Dec 06 '19 at 21:43
  • 1
    1) I would more likely aggregate by day and week, and then read the number of days/weeks that you need to report. That requires a fixed number of reads, while allowing a more flexible reporting model. 2) Schedulign Cloud Functions, see https://firebase.google.com/docs/functions/schedule-functions – Frank van Puffelen Dec 06 '19 at 22:15
  • I totally agree. Thanks for your advise! – damingzi Dec 06 '19 at 22:26