0

Does anyone have good ideas about what kind of a data model makes sense in Firestore for time dependent data?

I have a lot of event data that I would like to store in Firestore and then run an analysis on it

Each event has a timestamp and I would like to run the aggregated analysis for example for 1 day of data, 7 day of data, X days of data, 1 month, X months, etc

How should this be setup in firestore, 7 days of event data is already a lot of data that and I can't return it to the client and make the analysis there. If I aggregate some predefined set of days beforehand in firestore it is then locked to only those days and you can't choose an arbitrary amount of days. I would also need to keep updating the aggregated data every time there is new data

Any help much appreciated!

Alex Mamo
  • 130,605
  • 17
  • 163
  • 193
jani_r
  • 637
  • 1
  • 7
  • 18
  • In the NoSQL world, we are usually structuring a database according to the queries that we want to perform. What are those queries? – Alex Mamo Nov 24 '22 at 07:46
  • For example there could be query like this: `SELECT hits, COUNT(*) FROM event_type_api GROUP BY hits WHERE start_date > TODAY- X` the X could be any amount. The bigger X is the bigger the result set is. It would of course be possible to pre-aggregate the result and store in firestore for example for 7, 14, 30 days but how is that done if you can choose any X you want (of course with some upper limit) I don't have the firestore queries ready but that is the general requirement – jani_r Nov 24 '22 at 11:18
  • So you want to get the result and the cont in a single go? – Alex Mamo Nov 24 '22 at 11:33
  • Preferably yes, if it is possible to aggregate directly in firestore like with a relational database then that would be the best option. I guess there is no group by in firestore. Currently the data is stored in a collection per event type (one event type is API). The collection has millions of docs (each doc has a timestamp and each doc is part of the count operation and they should be grouped by API_TYPE, METHOD and API_FAMILY) So multiple group bys – jani_r Nov 24 '22 at 12:06
  • Are you allowed to change the database schema? If yes, would such an answer help? – Alex Mamo Nov 24 '22 at 12:34
  • Yes, it is possible. This is initial development so everything is open. The events are now in Cloud SQL (mysql) and each event is it's own table and db size keeps growing pretty fast. I guess there is some upper limit to the db also.... I was looking into BigQuery (or Cloud Spanner), maybe one of those would be easier. I am also more familiar with SQL queries. – jani_r Nov 24 '22 at 12:50

1 Answers1

2

As I understand you're looking to perform a query similar to:

SELECT hits, COUNT(*)  FROM event_type_api  GROUP BY hits WHERE start_date > TODAY - X

Firestore is a NoSQL database, but that doesn't mean that you cannot know the number of documents in a query. You cannot in SQL terms, but you can count them. It's a little costly to read all documents in a collection to only have the number of documents. That's why you need to call count(). As you already mentioned, there is also no "GROUP BY" present in Firestore. However, we can achieve almost the same thing.

Assuming that you'll create a collection called "hits" in which you store documents that have a field of type timestamp, then you can perform the following query:

val queryByTimestamp = db.collection("hits").whereGreaterThan("timestamp", TODAY - X)

If you want to know how many documents the query returns, you need to call count() like this:

val numberOfDocuments = queryByTimestamp.count()

The last thing is related to grouping. As mentioned before, Firestore doesn't offer any aggregation queries such as grouping. However, there are two solutions for this use case. You can get all the documents in the hits collection, and then group them in your application code by whatever fields you need. Or you can create separate collections of pre-grouped documents to satisfy your needs. The latter is the recommended approach. And remember, that duplicating data is a quite common practice when it comes to NoSQL databases.

Alex Mamo
  • 130,605
  • 17
  • 163
  • 193