38

I have a subcollection for each doc in the users collection of my app. This subcollection stores docs that are related to the user, however they could just as well be saved to a master collection, each doc with an associated userId.

I chose this structure as it seemed the most obvious at the time, but I can imagine it will make things harder down the road if I need to do database maintenance. For example, if I wanted to clean up those docs, I would have to query each user and then each user's docs, whereas with a master collection I could just query all docs.

That led me to question what the point of subcollections is at all, if you can just associate those docs with an ID. Is it solely there so that you can expand if your doc becomes close to the 1MB limit?

HJo
  • 4
    One possible drawback of saving those docs under sub-collections (instead of a master collection) is that you cannot query across several sub-collections. – Renaud Tarnec Jan 19 '19 at 11:04
  • 8
    Right, that's what I was saying. I'm not considering moving to subcollections, I'm considering moving away from them. I'm wondering whether there's actually a good reason to be using subcollections in the first place – HJo Jan 19 '19 at 11:28

4 Answers

41

Edit: October 29th, 2021:

To be clear about the following sentence that exists in the docs:

If you don't query based on the field with sequential values, you can exempt the field from indexing to bypass this limit.

A timestamp cannot be considered consecutive, but it can still be considered sequential. The same applies to alphabetical values (Customer1, Customer2, Customer3, ...) and to pretty much anything that can be treated as a predictably generated value.

Such sequential data is most likely written in close physical proximity in the Firestore indexes on the storage media, hence that limitation.

That being said, please note that Firestore uses a mechanism to map documents to their corresponding locations. This means that if the values are not randomly distributed, the write operations will not be distributed evenly over the locations. That's the reason why that limitation exists.

Also note that there is a physical limit on how much data you can write to such a location in a given amount of time. Predictable keys/values will most likely end up in the same location, which is bad, because it increases the chances of hitting that limit.


Edit: July 16th, 2021:

Since this answer sounds a little old, I will try to add a few more advantages of using subcollections that I found over time:

  1. Subcollections give you a more structured database schema, since each subcollection is tied to one specific document and holds only data related to that document.
  2. The maximum depth of a subcollection is 100, and a Firestore Query is as fast at level 1 as it is at level 100, so there should be no concerns regarding depth. This has been tested.
  3. Queries in subcollections are indexed by default, as in the case of top-level collections.
  4. In terms of speed, it doesn't really matter whether you query a top-level collection, a subcollection, or a collection group; the speed will always be the same as long as the Query returns the same number of documents. This is because Query performance depends on the number of documents you request and not on the number of documents you search. So querying a subcollection has the same effect as querying a top-level collection, with no downsides at all.
  5. When storing documents in a subcollection, please note that there is no need to store the parent document ID as a field, as it is by default part of the reference. This means that you can store less data in the documents that exist in the subcollection. More importantly, if you had saved the same data in a top-level collection and needed a Query with two whereEqualTo() calls plus an orderBy() call, an index would be required.
  6. In terms of security, rules written for a parent document do not automatically apply to its subcollections, but a recursive wildcard ({document=**}) lets a single rule cover a path and everything nested under it, which can mean writing less code to secure the database.
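
Point 5 can be illustrated with a small sketch. It assumes a document snapshot from the namespaced Web SDK or the Admin SDK, where `ref.parent` is the containing collection and that collection's `parent` is the owning document (or `null` for a top-level collection). The helper name `parentDocId` is hypothetical.

```javascript
// Hypothetical helper: recover the owning document's ID from a snapshot of a
// document stored in a subcollection, without keeping that ID as a field.
function parentDocId(snapshot) {
  // snapshot.ref.parent is the CollectionReference holding the document;
  // its .parent is the DocumentReference that owns that subcollection,
  // or null when the collection is a top-level one.
  const owner = snapshot.ref.parent.parent;
  return owner ? owner.id : null;
}
```

For a document read from a top-level collection the helper returns `null`, which is exactly the case where the ID would have to be stored as a field instead.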

That's all for the moment; if I find other benefits, I'll update the answer.


Let's take an example for that. Let's assume we have a database schema for a quiz app that looks like this:

Firestore-root
    |
    --- questions (collections)
          |
          --- questionId (document)
                 |
                 --- questionId: "LongQuestionIdOne"
                 |
                 --- title: "Question Title"
                 |
                 --- tags (collections)
                      |
                      --- tagIdOne (document)
                      |     |
                      |     --- tagId: "yR8iLzdBdylFkSzg1k4K"
                      |     |
                      |     --- tagName: "History"
                      |     |
                      |     --- //Other tag properties
                      |
                      --- tagIdTwo (document)
                            |
                            --- tagId: "tUjKPoq2dylFkSzg9cFg"
                            |
                            --- tagName: "Geography"
                            |
                            --- //Other tag properties

Here, tags is a subcollection of the questionId document. Now let's create tags as a top-level collection instead, like this:

Firestore-root
    |
    --- questions (collections)
    |     |
    |     --- questionId (document)
    |            |
    |            --- questionId: "LongQuestionIdOne"
    |            |
    |            --- title: "Question Title"
    |
    --- tags (collections)
          |
          --- tagIdOne (document)
          |     |
          |     --- tagId: "yR8iLzdBdylFkSzg1k4K"
          |     |
          |     --- tagName: "History"
          |     |
          |     --- questionId: "LongQuestionIdOne"
          |     |
          |     --- //Other tag properties
          |
          --- tagIdTwo (document)
                |
                --- tagId: "tUjKPoq2dylFkSzg9cFg"
                |
                --- tagName: "Geography"
                |
                --- questionId: "LongQuestionIdTwo"
                |
                --- //Other tag properties

The differences between these two approaches are:

  • If you want to query the database to get all tags of a particular question, using the first schema it's very easy because only a CollectionReference is needed (questions -> questionId -> tags). To achieve the same thing using the second schema, instead of a CollectionReference, a Query is needed, which means that you need to query the entire tags collection to get only the tags that correspond to a single question.
  • Using the first schema, everything is more organised. Besides that, Firestore allows a maximum subcollection depth of 100, so you can take advantage of that.
  • As @RenaudTarnec also mentioned in his comment, queries in Cloud Firestore are shallow: they only get documents from the collection that the query is run against. There is no way to get documents from a top-level collection and other collections or subcollections in a single query. Firestore doesn't support queries across different collections in one go. A single query may only use properties of documents in a single collection. So there is no way you can get all the tags of all the questions using the first schema.
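
The first bullet can be sketched as follows. This is only an illustration, assuming `db` is a Firestore instance from the namespaced Web SDK (v8/compat) or the Admin SDK; the function names are hypothetical, while the collection and field names come from the two schemas above.

```javascript
// First schema: tags live under each question, so a plain
// CollectionReference is enough to reach one question's tags.
function tagsOfQuestionNested(db, questionId) {
  return db.collection('questions').doc(questionId).collection('tags');
}

// Second schema: tags live in a top-level collection, so a Query that
// filters on the stored questionId field is needed instead.
function tagsOfQuestionFlat(db, questionId) {
  return db.collection('tags').where('questionId', '==', questionId);
}
```

Calling `.get()` on either result would fetch the same tags; the difference is only in how the set of documents is addressed.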

The second schema uses a technique called database flattening, which is quite a common practice when it comes to Firebase. Use this technique only if it is needed. So in your case, if you only need to display the tags of a single question, use the first schema. If you want to display all the tags of all questions, the second schema is recommended.

Is it solely there so that you can expand if your doc becomes close to the 1MB limit?

If you have a subcollection of objects within a document, please note that the size of the subcollection does not count towards that 1 MiB limit. Only the data that is stored in the properties of the document is counted.

Edit Oct 01 2019:

According to @ShahoodulHassan's comment:

So there is no way you can get all the tags of all the questions using the first schema?

Actually, now there is: we can get all tags of all questions using a Firestore collection group query. One thing to note is that all the subcollections must have the same name, for instance tags.
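
A minimal sketch of such a collection group query, assuming `db` is a Firestore instance from the namespaced Web SDK or the Admin SDK (the function name is hypothetical):

```javascript
// Builds a query over every collection named 'tags', regardless of which
// question document each one is nested under.
function allTagsQuery(db) {
  return db.collectionGroup('tags');
}
```

Calling `.get()` on the returned query would then read the tags of all questions in one go; adding filters to it may require a collection group index.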

Alex Mamo
  • So essentially it boils down to requesting a subcollection is faster, whereas querying a master collection is more flexible – HJo Jan 20 '19 at 02:11
  • @HamishJohnson it is the same speed. It's totally up to you how you want to do it. I prefer the second choice just in case I want to use tags on something else later on. However, it does add up to the pricing. – Eray T Jul 24 '19 at 18:36
  • 3
    @Alex Mamo `So there is no way you can get all the tags of all the questions using the first schema.` Won't a collection group query on collections named 'tags' solve this issue now? – Shahood ul Hassan Oct 01 '19 at 08:52
  • 1
    @ShahoodulHassan As a matter of fact there is, using [firestore collection group query](https://firebase.google.com/docs/firestore/query-data/queries#collection-group-query). Note that all subcollections should have the same name. I will update my answer. – Alex Mamo Oct 01 '19 at 09:00
  • @ErayT so you are saying it's cheaper (fewer reads) to do a subcollection versus a root collection? – MobileMon Jan 03 '20 at 13:54
  • @MobileMon Yes. There was a way to get subcollections included in the root collection you are calling. I was taking a course on fireship.io. However, I am not sure as of now. I switched to nodejs instead. – Eray T Jan 05 '20 at 14:26
  • 2
    @MobileMon No, there's no difference in cost. You only pay for the docs that you receive from the query. If a subcollection group query returns the same amount of docs as a master collection, it costs the same. – HJo May 22 '20 at 10:36
  • I have also updated my answer. I hope you reconsider my answer and accept it, as the only benefit of using subcollection is not that it contains their own rate limit, clearly. – Alex Mamo Jul 28 '21 at 06:14
  • @AlexMamo add geg's answer and I'll accept it. The way I see it, his answer is the only one that's truly insurmountable with anything other than subcollections. The points you made all have trivial solutions, plus 2 & 4 aren't really benefits per se – HJo Oct 25 '21 at 17:17
  • I recommend fully reading his answer. A chat messaging application will always need a timestamp with sequentially increasing values. They even give you an example in the article you shared. – HJo Oct 26 '21 at 18:39
  • @AlexMamo Alex, monotonically increasing != consecutive. Monotonically increasing means that a value only ever increases, and does not decrease - not that it increases by the same amount each time. In their documentation, it's monotonically increasing values that cause hotspots – HJo Oct 26 '21 at 20:14
  • yes, and we are talking about firestore – HJo Oct 27 '21 at 15:13
  • "Be aware that indexing fields with monotonically increasing values, such as timestamps, can lead to hotspots which impact latency for applications with high read and write rates."... They even say 'like timestamps'. "In an IoT use case with a high write rate, for example, a collection containing documents with a timestamp field might approach the 500 writes per second limit." This could not be more clear – HJo Oct 28 '21 at 09:53
  • What? A timestamp is a monotonically increasing value. If you have a document with { timesent: Timestamp } indexed, it is monotonically increasing, therefore it must be a hotspot. – HJo Oct 28 '21 at 20:58
  • The Timestamp goes up every time a new document is added, therefore it is monotonically increasing. This conversation is hurting my head, I can't tell if I'm completely missing something obvious – HJo Oct 28 '21 at 21:04
  • I think you are mistaken. This line proves it "If you don't query based on the field with sequential values, you can exempt the field from indexing to bypass this limit" If you had increasing fields, you wouldn't be able to exempt a field since the field would constantly be changing. They are talking about the value, not the field – HJo Oct 29 '21 at 14:07
  • I have just deleted all recent comments, so they cannot be moved to the chat section. I have also updated my answer with a conclusion about what we have argued. Hope it's clear enough now. – Alex Mamo Oct 29 '21 at 15:25
  • FYI @AlexMamo security rules are not inherited by nested collections. See the docs: https://firebase.google.com/docs/firestore/security/rules-structure#hierarchical_data – Will Madden Jan 18 '23 at 12:26
6

The single biggest advantage of sub-collections that I've found is that they have their own rate limit for writes because each sub-collection has its own index (assuming you don't have a collection group index). This probably isn't a concern for small applications but for medium/large scale apps it could be very important.

Imagine a chat application where each chat has a series of messages. You'll want to index messages by timestamp to show them in chronological order. The Firestore write limit for sequential values is 500/second, which is definitely within reach of a medium-sized app (especially if you consider the possibility of a rogue user scripting messages, which is not currently easy to prevent with Security Rules).

// root collection

/messages {
  chatId: string
  timeSent: timestamp // the entire app would be limited to 500/second
}
// sub-collection

/chat/{chatId}/messages {
  timeSent: timestamp // each chat could safely write up to 500/second
}
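
The two layouts above could be written to roughly as follows. This is a sketch only: `db` is assumed to be a Firestore instance from the namespaced Web SDK or the Admin SDK, `text` is a hypothetical message field, and the function names are made up for illustration.

```javascript
// Root-collection layout: every message in the app lands in one collection,
// so all writes share the same timeSent index.
function sendToRootCollection(db, chatId, text, timeSent) {
  return db.collection('messages').add({ chatId, text, timeSent });
}

// Sub-collection layout: each chat has its own messages collection, so
// (without a collection group index) each chat's timeSent index is separate.
function sendToSubCollection(db, chatId, text, timeSent) {
  return db
    .collection('chat')
    .doc(chatId)
    .collection('messages')
    .add({ text, timeSent });
}
```

Note that the sub-collection variant does not need to store `chatId` in the message, since it is already part of the document path.
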
geg
  • 1
    That's true, but on the flipside, if you want to find a specific tag, you'll need to store both the id of the tag and the id(s) of the root collection(s). – HJo Jul 14 '21 at 08:16
  • Ya in hindsight my answer is pretty bad. I've changed my tune on this since I originally wrote it so I'll update it. There are some important advantages to sub-collections at scale – geg Jul 15 '21 at 14:48
  • 1
    I look forward to hearing it, as I've become less enthused with subcollections as time's gone on to the point where I regret having ever used them. Would be good to hear another perspective – HJo Jul 15 '21 at 20:29
  • Excellent point! It's something I've come up against in the past. And yes it does look like indexing it as a collection group mucks this up. Cheers – HJo Jul 16 '21 at 10:36
3

Subcollections are also helpful in setting up security rules. Suppose you are building a chat app and have a user collection with a replies subcollection. You want other users to be able to add to the replies collection but want to give the user full rights to the user collection. If you have replies as an array of maps/objects in user collection, it severely limits the rules you can write against the user collection for the collection owner and other users to be able to add to the collection. Whereas, having it as its own subcollection makes writing security rules waaaaay easier.

Kamal
  • 1
    This question is about having subcollection as opposed to master collection. Not about having subcollection as opposed to map/array – HJo Feb 07 '22 at 19:11
  • My reply is still valid since the map/array will reside in the master collection and therefore make it harder to write security rules against those values in the master collection whereas having them in a subcollections simplifies security rules. Sorry if you don't see that or didn't find my answer helpful. Hopefully, it will be beneficial to others who stumble upon this. Cheers. – Kamal Feb 08 '22 at 04:01
2

Surprised this hasn't been mentioned before, but sub-collections can (in some cases) help bypass the orderBy limitations:

You can't order your query by a field included in an equality (==) or in clause.

Suppose you want to get a user's most recent 10 logins:

Top-Level:

// We can't use .orderBy after .where('==')
// (uid is the user's ID string)
USER_LOGINS.where('userId', '==', uid).limit(10);

Sub-Collection:

// With a subcollection we can order and limit properly
// (uid is the user's ID string)
USERS.doc(uid).collection('LOGINS').orderBy('unixCreated', 'desc').limit(10);
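
For comparison, the quoted restriction can be read as applying only to ordering by the same field that appears in the equality filter, which is also what the comment below suggests. A minimal sketch of the top-level variant ordered by a different field, assuming `userLogins` is a CollectionReference to the top-level logins collection and that a composite index on `userId` and `unixCreated` has been created (both are assumptions, as is the function name):

```javascript
// Top-level collection: filter on userId, then order by a *different* field.
// Firestore typically asks for a composite index for this combination.
function recentLoginsTopLevel(userLogins, uid) {
  return userLogins
    .where('userId', '==', uid)
    .orderBy('unixCreated', 'desc')
    .limit(10);
}
```
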
kmoney12
  • Are you sure? From the way that line is written, it sounds like you can't order by the field in the where clause. So you couldn't order by userId (and I don't know why you would want to), but you could order by unixCreated – HJo Jul 13 '21 at 23:48