What is denormalization really all about when talking about Firebase Cloud Firestore? I have read a few articles on the internet and some answers here on Stack Overflow, and most of them recommend this approach. How does denormalization really help? Is it always necessary?

Are database flattening and denormalization the same thing?

It's my first question, and I hope to find an answer that helps me understand the concept. I know it is different, but I have two years of experience with MySQL.

Alex Mamo
Dave
  • Denormalization typically means that you duplicate data in your database. In most NoSQL databases this makes it faster and simpler to read data, at the cost of making the write operations slower and more complex. For a read-heavy database that is a worthwhile trade-off, but it always depends on your exact use-case and requirements. – Frank van Puffelen Jan 18 '19 at 16:56
  • Covering all of this is way too broad for a Stack Overflow question, but I recommend reading [NoSQL data modeling](https://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/), watching [Firebase for SQL developers](https://www.youtube.com/playlist?list=PLl-K7zZEsYLlP-k-RKFa7RyNPa9_wCH2s) and [Getting to know Cloud Firestore](https://www.youtube.com/playlist?list=PLl-K7zZEsYLluG5MCVEzXAQ7ACZBCuZgZ). – Frank van Puffelen Jan 18 '19 at 16:57
  • @FrankvanPuffelen Thanks Frank van Puffelen, certainly I will take a look at that. – Dave Jan 18 '19 at 18:28

1 Answer

What is denormalization in Firebase Cloud Firestore?

Denormalization is not specific to Cloud Firestore; it is a technique generally used in NoSQL databases.

What is denormalization, really?

Denormalization is the process of optimizing the read performance of NoSQL databases by adding redundant data in other places in the database. By adding redundant data, as @FrankvanPuffelen already mentioned in his comment, I mean that we copy the exact same data that already exists in one place into another place, to suit queries that may not even be possible otherwise. The same technique is used in relational databases, where it helps cover up the cost of joining many tables.
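
To make the idea concrete, here is a minimal sketch that models documents as plain Python dictionaries instead of calling the Firestore SDK. The `users` and `posts` collections and all of their fields are hypothetical, chosen only for illustration; they are not part of the schema discussed in this answer.

```python
# In-memory stand-in for two collections; this is NOT the Firestore SDK.
# All collection and field names here are hypothetical.
users = {"uid1": {"name": "Alice"}}

# Normalized: the post stores only a reference to its author.
posts_normalized = {"post1": {"title": "Hello", "authorId": "uid1"}}

# Denormalized: the author's name is also copied into the post.
posts_denormalized = {
    "post1": {"title": "Hello", "authorId": "uid1", "authorName": "Alice"}
}


def author_name_normalized(post_id):
    """Two lookups: read the post, then read the user it points to."""
    post = posts_normalized[post_id]
    return users[post["authorId"]]["name"]


def author_name_denormalized(post_id):
    """One lookup: the name already lives inside the post."""
    return posts_denormalized[post_id]["authorName"]
```

Both functions return the same name, but the denormalized version needs a single read. The price is that the name now exists in two places and both have to be kept in sync.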

How does denormalization really help?

It helps because data duplication is the key to faster reads, which is why it is quite a common practice when it comes to Firebase. I see you're new to NoSQL databases, so for a better understanding, I recommend watching the video "Denormalization is normal with the Firebase Database". It's for the Firebase Realtime Database, but the same principles apply to Cloud Firestore.

Is it always necessary?

No. We don't use denormalization just for the sake of using it; we use it only when it is definitely needed.

Are database flattening and denormalization the same thing?

No, they are not. Let's take an example. Assume we have a database schema for a quiz app that looks like this:

Firestore-root
    |
    --- questions (collection)
          |
          --- questionId (document)
                 |
                 --- questionId: "LongQuestionIdOne"
                 |
                 --- title: "Question Title"
                 |
                 --- tags (collection)
                      |
                      --- tagIdOne (document)
                      |     |
                      |     --- tagId: "yR8iLzdBdylFkSzg1k4K"
                      |     |
                      |     --- tagName: "History"
                      |     |
                      |     --- //Other tag properties
                      |
                      --- tagIdTwo (document)
                            |
                            --- tagId: "tUjKPoq2dylFkSzg9cFg"
                            |
                            --- tagName: "Geography"
                            |
                            --- //Other tag properties

We can flatten the database by simply moving the tags collection into a separate top-level collection, like this:

Firestore-root
    |
    --- questions (collection)
    |     |
    |     --- questionId (document)
    |            |
    |            --- questionId: "LongQuestionIdOne"
    |            |
    |            --- title: "Question Title"
    |
    --- tags (collection)
          |
          --- tagIdOne (document)
          |     |
          |     --- tagId: "yR8iLzdBdylFkSzg1k4K"
          |     |
          |     --- tagName: "History"
          |     |
          |     --- questionId: "LongQuestionIdOne"
          |     |
          |     --- //Other tag properties
          |
          --- tagIdTwo (document)
                |
                --- tagId: "tUjKPoq2dylFkSzg9cFg"
                |
                --- tagName: "Geography"
                |
                --- questionId: "LongQuestionIdTwo"
                |
                --- //Other tag properties

Now, to get all the tags that correspond to a specific question, you simply need to query the tags collection for the documents whose questionId property holds the desired question ID.
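
As a rough sketch of what that query does, the snippet below models the flattened `tags` collection as a Python dictionary and filters it by `questionId`. It is an in-memory stand-in for an equality filter, not actual Firestore SDK code, and the helper name `tags_for_question` is made up for this example.

```python
# In-memory stand-in for the flattened top-level `tags` collection.
# Keys are document IDs; values are the document fields from the schema above.
tags = {
    "tagIdOne": {
        "tagId": "yR8iLzdBdylFkSzg1k4K",
        "tagName": "History",
        "questionId": "LongQuestionIdOne",
    },
    "tagIdTwo": {
        "tagId": "tUjKPoq2dylFkSzg9cFg",
        "tagName": "Geography",
        "questionId": "LongQuestionIdTwo",
    },
}


def tags_for_question(question_id):
    """Return every tag document whose questionId equals question_id."""
    return [tag for tag in tags.values() if tag["questionId"] == question_id]


history_tags = tags_for_question("LongQuestionIdOne")
```

With the sample data above, `history_tags` contains only the "History" tag, because it is the only document whose `questionId` matches.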

Or you can flatten and denormalize the database at the same time, as you can see in the following schema:

Firestore-root
    |
    --- questions (collection)
    |     |
    |     --- questionId (document)
    |            |
    |            --- questionId: "LongQuestionIdOne"
    |            |
    |            --- title: "Question Title"
    |            |
    |            --- tags (collection)
    |                 |
    |                 --- tagIdOne (document) //<----------- Same tag id
    |                 |     |
    |                 |     --- tagId: "yR8iLzdBdylFkSzg1k4K"
    |                 |     |
    |                 |     --- tagName: "History"
    |                 |     |
    |                 |     --- //Other tag properties
    |                 |
    |                 --- tagIdTwo (document) //<----------- Same tag id
    |                       |
    |                       --- tagId: "tUjKPoq2dylFkSzg9cFg"
    |                       |
    |                       --- tagName: "Geography"
    |                       |
    |                       --- //Other tag properties
    |
    --- tags (collection)
          |
          --- tagIdOne (document) //<----------- Same tag id
          |     |
          |     --- tagId: "yR8iLzdBdylFkSzg1k4K"
          |     |
          |     --- tagName: "History"
          |     |
          |     --- questionId: "LongQuestionIdOne"
          |     |
          |     --- //Other tag properties
          |
          --- tagIdTwo (document) //<----------- Same tag id
                |
                --- tagId: "tUjKPoq2dylFkSzg9cFg"
                |
                --- tagName: "Geography"
                |
                --- questionId: "LongQuestionIdTwo"
                |
                --- //Other tag properties

See, the tag objects under questions -> questionId -> tags -> tagId are the same as those under tags -> tagId. So flattening only regroups existing data, while denormalization duplicates it.
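
The cost of this duplication shows up on writes: when a tag changes, every copy has to change with it. The sketch below illustrates that with an in-memory dictionary keyed by document path. It is not Firestore SDK code, and `rename_tag` is a hypothetical helper; in a real app the copies would typically be written together, for example in a batched write.

```python
# In-memory stand-in for the flattened and denormalized schema above.
# Keys are document paths; the same tag is stored in two places.
db = {
    "questions/LongQuestionIdOne/tags/tagIdOne": {
        "tagId": "yR8iLzdBdylFkSzg1k4K",
        "tagName": "History",
    },
    "tags/tagIdOne": {
        "tagId": "yR8iLzdBdylFkSzg1k4K",
        "tagName": "History",
        "questionId": "LongQuestionIdOne",
    },
}


def rename_tag(tag_doc_id, new_name):
    """Update every copy of the tag and return how many documents were written."""
    written = 0
    for path, doc in db.items():
        parts = path.split("/")
        # A copy of the tag is any document named tag_doc_id inside a `tags` collection.
        if parts[-1] == tag_doc_id and parts[-2] == "tags":
            doc["tagName"] = new_name
            written += 1
    return written


writes = rename_tag("tagIdOne", "World History")
```

Here a single logical change costs two document writes, which is the trade-off described in the comments on the question: faster and simpler reads in exchange for slower and more complex writes.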

For more information, you can also take a look at:

Because you say you have a SQL background, think of a normalized design, which often stores different but related pieces of data in separate logical tables, called relations. If these relations are stored physically as separate disk files, completing a query that draws information from several relations (a join operation) can be slow. If many relations are joined, it may be prohibitively slow. Because in NoSQL databases such as Cloud Firestore we do not have "JOIN" clauses, we have to create different workarounds to get the same behavior.

Alex Mamo
  • How would you query this structure to get all questions that have the tags "yR8iLzdBdylFkSzg1k4K" && "tUjKPoq2dylFkSzg9cFg"? – 1110 Jul 21 '22 at 23:19
  • @1110 You cannot do that. We are usually structuring a Firestore database according to the queries that we want to perform. If you have a specific use case, please post a new question, here on StackOverflow, using its own [MCVE](https://stackoverflow.com/help/mcve), so I and other Firebase developers can help you. – Alex Mamo Jul 22 '22 at 06:21