I'm new to Firebase and NoSQL, so bear with me while I use SQL as a reference point. My question is: how should I structure data in Firebase?

In Firebase, does every "new Firebase" correspond to a new database, or to a table in MySQL?

Suppose my real-time web app has users and comments. In MySQL, I would create a users table and a comments table, then link them together.

How do I structure this in Firebase?

Frank van Puffelen
vzhen
  • It seems I will be the first to point you to this URL: https://www.firebase.com/blog/2013-04-12-denormalizing-is-normal.html :) – kostik May 19 '13 at 20:29
  • @kostik by the way, do you have more info about this? I mean an article or tutorial for a newbie coming from a relational DB background to NoSQL. – vzhen May 19 '13 at 20:45
  • I am a newbie too :) I spent yesterday preparing this question http://stackoverflow.com/questions/16628362/pagination-in-application-which-use-firebase-com-as-database which, I think, will go unanswered. So bear in mind that Firebase may not be a good fit for every case – kostik May 19 '13 at 21:33
  • Hi @vzhen, as kostik says, Firebase is not a SQL database. It is more comparable to a hierarchical database or (maybe a slightly easier metaphor) to inserting elements into an XML document. I suggest reading the responses to kostik's earlier question over here: http://stackoverflow.com/a/16423051/209103. – Frank van Puffelen May 20 '13 at 00:58

1 Answer

If you have users and comments, you could easily model it like this:

ROOT
 |
 +-- vzhen
 |     |
 |     +-- Vzhen's comment 1
 |     |
 |     +-- Vzhen's comment 2
 |
 +-- Frank van Puffelen
       |
       +-- Frank's comment 1
       |
       +-- Frank's comment 2
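
As a rough illustration, the tree above can be written as a nested mapping, which is close to how Firebase data looks when exported as JSON. This is a minimal sketch using plain Python dicts rather than any Firebase SDK; the child keys (`comment1`, `comment2`) and the helper `comments_of` are hypothetical names chosen for the example.

```python
# Plain-dict stand-in for the tree above (not the Firebase SDK).
# Child keys such as "comment1" are made up for illustration.
root = {
    "vzhen": {
        "comment1": "Vzhen's comment 1",
        "comment2": "Vzhen's comment 2",
    },
    "Frank van Puffelen": {
        "comment1": "Frank's comment 1",
        "comment2": "Frank's comment 2",
    },
}


def comments_of(tree, user):
    """Return the comments stored directly under a user's node."""
    return list(tree.get(user, {}).values())


print(comments_of(root, "vzhen"))
```

Reading a user's comments is a single lookup here because the comments live under the user's own node.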

However it is more likely that there is a third entity, like an article, and that users are commenting on (each other's) articles.

Firebase doesn't have the concept of a foreign key, but it's easy to mimic it. If you do that, you can model the user/article/comment structure like this:

ROOT
 |
 +-- ARTICLES
 |     |
 |     +-- Text of article 1 (AID=1)
 |     |
 |     +-- Text of article 2 (AID=2)
 |
 +-- USERS
 |     |
 |     +-- vzhen (UID=1056201)
 |     |
 |     +-- Frank van Puffelen (UID=209103)
 |
 +-- COMMENTS
 |     |
 |     +-- Vzhen's comment on Article 1 (CID=1)
 |     |
 |     +-- Frank's response (CID=2)
 |     |
 |     +-- Frank's comment on article 2 (CID=3)
 |
 +-- ARTICLE_USER_COMMENT
       |
       +-- (AID=1,UID=1056201,CID=1)
       |
       +-- (AID=1,UID=209103,CID=2)
       |
       +-- (AID=2,UID=209103,CID=3)
 

This is a fairly direct mapping of the way you'd model this in a relational database. The main problem with this model is the number of lookups you'll need to do to get the information you need for a single screen:

  1. Read the article itself (from the ARTICLES node)
  2. Read the information about the comments (from the ARTICLE_USER_COMMENT node)
  3. Read the content of the comments (from the COMMENTS node)

Depending on your needs, you might even need to also read the USERS node.

And keep in mind that Firebase does not have the concept of a WHERE clause that allows you to select just the elements from ARTICLE_USER_COMMENT that match a specific article, or a specific user.
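
To make the cost concrete, here is a small sketch of loading one article screen from the relational-style tree. It uses plain Python dicts as a stand-in for the Firebase tree, not the Firebase SDK; the function `load_article_screen` and the read counter are assumptions added for illustration. The filtering step runs on the client because, as noted above, there is no WHERE clause to do it on the server.

```python
# Plain-dict stand-in for the relational-style tree (not the Firebase SDK).
root = {
    "ARTICLES": {1: "Text of article 1", 2: "Text of article 2"},
    "USERS": {1056201: "vzhen", 209103: "Frank van Puffelen"},
    "COMMENTS": {
        1: "Vzhen's comment on Article 1",
        2: "Frank's response",
        3: "Frank's comment on article 2",
    },
    "ARTICLE_USER_COMMENT": [
        {"AID": 1, "UID": 1056201, "CID": 1},
        {"AID": 1, "UID": 209103, "CID": 2},
        {"AID": 2, "UID": 209103, "CID": 3},
    ],
}


def load_article_screen(tree, aid):
    """Collect what one article screen needs and count the node reads."""
    reads = 0

    text = tree["ARTICLES"][aid]
    reads += 1

    # The whole link node is read; matching rows are picked out client-side.
    links = tree["ARTICLE_USER_COMMENT"]
    reads += 1
    matching = [link for link in links if link["AID"] == aid]

    comments = []
    for link in matching:
        body = tree["COMMENTS"][link["CID"]]
        reads += 1
        author = tree["USERS"][link["UID"]]
        reads += 1
        comments.append((author, body))

    return text, comments, reads


text, comments, reads = load_article_screen(root, 1)
print(text, comments, reads)
```

In this sketch the read count grows with the number of comments, and the full ARTICLE_USER_COMMENT node is transferred even when only a few entries are relevant.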

In practice, this way of mapping the structure is not usable. Firebase is a hierarchical data structure, so we should use the unique abilities that gives us over the more traditional relational model. For example: we don't need an ARTICLE_USER_COMMENT node; we can keep this information directly under each article, user and comment itself.

A small snippet of this:

ROOT
 |
 +-- ARTICLES
 |     |
 |     +-- Text of article 1 (AID=1)
 |     .    |
 |     .    +-- (CID=1,UID=1056201)
 |     .    |
 |          +-- (CID=2,UID=209103)
 |
 +-- USERS
 |     |
 |     +-- vzhen (UID=1056201)
 |     .    |
 |     .    +-- (AID=1,CID=1)
 |     .    
 |
 +-- COMMENTS
       |
       +-- Vzhen's comment on Article 1 (CID=1)
       |
       +-- Frank's response (CID=2)
       |
       +-- Frank's comment on article 2 (CID=3)

You can see here that we're spreading the information from ARTICLE_USER_COMMENT over the article and user nodes. This denormalizes the data a bit. The result is that we need to update multiple nodes when a user adds a comment to an article. In the example above, we'd have to add the comment itself and then add a child node under both the relevant user node and the relevant article node. The advantage is that we have fewer nodes to read when we need to display the data.
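
The write side of this trade-off can be sketched as follows. Again this uses plain Python dicts instead of the Firebase SDK, and the names `add_comment`, `text`, `name` and `comments` are hypothetical. The sketch makes no attempt to keep the three writes atomic.

```python
# Plain-dict stand-in for the partially denormalized tree (not the Firebase SDK).
root = {
    "ARTICLES": {1: {"text": "Text of article 1", "comments": {}}},
    "USERS": {1056201: {"name": "vzhen", "comments": {}}},
    "COMMENTS": {},
}


def add_comment(tree, aid, uid, cid, text):
    """Store one comment plus the two index entries that point at it."""
    tree["COMMENTS"][cid] = text                          # the comment itself
    tree["ARTICLES"][aid]["comments"][cid] = {"UID": uid}  # index under the article
    tree["USERS"][uid]["comments"][cid] = {"AID": aid}     # index under the user
    return 3  # number of nodes written


writes = add_comment(root, 1, 1056201, 1, "Vzhen's comment on Article 1")
print(writes, root["ARTICLES"][1]["comments"])
```

One logical insert turns into three writes, but listing the comments of an article no longer requires scanning a separate link node.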

If you take this denormalization to its most extreme, you end up with a data structure like this:

ROOT
 |
 +-- ARTICLES
 |     |
 |     +-- Text of article 1 (AID=1)
 |     |    |
 |     |    +-- Vzhen's comment on Article 1 (UID=1056201)
 |     |    |
 |     |    +-- Frank's response (UID=209103)
 |     |
 |     +-- Text of article 2 (AID=2)
 |          |
 |          +-- Frank's comment on Article 2 (UID=209103)
 |
 +-- USERS
       |
       +-- vzhen (UID=1056201)
       |    |
       |    +-- Vzhen's comment on Article 1 (AID=1)
       |
       +-- Frank van Puffelen (UID=209103)
            |
            +-- Frank's response (AID=1)
            |
            +-- Frank's comment on Article 2 (AID=2)
  

You can see that we got rid of the COMMENTS and ARTICLE_USER_COMMENT nodes in this last example. All the information about an article is now stored directly under the article node itself, including the comments on that article (with a "link" to the user who made the comment). And all the information about a user is now stored under that user's node, including the comments that user made (with a "link" to the article that the comment is about).

The only thing that is still tricky about this model is that Firebase doesn't have an API to traverse such "links", so you will have to look up the user/article yourself. This becomes a lot easier if you use the UID/AID (in this example) as the name of the node that identifies the user/article.
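
A short sketch of why the node name matters, using plain Python dicts as a stand-in for the tree (not the Firebase SDK). The helper names and the `UID_` key prefix are assumptions for illustration. When users are stored under arbitrary names, resolving a UID means scanning every child; when the UID is the node name, the "link" is simply the path.

```python
# Users stored under arbitrary keys: resolving a UID needs a scan.
users_by_name = {
    "vzhen": {"UID": 1056201},
    "Frank van Puffelen": {"UID": 209103},
}


def find_user_by_scan(users, uid):
    """Walk every child until one carries the requested UID."""
    for name, node in users.items():
        if node["UID"] == uid:
            return name
    return None


# Users stored under their UID: resolving a UID is a direct key lookup.
users_by_uid = {
    "UID_1056201": {"name": "vzhen"},
    "UID_209103": {"name": "Frank van Puffelen"},
}


def find_user_by_key(users, uid):
    """Build the node name from the UID and read that one child."""
    node = users.get(f"UID_{uid}")
    return node["name"] if node else None


print(find_user_by_scan(users_by_name, 209103))
print(find_user_by_key(users_by_uid, 209103))
```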

So that leads to our final model:

ROOT
 |
 +-- ARTICLES
 |     |
 |     +-- AID_1
 |     |    |
 |     |    +-- Text of article 1
 |     |    |
 |     |    +-- COMMENTS
 |     |         |
 |     |         +-- Vzhen's comment on Article 1 (UID=1056201)
 |     |         |
 |     |         +-- Frank's response (UID=209103)
 |     |
 |     +-- AID_2
 |          |
 |          +-- Text of article 2
 |          |
 |          +-- COMMENTS
 |               |
 |               +-- Frank's comment on Article 2 (UID=209103)
 |
 +-- USERS
       |
       +-- UID_1056201
       |    |
       |    +-- vzhen
       |    |
       |    +-- COMMENTS
       |         |
       |         +-- Vzhen's comment on Article 1 (AID=1)
       |
       +-- UID_209103
            |
            +-- Frank van Puffelen
            |
            +-- COMMENTS
                 |
                 +-- Frank's response (AID=1)
                 |
                 +-- Frank's comment on Article 2 (AID=2)
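
For reference, the final model might look like this as nested data, together with a read that resolves each comment's author by key. This is a sketch with plain Python dicts, not the Firebase SDK; the field names `text` and `name`, the comment keys `c1` to `c3`, and the function `render_article` are hypothetical.

```python
# Plain-dict stand-in for the final model (not the Firebase SDK).
root = {
    "ARTICLES": {
        "AID_1": {
            "text": "Text of article 1",
            "COMMENTS": {
                "c1": {"text": "Vzhen's comment on Article 1", "UID": 1056201},
                "c2": {"text": "Frank's response", "UID": 209103},
            },
        },
        "AID_2": {
            "text": "Text of article 2",
            "COMMENTS": {
                "c3": {"text": "Frank's comment on Article 2", "UID": 209103},
            },
        },
    },
    "USERS": {
        "UID_1056201": {
            "name": "vzhen",
            "COMMENTS": {
                "c1": {"text": "Vzhen's comment on Article 1", "AID": 1},
            },
        },
        "UID_209103": {
            "name": "Frank van Puffelen",
            "COMMENTS": {
                "c2": {"text": "Frank's response", "AID": 1},
                "c3": {"text": "Frank's comment on Article 2", "AID": 2},
            },
        },
    },
}


def render_article(tree, aid):
    """Return the article text followed by 'author: comment' lines."""
    article = tree["ARTICLES"][f"AID_{aid}"]
    lines = [article["text"]]
    for comment in article["COMMENTS"].values():
        # The UID stored on the comment doubles as the path to the user node.
        author = tree["USERS"][f"UID_{comment['UID']}"]["name"]
        lines.append(f"{author}: {comment['text']}")
    return lines


for line in render_article(root, 1):
    print(line)
```

The article and its comments come from one node; only the author names need extra lookups, and each of those is a direct read by key.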

I hope this helps in understanding hierarchical data-modelling and the trade-offs involved.

Frank van Puffelen
  • Great answer, thank you. I have some questions. In the final model, do you mean that if `vzhen` comments on `article(AID_1)`, I have to insert the comment into two nodes `(articles and users)`? Won't this create duplicate data?
    Another question: if I want to find all articles posted by vzhen, does that mean I have to do the same as for comments and create an `articles` sub-node under vzhen_uid?
    – vzhen May 20 '13 at 21:42
  • Can I put it like this? In a relational DB, we create spaces to store our data and then use `JOIN` to shape how the data looks. In a hierarchical DB, we have to know how the data will look before we create spaces for it. – vzhen May 20 '13 at 21:45
  • @vzhen: indeed you will have to insert the comment in multiple places. Denormalizing the data like this makes your inserts/updates more complex (and typically slower). It simplifies your reads, however, so it's a useful tactic if you expect the volume of reads to drastically outgrow the number of writes. – Frank van Puffelen May 21 '13 at 11:43
  • @vzhen: I would typically not duplicate articles, but instead do a lookup. But since there is no JOIN mechanism, such a lookup will have to be done from the client, either by loading the root ARTICLES node or by doing multiple lookups by AID. – Frank van Puffelen May 21 '13 at 11:44
  • Firebase has some guidance in their docs that may help if you're learning to structure your data: https://www.firebase.com/docs/web/guide/understanding-data.html and https://www.firebase.com/docs/web/guide/structuring-data.html – mimming Sep 12 '14 at 18:31
  • And when you need to load user information for each comment, would you do another read for each comment to fetch the user's latest node data? For instance, if the user changes their name. – kabuto178 Oct 24 '16 at 19:10
  • In the final model, wouldn't this mean that whenever you retrieve the articles or users, you would have to retrieve all comments as well? Wouldn't it be better for the comments to be in a different node, so that the comments could be fetched only when necessary, and pagination can also be applied? – Justin Leo Jul 15 '17 at 12:35