Let's assume I'm trying to build a group messaging application, so I designed my database structure to look like this:

users: {
    uid1: { //A user id using push()
        username: "user1"
        email: "aaa@bbb.ccc"
        timestampJoined: 18594659346
        groups: {
            gid1: true,
            gid3: true
        }
    }
    uid2: {
        username: "user2"
        email: "ddd@eee.fff"
        timestampJoined: 34598263402
        groups: {
            gid1: true,
            gid5: true
        }
    }
    ....
}

groups: {
    gid1: { //A group id using push()
        name: "group1"
        users: {
            uid1: true,
            uid2: true
        }
    }
    gid2: {
        name: "group2"
        users: {
            uid5: true,
            uid7: true,
            uid80: true
        }
    }
    ...
}

messages: {
    gid1: {
        mid1: {  //A message id using push()
            sender: uid1
            message: "hello"
            timestamp: 12839617675
        }
        mid2: {
            sender: uid2
            message: "welcome"
            timestamp: 39653027465
        }
        ...
    }
    ...
}

According to Firebase's docs, this structure should scale well.

Now let's assume that inside my application, I want to display the sender's username on every message.

Querying the username for every single message is obviously bad, so one of the solutions that I found was to duplicate the username in every message.

The messages node will now look like so:

messages: {
    gid1: {
        mid1: {  //A message id using push()
            sender: uid1
            username: "user1"
            message: "hello"
            timestamp: 12839617675
        }
        mid2: {
            sender: uid2
            username: "user2"
            message: "welcome"
            timestamp: 39653027465
        }
        ...
    }
    ...
}

Now I want to add the option for the user to change his username.

So if a user decides to change his username, it has to be updated in the users node, and in every single message that he ever sent.

If I had gone with the "listener for every message" approach, then changing the username would have been easy, because I would only have needed to change the name in a single location.

Now I also have to update the name in every message he sent, in every group.

I assume that querying the entire messages node for the user id is a bad design, so I thought about creating another node that stores the locations of all the messages that a user has sent.

It will look something like this:

userMessages: {
    uid1: {
        gid1: {
            mid1: true
        }
        gid3: {
            mid6: true,
            mid12: true
        }
        ...
    }
    uid2: {
        gid1: {
            mid2: true
        }
        gid5: {
            mid13: true,
            mid25: true
        }
        ...
    }
    ...
}

So now I could quickly fetch the locations of all the messages for a specific user, and update the username with a single updateChildren() call.
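To make the cost of this approach concrete, here is a minimal TypeScript sketch of how the flat path-to-value map for such a call could be assembled from the userMessages index. It does not call Firebase; `buildUsernameUpdate` and the `UserMessages` type are hypothetical names, and the sample data only mirrors the structure shown above.

```typescript
// Index of message locations per user, shaped like the userMessages node above.
type UserMessages = Record<string, Record<string, Record<string, true>>>;

// Hypothetical helper: builds one flat { path: value } map for a username change.
function buildUsernameUpdate(
  uid: string,
  newUsername: string,
  userMessages: UserMessages
): Record<string, string> {
  const updates: Record<string, string> = {};
  // The canonical copy in the users node.
  updates[`users/${uid}/username`] = newUsername;
  // One entry per duplicated copy, located through the index.
  const groups = userMessages[uid] || {};
  for (const [gid, mids] of Object.entries(groups)) {
    for (const mid of Object.keys(mids)) {
      updates[`messages/${gid}/${mid}/username`] = newUsername;
    }
  }
  return updates;
}

// Sample data copied from the userMessages structure above.
const userMessages: UserMessages = {
  uid1: { gid1: { mid1: true }, gid3: { mid6: true, mid12: true } },
  uid2: { gid1: { mid2: true }, gid5: { mid13: true, mid25: true } },
};

const updates = buildUsernameUpdate("uid1", "newName", userMessages);
```

The map grows by one entry per message the user ever sent, which is exactly the scaling concern raised in the question.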

Is this really the best approach? Do I really have to duplicate so much data (millions of messages) only because I'm referencing a dynamic value (the username)?

Or is there a better approach when dealing with dynamic data?

123
    Lots of good answers already. I'll just link to an answer I wrote a while ago about how to write the fanned-out/denormalized data in case you choose that route: http://stackoverflow.com/questions/30693785/how-to-write-denormalized-data-in-firebase – Frank van Puffelen Jun 04 '16 at 16:01

3 Answers

This is a perfect example of why, in general, parent node names (keys) should be disassociated from the values they contain or represent.

Some big-picture thinking may help here, and considering the user experience may provide the answer.

Now let's assume that inside my application, I want to display the sender's username on every message.

But do you really want to do that? Does your user really want to scroll through a list of 10,000 messages? Probably not. Most likely, the app is going to display a subset of those messages, and even then probably only 10 or 12 at a time.

Here are some thoughts:

Assume a users table:

users
  uid_0
    name: Charles
  uid_1
    name: Larry
  uid_2
    name: Debbie

and a messages table:

messages
  msg_1
     sender: uid_1
     message: "hello"
     timestamp: 12839617675
     observers:
        uid_0: true
        uid_1: true
        uid_2: true

Each user logs in and the app performs a query that observes the message nodes they are part of. The app displays the message text as well as the name of each user who is also observing that message (the 'group').

This could also be used to just display the user name of the user that posted it.

Solution 1: When the app starts, load all of the users in the users node and store them in a dictionary with the uid_ as the key.

When the messages node is being observed, each message is loaded and you will already have the uids of the other users (or the poster) stored in users_dict by key, so just pick their name:

let name = users_dict["uid_2"]
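A slightly fuller sketch of Solution 1, written in TypeScript and using plain objects in place of Firebase snapshots. The helper names `buildUsersDict` and `senderName` are hypothetical; the data is the users and messages example from above.

```typescript
interface User { name: string }
interface Message { sender: string; message: string; timestamp: number }

// Hypothetical helper: reduce a users snapshot to a uid -> name dictionary.
function buildUsersDict(users: Record<string, User>): Map<string, string> {
  const dict = new Map<string, string>();
  for (const [uid, user] of Object.entries(users)) {
    dict.set(uid, user.name);
  }
  return dict;
}

// Look up the sender's name when a message arrives; fall back to the uid if unknown.
function senderName(usersDict: Map<string, string>, msg: Message): string {
  const name = usersDict.get(msg.sender);
  return name !== undefined ? name : msg.sender;
}

// Loaded once at app start (here: the users table from the example).
const usersDict = buildUsersDict({
  uid_0: { name: "Charles" },
  uid_1: { name: "Larry" },
  uid_2: { name: "Debbie" },
});

// Called for each message as it is observed.
const msg1: Message = { sender: "uid_1", message: "hello", timestamp: 12839617675 };
const shownName = senderName(usersDict, msg1);
```

The message itself only stores the uid, so a username change never touches the messages node.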

Solution 2:

Suppose you have a LOT of data stored in your users node (which is typical) and a thousand users. There's no point in loading all of that data when all you are interested in is their name, so you could either

a) Use solution #1 and just ignore all of the other data other than the uid and name or

b) Create a separate 'names' node in firebase which only keeps the user name so you don't need to store it in the users node.

names:
  uid_0: Charles
  uid_1: Larry
  uid_2: Debbie

As you can see, even with a couple thousand users, that's a tiny bit of data to load. And the cool thing here is that if you add a listener to the names node and a user changes their name, the app will be notified and can update its UI accordingly.
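As a rough TypeScript illustration of the listener idea, the sketch below keeps a local copy of the names node and reports which uids are new or changed each time a fresh snapshot arrives, so only the affected rows need to be redrawn. `NameCache` and `applySnapshot` are hypothetical names, and the snapshots are plain objects rather than real Firebase events.

```typescript
type Names = Record<string, string>;

// Hypothetical handler for value events on the names node.
class NameCache {
  private names: Names = {};

  // Replaces the cached copy and returns the uids whose name is new or different.
  applySnapshot(snapshot: Names): string[] {
    const changed = Object.keys(snapshot).filter(
      (uid) => this.names[uid] !== snapshot[uid]
    );
    this.names = { ...snapshot };
    return changed;
  }

  nameFor(uid: string): string | undefined {
    return this.names[uid];
  }
}

const cache = new NameCache();

// Initial load of the names node.
cache.applySnapshot({ uid_0: "Charles", uid_1: "Larry", uid_2: "Debbie" });

// Later event: uid_1 renamed themselves (the new name is made up for this example).
const changed = cache.applySnapshot({ uid_0: "Charles", uid_1: "Lawrence", uid_2: "Debbie" });
```

After the second snapshot, only messages sent by the uids in `changed` would need their displayed name refreshed.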

Solution 3:

Load your names on an as-needed basis. While technically you can do this, I don't recommend it:

Observe all of the message nodes the user is part of. As those nodes are read in, build a dictionary of the uids whose names you will need. Then perform a query for each user name based on the uid. This can work, but you have to take the asynchronous nature of Firebase into account and allow time for the names to be loaded. Likewise, you could load a message, then load the user name for that message with the path: users/uid_x/user_name. Again, though, this gets into an async timing issue where you are nesting async calls within async calls or a loop, and that should probably be avoided.

The important point with any solution is the user experience and keeping your Firebase structure as flat as possible.
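If you do go this route, one way to soften the timing issue is to cache the pending request per uid, so that repeated senders share one read and a batch of messages can be awaited together. This is only a TypeScript sketch: `NameLoader` is a hypothetical stand-in for whatever single-value read of users/uid_x/user_name your client performs, and `createNameResolver` and `namesForSenders` are made-up names.

```typescript
// Hypothetical loader type: stands in for a single-value read of one user's name.
type NameLoader = (uid: string) => Promise<string>;

// Caches the promise itself, so concurrent requests for one uid share a single read.
// Note: a rejected promise stays cached in this sketch; a real app would likely evict it.
function createNameResolver(load: NameLoader): (uid: string) => Promise<string> {
  const pending = new Map<string, Promise<string>>();
  return (uid: string) => {
    let request = pending.get(uid);
    if (!request) {
      request = load(uid);
      pending.set(uid, request);
    }
    return request;
  };
}

// Resolves the sender of every message in a batch without nesting callbacks.
// The result has the same order and length as the input list.
async function namesForSenders(
  senders: string[],
  resolve: (uid: string) => Promise<string>
): Promise<string[]> {
  return Promise.all(senders.map((uid) => resolve(uid)));
}
```

With this shape, rendering can wait on `namesForSenders` for the visible messages instead of chaining one callback per message.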

For example, if you do in fact want to load 10,000 messages, consider breaking the message text or subject out into another node, and only load those nodes for your initial UI list. As the user drills down into the message, then load the rest of the data.

Jay

Steps to follow:

  • fetch username at every restart of app

  • cache them locally

  • show username from cache based on uid

  • done

Note: how you fetch the usernames depends on your implementation.

devprashant

You only need this structure:

  mid1: {  //A message id using push()
     sender: uid1
     message: "hello"
     timestamp: 12839617675
  }

The username can be read directly from "users/uid1/username" using a single value event listener after you read each child. Firebase is meant to be used with sequential calls, since you cannot create complex queries like in SQL.

And just to keep it efficient, you could:

1) Create a dictionary to use as a cache. After you read each message, check whether the cache already has a value for the sender's key:

   [uid1:"John",uid2:"Peter",....etc...]

If the key doesn't exist, attach a single value listener pointing to /users/$uid/username and add the result to the cache in its callback.

2) Use the limitToFirst/limitToLast, startAt and endAt queries to paginate the listener and avoid bringing in data the user won't see.
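The pagination in point 2 can be pictured with the following TypeScript sketch. It is an in-memory stand-in for "order by timestamp, end at a cursor, limit to the last N" and does not call Firebase; `pageBefore` is a hypothetical name, and the cursor arithmetic assumes integer timestamps with no ties and a page size of at least 1.

```typescript
interface Message { sender: string; message: string; timestamp: number }

// Returns the newest `pageSize` messages at or before `endAtTimestamp`,
// in ascending order, plus the cursor to pass in for the next (older) page.
function pageBefore(
  messages: Message[],
  endAtTimestamp: number,
  pageSize: number
): { page: Message[]; nextCursor: number | null } {
  const sorted = [...messages].sort((a, b) => a.timestamp - b.timestamp);
  const eligible = sorted.filter((m) => m.timestamp <= endAtTimestamp);
  const page = eligible.slice(-pageSize);
  // Step just below the oldest message shown so the next page does not repeat it.
  const nextCursor = eligible.length > page.length ? page[0].timestamp - 1 : null;
  return { page, nextCursor };
}
```

A chat screen would request the first page with a cursor of `Infinity` and only ask for the next page when the user scrolls up, so older messages are never loaded unless someone actually looks at them.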

*There is no need to keep updating all the messages and all the nodes with every user change. Imagine a chat group with 100 users in which every user has 20 messages: that is 2000 updates in your single updateChildren() call. That would be extremely inefficient, since it does not scale and you are updating data that no user is likely to ever see again in a real-life scenario (like the first of the 2000 chat messages).

Ymmanuel