
Suppose I have a typical users & groups data model where a user can be in many groups and a group can have many users. It seems to me that the Firebase docs recommend that I model my data by replicating user ids inside groups and group ids inside users, like this:

{
  "usergroups": {
    "bob": {
      "groups": {
        "one": true,
        "two": true
      }
    },
    "fred": {
      "groups": {
        "one": true
      }
    }
  },
  "groupusers": {
    "one": {
      "users": {
        "bob": true,
        "fred": true
      }
    },
    "two": {
      "users": {
        "bob": true
      }
    }
  }
}

In order to maintain this structure, whenever my app updates one side of the relationship (e.g., adds a user to a group), it also needs to update the other side of the relationship (e.g., add the group to the user).
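As a minimal sketch of the bookkeeping described above, the two writes for adding (or removing) a membership can be computed together as one map of paths to values. The helper name `membershipWrites` is hypothetical, and the code only builds the paths; how they are sent to Firebase is left to the caller.

```javascript
// Hypothetical helper: compute both sides of the relationship in one place,
// so that callers never write one side without the other.
function membershipWrites(user, group, isMember) {
  // `true` marks membership; writing `null` to a Firebase location deletes it.
  const value = isMember ? true : null;
  return {
    [`usergroups/${user}/groups/${group}`]: value,
    [`groupusers/${group}/users/${user}`]: value,
  };
}

// Example: the two paths that must change when fred joins group "two".
const addFredToTwo = membershipWrites("fred", "two", true);
console.log(addFredToTwo);
```

Keeping the path construction in one function does not make the writes atomic by itself, but it does keep the two sides from drifting apart through a coding mistake.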

I'm concerned that eventually someone's computer will crash in the middle of an update or something else will go wrong and the two sides of the relationship will get out of sync. Ideally I'd like to put the updates inside a transaction so that either both sides get updated or neither side does, but as far as I can tell I can't do that with the current transaction support in firebase.

Another approach would be to use the upcoming Firebase triggers to update the other side of the relationship, but triggers are not available yet, and posting a message to an external server just to have that server keep redundant data up to date seems like a pretty heavyweight solution.

So I'm thinking about another approach where the many-many user-group memberships are stored as a separate endpoint:

{
  "memberships": {
    "id1": {
      "user": "bob",
      "group": "one"
    },
    "id2": {
      "user": "bob",
      "group": "two"
    },
    "id3": {
      "user": "fred",
      "group": "one"
    }
  }
}      

I can add indexes on "user" and "group", and issue Firebase queries ".orderByChild("user").equalTo(...)" and ".orderByChild("group").equalTo(...)" to determine the groups for a particular user and the users for a particular group, respectively.
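To make the two lookups concrete, here is a small in-memory sketch in plain JavaScript that mimics what those queries would return for the sample data above. It does not call Firebase; `membershipsWhere` is a hypothetical helper that stands in for `.orderByChild(field).equalTo(value)`.

```javascript
const memberships = {
  id1: { user: "bob", group: "one" },
  id2: { user: "bob", group: "two" },
  id3: { user: "fred", group: "one" },
};

// Stand-in for .orderByChild(field).equalTo(value): keep the matching records.
function membershipsWhere(all, field, value) {
  return Object.values(all).filter((m) => m[field] === value);
}

// Groups for a particular user, and users for a particular group.
const groupsForBob = membershipsWhere(memberships, "user", "bob").map((m) => m.group);
const usersInOne = membershipsWhere(memberships, "group", "one").map((m) => m.user);

console.log(groupsForBob); // [ 'one', 'two' ]
console.log(usersInOne);   // [ 'bob', 'fred' ]
```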

What are the downsides to this approach? We no longer have to maintain redundant data, so why is this not the recommended approach? Is it significantly slower than the recommended replicate-the-data approach?

Dallan Quass

1 Answer


In the design you propose you'd always need to access three locations to show a user and her groups:

  1. the users child to determine the properties of the user
  2. the memberships to determine what groups she's a member of
  3. the groups child to determine the properties of the group

In the denormalized example from the documentation, your code would only need to access #1 and #3, since the membership information is embedded into both users and groups.
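The difference in reads can be sketched with plain objects standing in for the database. This is only an illustration: the `name` and `title` properties and both function names are made up, and each commented "location" corresponds to one of the numbered reads above.

```javascript
// Sample data; the "name" and "title" properties are assumptions for illustration.
const normalizedDb = {
  users: { bob: { name: "Bob" } },
  groups: { one: { title: "Group One" }, two: { title: "Group Two" } },
  memberships: {
    id1: { user: "bob", group: "one" },
    id2: { user: "bob", group: "two" },
  },
};
const denormalizedDb = {
  users: { bob: { name: "Bob", groups: { one: true, two: true } } },
  groups: { one: { title: "Group One" }, two: { title: "Group Two" } },
};

// Separate memberships node: users, then memberships, then groups.
function showUserNormalized(db, userId) {
  const user = db.users[userId]; // location 1
  const groupIds = Object.values(db.memberships) // location 2
    .filter((m) => m.user === userId)
    .map((m) => m.group);
  const groups = groupIds.map((id) => db.groups[id].title); // location 3
  return { name: user.name, groups };
}

// Membership embedded in the user: users, then groups.
function showUserDenormalized(db, userId) {
  const user = db.users[userId]; // location 1 (already contains the group ids)
  const groups = Object.keys(user.groups).map((id) => db.groups[id].title); // location 3
  return { name: user.name, groups };
}

console.log(showUserNormalized(normalizedDb, "bob"));
console.log(showUserDenormalized(denormalizedDb, "bob"));
```

Both functions produce the same view; the second simply skips the lookup in `memberships`.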

If you denormalize one step further, you'd end up storing all relevant group information for each user and all relevant user information for each group. With such a data structure, you'd only need to read a single location to show all information for a group or a user.
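As a rough sketch of that further step, the function below copies each group's properties into the users who belong to it, so that a single user record is enough to render the user and her groups. The function name and the `name` and `title` properties are assumptions for illustration, not part of any Firebase API.

```javascript
// Hypothetical helper: embed each group's properties into its member users.
function embedGroupsInUsers(users, groups) {
  const result = {};
  for (const [userId, user] of Object.entries(users)) {
    const embedded = {};
    for (const groupId of Object.keys(user.groups || {})) {
      // Copy the group record so later edits to the result do not touch the input.
      embedded[groupId] = { ...groups[groupId] };
    }
    result[userId] = { ...user, groups: embedded };
  }
  return result;
}

const sampleUsers = { bob: { name: "Bob", groups: { one: true, two: true } } };
const sampleGroups = { one: { title: "Group One" }, two: { title: "Group Two" } };

// After this, everything needed to show bob lives under one location.
const fullyDenormalized = embedGroupsInUsers(sampleUsers, sampleGroups);
console.log(JSON.stringify(fullyDenormalized, null, 2));
```

The cost is that every change to a group's properties now has to be copied to each member, which is the same consistency problem on a larger scale.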

Redundancy is not necessarily a bad thing in a NoSQL database; it is often used precisely because it speeds up reads.

For the moment I would go with a secondary process that periodically scans the data and reconciles any irregular data it finds. Of course that also means that regular client code needs to be robust enough to handle such irregular data (e.g. a group that points to a user, where that user's record doesn't point to the group).
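A minimal sketch of the scanning half of such a reconciliation process is shown below, working on a plain-object copy of the "usergroups" and "groupusers" structure from the question. The function name `findMismatches` and the shape of its result are assumptions; how the data is fetched and how each mismatch is repaired are left out.

```javascript
// Find memberships that are recorded on only one side of the relationship.
function findMismatches(data) {
  const mismatches = [];
  const usergroups = data.usergroups || {};
  const groupusers = data.groupusers || {};

  // Every group listed under a user must list that user in return.
  for (const [user, entry] of Object.entries(usergroups)) {
    for (const group of Object.keys(entry.groups || {})) {
      const users = (groupusers[group] || {}).users || {};
      if (!users[user]) mismatches.push({ user, group, missingFrom: "groupusers" });
    }
  }
  // Every user listed under a group must list that group in return.
  for (const [group, entry] of Object.entries(groupusers)) {
    for (const user of Object.keys(entry.users || {})) {
      const groups = (usergroups[user] || {}).groups || {};
      if (!groups[group]) mismatches.push({ user, group, missingFrom: "usergroups" });
    }
  }
  return mismatches;
}

// Example: group "two" points to fred, but fred's record does not point to "two".
const snapshot = {
  usergroups: {
    bob: { groups: { one: true, two: true } },
    fred: { groups: { one: true } },
  },
  groupusers: {
    one: { users: { bob: true, fred: true } },
    two: { users: { bob: true, fred: true } },
  },
};
console.log(findMismatches(snapshot));
// [ { user: 'fred', group: 'two', missingFrom: 'usergroups' } ]
```

Whether a mismatch should be repaired by adding the missing side or by removing the dangling one is an application decision that the scan cannot make on its own.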

Alternatively, you could set up some advanced .validate rules that ensure the two sides are always in sync. I've always found that this takes more time to implement, so I never bothered.

You might also want to read this answer: Firebase data structure and url

Frank van Puffelen
  • Thank you for the detailed answer! It sounds like in exchange for the additional read I don't have to write code to update both sides of the relationship, to work around inconsistencies, and to fix inconsistencies. If the additional read is the only downside, that seems like a reasonable trade-off for my use-case. I'm not sure how validation rules could be written to guarantee consistency, since one of the rules would have to be violated when one side of the relationship was updated, before the other side of the relationship was updated. – Dallan Quass Mar 13 '15 at 17:16
  • In general, I'd optimize for the most common case. And often, data is *read* way more often than it is *written*. Then again, denormalizing further here might also be a case of premature optimization. In the end, only you can determine what is best for your use-case. – Frank van Puffelen Mar 13 '15 at 17:20
  • Yeah, my situation is I'm creating a platform where 3rd-party apps will someday update the data so I need to be more careful than usual. It's too bad that Firebase doesn't allow indexing multi-valued attributes, as in all of the keys of a particular object. (All of the user-ids inside each group's "users" object for example.) Then I could store the many-many relationships on one side and use an index to access from the other direction. That's been the most painful part migrating from Google Datastore (which indexes multi-valued attributes) to Firebase. Thanks for your help. – Dallan Quass Mar 13 '15 at 23:07
  • The Firebase docs here: https://firebase.google.com/docs/database/security/indexing-data explain that an index can be put on a child node. So I don't understand why your "memberships" structure should not be good as well? – ThierryC Sep 05 '16 at 12:57