Say I have folders, documents, and comments. Understanding that we don't want nested data, there seem to be a few choices on how to store hierarchical data:

A. Encoding just the last 2 steps of the logical path in the ref:

/users/$user_key
/folders/$folder_key
/folder_documents/$folder_key/$document_key
/document_comments/$document_key/$comment_key
/comment_likes/$comment_key/$user_key

B. Encoding the full logical path in the ref, but adding path components so that the client can efficiently fetch only what it needs:

/users/$user_key
/folders/info/$folder_key
/folders/documents/$folder_key/info/$document_key
/folders/documents/$folder_key/comments/$document_key/info/$comment_key
/folders/documents/$folder_key/comments/$document_key/likes/$comment_key/info/$user_key

C. Encoding no path information in ref, and adding indices to fetch subsets of data such as all the comments for a document.

/users/$user_key
/folders/$folder_key
/documents/$document_key
/comments/$comment_key
/likes/$comment_key/$user_key  (still want 2 levels here, probably)

Are there others I haven’t considered?

I am currently doing A because that's what's suggested in that link above… But I don’t like it too much because it requires knowing the parent key in order to fetch an object. Usually on the client I know this anyway, but things get messier when adding a server component to handle side-effects like notifications. For example, if someone likes a comment and we want to send a notification, the server needs to know the document key to fetch the comment text that was liked. The work-arounds seem to be replicating data or passing parent keys around.
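A minimal sketch of the problem with A, assuming a plain nested dict as a stand-in for the database tree (this is not a real Firebase client, and `get_path`, `comment_ref_a`, and the sample keys are made up for illustration). It shows that a like event carrying only the comment key is not enough to build the comment's ref:

```python
# In-memory stand-in for the database tree under layout A (hypothetical sample data).
db = {
    "document_comments": {
        "doc1": {"c1": {"text": "Nice write-up", "author": "u1"}},
    },
    "comment_likes": {"c1": {"u2": True}},
}


def get_path(tree, path):
    """Walk a '/'-separated path through nested dicts; return None if a step is missing."""
    node = tree
    for part in path.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def comment_ref_a(document_key, comment_key):
    """Layout A: a comment's ref cannot be built without its parent document key."""
    return f"/document_comments/{document_key}/{comment_key}"


# A like event that carries only these two keys cannot be resolved to a comment...
like_event = {"comment_key": "c1", "user_key": "u2"}
# ...so the client has to pass the parent key along as well.
like_event_with_parent = {**like_event, "document_key": "doc1"}

liked_comment = get_path(
    db,
    comment_ref_a(
        like_event_with_parent["document_key"],
        like_event_with_parent["comment_key"],
    ),
)
```

Here `liked_comment` is only reachable because `document_key` was added to the event, which is the "passing parent keys around" work-around.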

I think B might work pretty well, but it feels a little baroque.
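To make the appeal of B concrete, here is a rough sketch, assuming the server receives the like's full ref as a string. The helper names `parse_like_ref` and `comment_ref_b` are hypothetical; only the path shapes come from the layout above. Every parent key can be recovered from the ref itself:

```python
import re

# Matches the layout-B like ref:
# /folders/documents/$folder_key/comments/$document_key/likes/$comment_key/info/$user_key
LIKE_REF = re.compile(
    r"^/folders/documents/(?P<folder_key>[^/]+)"
    r"/comments/(?P<document_key>[^/]+)"
    r"/likes/(?P<comment_key>[^/]+)"
    r"/info/(?P<user_key>[^/]+)$"
)


def parse_like_ref(ref):
    """Extract all keys encoded in a layout-B like ref."""
    match = LIKE_REF.match(ref)
    if match is None:
        raise ValueError(f"not a layout-B like ref: {ref}")
    return match.groupdict()


def comment_ref_b(keys):
    """Build the layout-B ref of the comment that a like belongs to."""
    return (
        f"/folders/documents/{keys['folder_key']}"
        f"/comments/{keys['document_key']}/info/{keys['comment_key']}"
    )


keys = parse_like_ref("/folders/documents/f1/comments/d1/likes/c1/info/u2")
liked_comment_ref = comment_ref_b(keys)
```

The trade-off is that every consumer has to know this path grammar, which is probably part of why it feels baroque.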

I recently thought of C, and wonder what the performance implications are.
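A small sketch of what C might look like, again using a nested dict in place of the real database. The index node name `document_comment_idx` and the `reads` counter are assumptions for illustration; counting one read per lookup is a simplified cost model, not a measurement:

```python
# Layout C: flat top-level nodes plus an index (hypothetical sample data).
db = {
    "comments": {
        "c1": {"text": "First", "document_key": "d1"},
        "c2": {"text": "Second", "document_key": "d1"},
        "c3": {"text": "Elsewhere", "document_key": "d2"},
    },
    # Assumed index node: document key -> keys of its comments.
    "document_comment_idx": {
        "d1": {"c1": True, "c2": True},
        "d2": {"c3": True},
    },
}


def comments_for_document(tree, document_key):
    """Return (comments, reads): one index read plus one read per comment."""
    reads = 1  # reading the index entry for this document
    comment_keys = tree["document_comment_idx"].get(document_key, {})
    comments = {}
    for key in comment_keys:
        comments[key] = tree["comments"][key]
        reads += 1  # one extra lookup per comment
    return comments, reads


def comment_by_key(tree, comment_key):
    """Direct lookup: no parent key needed under layout C."""
    return tree["comments"].get(comment_key)


doc_comments, reads = comments_for_document(db, "d1")
```

Under this model, listing a document's comments costs one lookup per comment, while fetching a single comment by key needs no parent key at all.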

Stephen Farrell
  • I think C is the standard 'denormalized' way, except that `likes` would not have two levels as you indicate, but only `$like_key`, holding a single like. Each `like` would then contain information on which comment and which user, just as a `comment` contains information on which document it is for. This is at least what I think would be standard, but surely someone else will also voice their opinion. –  May 11 '15 at 13:52
  • The *best* way to structure data depends on your intended usage of that data. For example: if you only ever want to show comments when a user is looking at a specific document, I would simply store the comments under the document node. If you also want to show comments in other places, you may *either* normalize them into a single top-level node *or* replicate them in all places where you'll need to read them. Replication feels very unnatural to most (classic SQL schooled) devs, but it is essentially an optimization for read performance. But it all depends on your use-cases. – Frank van Puffelen May 11 '15 at 16:02
  • @FrankvanPuffelen Yeah so in this case the primary case is reading comments under a document node, and rarely they need to be accessed directly (w/o access to document node). I guess the other option then is to not replicate the whole comment, but instead just an index like /comment_document_idx/$comment_id with a value of $document_id, and hitting that index whenever you need to read a comment by id. That seemed like more trouble than passing parent keys around, however. The other option, I guess, is to denormalize the comment when sending the like to server for notifications... – Stephen Farrell May 11 '15 at 16:43
  • I see most people gravitating towards keeping the ids. But that is still suboptimal as far as read-performance is concerned, since it still requires an extra read for each comment. So at best you'll have O(n). If you store the comment under the document, you'll have O(1) read-performance at the cost of a slower write. You're probably also well-served by this answer: http://stackoverflow.com/questions/16638660/firebase-data-structure-and-url/16651115#16651115 – Frank van Puffelen May 11 '15 at 17:14
  • @FrankvanPuffelen Yeah - I'd seen that comment before... it makes total sense when you're optimizing for performance to denormalize/replicate. I probably did an insufficient job of describing my use case - it's really just that I'm doing notifications, and the server needs to look up individual comments. The idea with B was that the ref URL would always contain the full logical path, so it's maybe a cleaner way of passing all of that info from client to server. – Stephen Farrell May 11 '15 at 18:08
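
The index idea raised in the comments (`/comment_document_idx/$comment_id` holding `$document_id`) could be sketched roughly as follows, keeping layout A for the comments themselves. The dict stands in for the database, and `add_comment` and `comment_by_id` are hypothetical helper names:

```python
# Layout A comments plus a reverse index from comment id to document id.
db = {
    "document_comments": {"d1": {"c1": {"text": "Nice write-up"}}},
    "comment_document_idx": {"c1": "d1"},
}


def add_comment(tree, document_key, comment_key, comment):
    """Write the comment and its index entry together so they stay in sync."""
    comments = tree.setdefault("document_comments", {})
    comments.setdefault(document_key, {})[comment_key] = comment
    tree.setdefault("comment_document_idx", {})[comment_key] = document_key


def comment_by_id(tree, comment_key):
    """Two lookups: the index first, then the comment under its document."""
    document_key = tree.get("comment_document_idx", {}).get(comment_key)
    if document_key is None:
        return None
    return tree["document_comments"][document_key].get(comment_key)


add_comment(db, "d2", "c9", {"text": "Hi"})
found = comment_by_id(db, "c9")
```

This keeps the primary read (all comments under a document) unchanged and adds one extra lookup only for the rarer by-id case, at the cost of a second write per comment.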

0 Answers