Swift Firebase "Fan Out" Technique vs queryLimited Efficiency

Question

I have a group chat feature in my app whose messages node is structured as shown below. It currently doesn't use the fan-out technique; it just lists all of the messages under the group name, e.g. "group1".

groups: {
    group1: {
      -MEt4K5xhsYL33anhXpP: {
          fromUid: "diidssm......."
          userImage: "https://firebasestorage..."
          text: "hello"
          date: 1617919946
          emojis: {
              "heart": 2
              "like": 1
          }
      }
      -MEt8BLP2yMEUMPbG2zV: {
          ...
      }
      -MF-Grpl8Jchxpbn2mxH: {
          ...
      }
      -MF-OUjWXsFh7lBPosMf: {
          ...
      }
    }
}

I first observe the most recent 40 messages and listen for newly added children like this:

ref = Database.database().reference().child("groups").child("group1")
ref.queryLimited(toLast: 40).observe(.childAdded, with: { (snapshot) in
    ...
    //add to messages array to load collection view
    //for each message observe emojis and update emojis to reflect changes e.g. +1 like

    // snapshot.key is this message's ID, so the listener is scoped to that message's emojis node
    ref.child(snapshot.key).child("emojis").observe(.value, with: { (emojiSnapshot) in
        ...
    })
})

Every time the user scrolls up I load another 40 messages (and observe the emojis child under each of those message nodes), using the date of the oldest message loaded so far as the cursor (date is indexed in the security rules):

ref.queryOrdered(byChild: "date").queryEnding(beforeValue: prevdate, childKey: messageId).queryLimited(toLast: 40).observeSingleEvent(of: .value, with: { (snapshot) in
    ...
})
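
The cursor logic behind that paged query can be sketched in plain Swift. This is only an in-memory model, not the SDK: the `Message` type, the `pageBefore` helper, and the sample data are hypothetical, and it assumes the query treats the cursor as exclusive on the (date, key) pair and that push IDs sort in the same order as dates.

```swift
// In-memory model of the paging query; no Firebase calls are made here.
struct Message {
    let key: String   // push ID, e.g. "-MEt4K5xhsYL33anhXpP"
    let date: Int     // Unix timestamp stored under "date"
}

/// Returns up to `limit` messages strictly older than `cursor`, ordered by (date, key).
/// Passing nil for `cursor` models the initial queryLimited(toLast:) load.
func pageBefore(_ cursor: Message?, in all: [Message], limit: Int) -> [Message] {
    let ordered = all.sorted { ($0.date, $0.key) < ($1.date, $1.key) }
    let candidates = ordered.filter { message in
        guard let cursor = cursor else { return true }
        return (message.date, message.key) < (cursor.date, cursor.key)
    }
    return Array(candidates.suffix(limit))
}

// Hypothetical data: 100 messages with dates 1...100.
let all = (1...100).map { Message(key: "m\($0)", date: $0) }
let newest = pageBefore(nil, in: all, limit: 40)                 // initial load
let previousPage = pageBefore(newest.first, in: all, limit: 40)  // after one scroll up
```

Each page's oldest message (`first`) becomes the cursor for the next request, so every scroll asks for at most 40 children regardless of how many messages the group holds.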

I understand the fan-out technique is used to get less information per synchronization. If I attach a listener to the groups/groupname/ to get a list of all messages for that group, I will also ask for all the info of each and every message under that node. With the fan out approach I can also just ask for the message information of the 40 most recent messages and the next 40 per scroll up using the keys of the messages from another node like this.

allGroups: {
    group1: {
      -MEt4K5xhsYL33anhXpP: 1
      -MEt8BLP2yMEUMPbG2zV: 1
      -MF-Grpl8Jchxpbn2mxH: 1
      -MF-OUjWXsFh7lBPosMf: 1
    }
}

However, if I am using queryLimited(toLast: 40) is the fan-out approach beneficial or even necessary? Wouldn't this fix the problem of "I will also ask for all the info of each and every message under that node"?

In terms of checking for new messages, I just check using .childAdded in the first code above (ref.queryLimited(toLast: 40).observe(.childAdded)). According to the post below, queryLimited(toLast: 40) will sync the last 40 child nodes, and keep synchronizing those (removing previous ones as new ones are added).

Some questions about keepSynced(true) on a limited query reference

I'm assuming that if group1 had 1000 messages, with this approach I am just reading the 40 most recent messages I need and the next 40 per scroll, thus ignoring the other several hundred. Why would I use the fan-out technique then? Maybe I'm not understanding something fundamental about limited queries.

Side Question: Should I be including references to profile images under each message node? Is it bad to do this in terms of cloud storage and realtime database storage? Ideally there would be hundreds of groupchats.

Fanning out data can take many forms. Can you edit your question to also show the JSON you have in mind when you say "the fan-out technique"? — Frank van Puffelen, Apr 23 '21 at 00:45
I'm not sure I understand how this new data structure relates to the one at the top of your question. It's a good idea to have such a list of group IDs, if you then load the data for specific groups, but not for others, in a way that can't be handled with a query. Since queries work for you, I don't immediately see the need for the destructured data, but that might be because I don't know all your use-cases. — Frank van Puffelen, Apr 23 '21 at 01:02
I changed to "allGroups". Forgot to add a different name so they are two different nodes. Does that make sense? The allGroups node contains just the keys for each group. The "groups" node contains the messages and their info for each group. I've read that this should be done for the fan-out technique for chats but not sure why. — Elizabeth, Apr 23 '21 at 01:10
As said, fanning out data can take many forms, so it might help if you link to where you've read about it. There are definitely many use-cases where having a list of keys is really helpful, such as having a list of what the user has access to (in case you can't secure it with queries), such as shown here: https://stackoverflow.com/questions/33540479/best-way-to-manage-chat-channels-in-firebase. But there are also plenty of cases where this is not needed — Frank van Puffelen, Apr 23 '21 at 01:16
You don't want to use Firebase Realtime Database for paging data. Cloud Firestore is built for that. — El Tomato, Apr 23 '21 at 02:17
How is Firestore better for this scenario in terms of efficiency? Trying to understand what's the best option here. — Elizabeth, Apr 23 '21 at 04:26
@ElTomato Firestore and Realtime Database use inherently the same underlying cursor-based model for pagination. Of the reasons I'd use for picking one or the other, this is not one of them. ¯\\_(ツ)_/¯ — Frank van Puffelen, Apr 23 '21 at 21:08
Question: why do you want to implement a fan-out? That's a pretty broad topic. Suggestion: a different approach to your structure; it appears you want to know if any messages are added, and then you want to know if any emoji's change within each message. I would suggest removing this `emojis: ["heart", "like"]` from the existing structure and storing those in a separate node. `emoji's` which looks like this `-MEt4K5xhsYL33anhXpP: emojis: ["heart", "like"]` then add *one* observer to that node so you're notified of emoji changes per message instead of possibly thousands. — Jay, Apr 24 '21 at 13:24
*Should I be including references to profile images under each message node* - why wouldn't you? It's a reference which is a tiny amount of data. In reference to @ElTomato comment; it's correct and a very good point that pagination is a bit easier in Firestore due to the addition of `cursors` but the RTDB is totally capable of that (we've been doing pagination in the RTDB for years) so it should not be a component of your database choice - there are other, more important aspects (cost, query capabilities etc) — Jay, Apr 24 '21 at 13:30
@Jay For emojis aren't I checking right now if there's an emoji change only per message? If the array changes value then it's notified for that specific ref.child("emojis"). Where does the thousands come from? And I was thinking if there were say 100,000 messages in my app if it's better to read each user's profile reference from a user node when loading messages or just keep the reference to the profile pic under the messages. I thought that would be a lot of storage but maybe not. — Elizabeth, Apr 24 '21 at 21:02
@FrankvanPuffelen ElTomato Last question. Since I'm querying the last children for each load, this has a worse read performance vs if I were querying the top ones because firebase still needs to "look" at the other top nodes I don't query. Is read performance going to be significantly affected if each group message node has a couple thousand of children? — Elizabeth, Apr 24 '21 at 22:01
^I think I'll just make the date child negative and keep index on date to be able to limitToFirst — Elizabeth, Apr 24 '21 at 22:13
*Where does the thousands come from* - if there are thousands of posts, there are then thousands of observers for the emoji's within those posts. If the emoji node is separate, then the same task takes one observer and loads less data; so it's just a design choice. Storing the ref to the image in the users node requires an additional read because the post is read and then the users node to get the pic from storage. On the other hand, keeping the ref in the post reduces the reads but if the pic changes, you'll have to update it in every post. I would vote for the prior. — Jay, Apr 25 '21 at 13:08
`ref.queryLimited(toLast: 40)` is a little unclear until you wrap your brain around it. This is actually saying 'for whatever the query is, limit to the last 40'; it's not querying everything to get the last 40. So, if the 40 are 'at the top' or the 40 are 'at the end', the performance is "the same"; it's not 'looking over' the other top nodes. Read performance will not be significantly impacted and... Firebase is blisteringly fast so it's actually hard to craft something that impacts the read performance. @FrankvanPuffelen may want to correct/update me on that but it's what our testing shows. — Jay, Apr 25 '21 at 13:16
@Jay Thanks! I need to change that. Didn't think about having to update profile ref too. — Elizabeth, Apr 26 '21 at 07:14

score 0 · Answer 1 · answered Apr 27 '21 at 16:23

There are a lot of comments on the question, so I thought I would condense them into an answer.

The intention of the 'fan out technique' in the question was to maximize query performance.

In this use case the query only returns the last 40 results:

ref.queryLimited(toLast: 40)

The assumption in the question was that Firebase had to 'go through' all of the nodes before those 40 to get to the 40, therefore affecting performance. That's not the case with Firebase so whether it be the first 40 or the last 40, the performance is 'the same'.

Because of that, no 'fan-out' is really needed in this situation. For clarity:

Fan-out is the process of duplicating data in the database. Duplicating data eliminates slow joins and increases read performance.

I am going to steal a fan out example from an old Firebase Blog. Here's a fan out to update multiple nodes at once, and since it's an atomic operation it either all passes or all fails.

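let updatedUser = ["name": "Shannon", "username": "shannonrules"]
// Legacy SDK constructor, as in the blog post; the current SDK uses Database.database().reference()
let ref = Firebase(url: "https://<YOUR-FIREBASE-APP>.firebaseio.com")

// Each key is a full path from the root; the same value is written to all three
let fanoutObject = ["/users/1": updatedUser,
                    "/usersWhoAreCool/1": updatedUser,
                    "/usersToGiveFreeStuffTo/1": updatedUser]

ref.updateChildValues(fanoutObject) // atomic updating goodness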
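
For comparison, here is a rough sketch of what the same multi-location update could look like for the question's data with the current SDK, if the `allGroups` key index were kept alongside `groups`. The `messageFanOut` and `postMessage` helpers are hypothetical names, and the Firebase part is guarded so the path-building logic can be checked on its own.

```swift
/// Builds one multi-location update that writes a message and its key-only index entry.
/// Paths follow the `groups` and `allGroups` nodes shown in the question.
func messageFanOut(groupId: String, messageId: String, message: [String: Any]) -> [String: Any] {
    return [
        "groups/\(groupId)/\(messageId)": message,
        "allGroups/\(groupId)/\(messageId)": 1
    ]
}

#if canImport(FirebaseDatabase)
import FirebaseDatabase

/// Writes both locations in a single call; either both writes succeed or neither does.
func postMessage(groupId: String, message: [String: Any]) {
    let root = Database.database().reference()
    // childByAutoId() generates a new push ID locally, before anything is written.
    guard let messageId = root.child("groups").child(groupId).childByAutoId().key else { return }
    root.updateChildValues(messageFanOut(groupId: groupId, messageId: messageId, message: message))
}
#endif
```

The extra path is the whole cost of the fan-out on the write side; whether the index is worth keeping depends on having a read that a limited query cannot serve.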

I will also include a link to Introducing multi-location updates and more as well as suggesting a read on the topic of denormalization.

In the question, there isn't really any data to 'fan out' so it would not be applicable as there isn't an attempt to join (pull data from multiple nodes) or to update multiple nodes.

The one change I would suggest is to move the emojis node out of the message node.

As is, every message's emojis node has its own observer, which can add up to thousands of observers that are difficult to manage. I would create a separate top-level node just for those emojis:

emojis
   -MEt4K5xhsYL33anhXpP: //the message id
      "heart": 2  //or however you want to store them
      "like": 1

Then add a single observer (much easier to manage!) to the emoji node. When an emoji changes, that one observer will notify the app of which message it was for, and what the change was. It will also cut down on reads and overall cost.
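
A minimal sketch of that single observer, assuming the top-level `emojis` node is keyed by message ID as shown above. The `emojiCounts` and `observeEmojiChanges` names are hypothetical, and the Firebase part is guarded so the parsing helper can be exercised without the SDK.

```swift
/// Parses one child of the top-level "emojis" node into reaction counts.
/// Entries whose value is not an integer are dropped.
func emojiCounts(from value: Any?) -> [String: Int] {
    guard let raw = value as? [String: Any] else { return [:] }
    return raw.compactMapValues { $0 as? Int }
}

#if canImport(FirebaseDatabase)
import FirebaseDatabase

/// Attaches one listener to "emojis" and reports (messageId, counts) whenever
/// an existing message's counts change. Keep the returned handle to remove the observer later.
func observeEmojiChanges(onChange: @escaping (String, [String: Int]) -> Void) -> DatabaseHandle {
    let emojisRef = Database.database().reference().child("emojis")
    return emojisRef.observe(.childChanged, with: { snapshot in
        // snapshot.key is the message ID; snapshot.value holds that message's counts
        onChange(snapshot.key, emojiCounts(from: snapshot.value))
    })
}
#endif
```

`.childChanged` only fires for messages that already have an entry, so a message's first reaction would arrive through `.childAdded` instead. Keep in mind that a listener on a node syncs that node's contents, so in a large app it may be worth keying this node per group (for example `emojis/group1/...`, a hypothetical layout not in the original).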
