
I need help with a scenario where we make multi-path updates to fanned-out data. If a new path is added somewhere between the moment we compute the set of paths and the moment we apply the update, the data in that newly added path becomes inconsistent.

For example, below is some blog-post data. A post can be tagged with multiple terms, such as "tag1" and "tag2". To find how many posts carry a specific tag, I fan out the post data to the tags path as well:

/posts/postid1: {"Title": "Title 1", "body": "About Firebase", "tags": {"tag1": true, "tag2": true}}
/tags/tag1/postid1: {"Title": "Title 1", "body": "About Firebase"}
/tags/tag2/postid1: {"Title": "Title 1", "body": "About Firebase"}
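A minimal sketch of how such a fan-out could be assembled on the client, assuming the data shape above. `buildFanOutUpdate` is a hypothetical helper, not part of any SDK; the object it returns is the kind of path-to-value map one would pass to `update()` on the database root reference. Note that the set of tag paths is fixed at the moment the object is built.

```javascript
// Hypothetical helper (not part of any SDK): builds one multi-path update that writes
// the post under /posts/$postId and copies Title/body under /tags/$tag/$postId.
function buildFanOutUpdate(postId, post) {
  const update = { [`/posts/${postId}`]: post };
  // One denormalized copy per tag the post carries at the moment the update is built.
  for (const tag of Object.keys(post.tags || {})) {
    update[`/tags/${tag}/${postId}`] = { Title: post.Title, body: post.body };
  }
  return update;
}

const post = { Title: "Title 1", body: "About Firebase", tags: { tag1: true, tag2: true } };
console.log(JSON.stringify(Object.keys(buildFanOutUpdate("postid1", post))));
// prints ["/posts/postid1","/tags/tag1/postid1","/tags/tag2/postid1"]
```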

Now consider two concurrent updates:

1a) User1 wants to modify the title of postid1 and builds the following multi-path update:

/posts/postid1/Title: "Title 1 modified"
/tags/tag1/postid1/Title: "Title 1 modified"
/tags/tag2/postid1/Title: "Title 1 modified"

1b) At the same time, User2 wants to add tag3 to postid1 and builds the following multi-path update:

/posts/postid1/tags: {"tag1": true, "tag2": true, "tag3": true}
/tags/tag3/postid1: {"Title": "Title 1", "body": "About Firebase"}

Both updates can succeed one after the other, leaving /tags/tag3/postid1 out of sync because it still holds the old title.
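The interleaving can be reproduced with a small in-memory model in plain JavaScript, without the Firebase SDK. `db` and `applyUpdate` are hypothetical stand-ins, and the tag copies are reduced to their Title. Both users read the post before either write lands, so whichever order the two updates are applied in, the tag3 copy ends up with the old title:

```javascript
// In-memory stand-in for the database: a flat map from path to value.
const db = {
  "/posts/postid1/Title": "Title 1",
  "/posts/postid1/tags": { tag1: true, tag2: true },
  "/tags/tag1/postid1/Title": "Title 1",
  "/tags/tag2/postid1/Title": "Title 1",
};
// Applying an update writes every path in one step (all-or-nothing).
function applyUpdate(update) { Object.assign(db, update); }

// Both users read the post BEFORE either update is applied.
const readByUser1 = { Title: db["/posts/postid1/Title"], tags: { ...db["/posts/postid1/tags"] } };
const readByUser2 = { Title: db["/posts/postid1/Title"], tags: { ...db["/posts/postid1/tags"] } };

// 1a) User1 renames the post and fans out only to the tags User1 saw.
const update1a = { "/posts/postid1/Title": "Title 1 modified" };
for (const tag of Object.keys(readByUser1.tags)) {
  update1a[`/tags/${tag}/postid1/Title`] = "Title 1 modified";
}

// 1b) User2 adds tag3 and copies the title User2 saw.
const update1b = {
  "/posts/postid1/tags": { ...readByUser2.tags, tag3: true },
  "/tags/tag3/postid1/Title": readByUser2.Title,
};

applyUpdate(update1a);
applyUpdate(update1b);
console.log(db["/posts/postid1/Title"]);     // prints: Title 1 modified
console.log(db["/tags/tag3/postid1/Title"]); // prints: Title 1 (stale copy)
```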

I can think of security rules that might handle this, but I am not sure whether they are correct or would work.

For example, we could add updatedAt and lastUpdated fields and check that the client is updating the same version of the post that it read:

"posts": {
  "$postid":{
    ".write":true,
    ".read":true,
    ".validate": "
      newData.hasChildren(['userId', 'updatedAt', 'lastUpdated', 'Title']) && (
        !data.exists() || 
        data.child('updatedAt').val() === newData.child('lastUpdated').val())"
  } 
}
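The rule above is a form of optimistic concurrency control. Below is a minimal JavaScript model of it. `validatePost`, `prepareWrite`, and `tryWrite` are hypothetical, a counter stands in for `ServerValue.TIMESTAMP`, and the model assumes that writes to the same location are applied one at a time, each validated against the result of the previous one. Under that assumption, the second of two writers that read the same version is rejected:

```javascript
// Hypothetical JS mirror of the ".validate" rule on /posts/$postid:
//   !data.exists() || data.child('updatedAt').val() === newData.child('lastUpdated').val()
// `data` is the stored post (or null); `newData` is the post as it would be after the write.
function validatePost(data, newData) {
  const required = ["userId", "updatedAt", "lastUpdated", "Title"];
  if (!required.every((k) => newData[k] !== undefined)) return false;
  return data === null || data.updatedAt === newData.lastUpdated;
}

let stored = { userId: "u1", updatedAt: 100, lastUpdated: 0, Title: "Title 1" };
let clock = 100; // assumption: stands in for ServerValue.TIMESTAMP and grows per write

// The writer copies the updatedAt it read into lastUpdated, as in
// post["lastUpdated"] = post["updatedAt"] from the comments.
function prepareWrite(readCopy, changes) {
  return { ...readCopy, ...changes, lastUpdated: readCopy.updatedAt };
}

// Assumption: writes are validated and applied one at a time.
function tryWrite(newData) {
  if (!validatePost(stored, newData)) return false;
  stored = { ...newData, updatedAt: ++clock }; // server assigns the new timestamp
  return true;
}

const readA = { ...stored };
const readB = { ...stored };
console.log(tryWrite(prepareWrite(readA, { Title: "From A" }))); // true
console.log(tryWrite(prepareWrite(readB, { Title: "From B" }))); // false: B read a stale updatedAt
```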

For the tags we do not want to repeat that check; instead we can check that /tags/$tag/$postid/updatedAt equals /posts/$postid/updatedAt.

"tags":{
  "$tag":{
    "$postid":{
      ".write":true,
      ".read":true,
      ".validate": "
        newData.hasChildren(['userId', 'updatedAt', 'lastUpdated', 'Title']) && (
        newData.child('updatedAt').val() === root.child('posts').child($postid).child('updatedAt').val())"
    }
  }     
}

With this, /posts/$postid has concurrency control, and users can only write on top of the version they read. /posts/$postid also becomes the source of truth, and every other fan-out path checks that its updatedAt matches that of the source-of-truth path.
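A hypothetical in-memory model of the whole scheme, as a sketch of the intended logic rather than of how Firebase evaluates rules internally. It applies a multi-path update all-or-nothing, checks both rules, and lets the tags rule compare against either the state before the write (which is what `root` and `data` refer to) or the state the write would produce. It assumes that plain numbers stand in for `ServerValue.TIMESTAMP` and that each written path triggers the rule on its enclosing `/posts/$postid` or `/tags/$tag/$postid` node.

```javascript
const REQUIRED = ["userId", "updatedAt", "lastUpdated", "Title"];
const hasAll = (node) => node != null && REQUIRED.every((k) => node[k] !== undefined);

function setPath(tree, path, value) {
  const keys = path.split("/").filter(Boolean);
  let node = tree;
  for (const key of keys.slice(0, -1)) {
    if (node[key] == null) node[key] = {};
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
}

// Applies `update` all-or-nothing; returns the new tree, or null if any rule fails.
// rootForTags: "before" = compare copies with the pre-write post (the question's `root`),
//              "after"  = compare with the post as this write would leave it.
function applyMultiPath(db, update, rootForTags) {
  const after = JSON.parse(JSON.stringify(db)); // never mutate the pre-write state
  for (const [path, value] of Object.entries(update)) setPath(after, path, value);

  for (const path of Object.keys(update)) {
    const [top, a, b] = path.split("/").filter(Boolean);
    if (top === "posts") {
      const oldPost = db.posts?.[a];
      const newPost = after.posts[a];
      if (!hasAll(newPost)) return null;
      if (oldPost && oldPost.updatedAt !== newPost.lastUpdated) return null;
    } else if (top === "tags") {
      const copy = after.tags[a][b];
      const source = (rootForTags === "before" ? db : after).posts?.[b];
      if (!hasAll(copy) || !source || copy.updatedAt !== source.updatedAt) return null;
    }
  }
  return after;
}

const base = { userId: "u1", updatedAt: 100, lastUpdated: 0, Title: "Title 1" };
const db0 = {
  posts: { postid1: { ...base, tags: { tag1: true, tag2: true } } },
  tags: { tag1: { postid1: { ...base } }, tag2: { postid1: { ...base } } },
};

// 1a) User1 read version 100 and updates the post plus the two copies User1 knows about.
const update1a = {};
for (const p of ["/posts/postid1", "/tags/tag1/postid1", "/tags/tag2/postid1"]) {
  Object.assign(update1a, { [`${p}/Title`]: "Title 1 modified", [`${p}/updatedAt`]: 200, [`${p}/lastUpdated`]: 100 });
}
// 1b) User2 also read version 100 and adds tag3 with the title User2 saw.
const update1b = {
  "/posts/postid1/tags/tag3": true,
  "/posts/postid1/updatedAt": 300,
  "/posts/postid1/lastUpdated": 100,
  "/tags/tag3/postid1": { userId: "u2", updatedAt: 300, lastUpdated: 100, Title: "Title 1" },
};

console.log(applyMultiPath(db0, update1a, "before") === null); // true: copies never match the old updatedAt
const db1 = applyMultiPath(db0, update1a, "after");
console.log(db1.tags.tag1.postid1.Title);                      // Title 1 modified
console.log(applyMultiPath(db1, update1b, "after") === null);  // true: 1b's lastUpdated is stale
```

In this model, comparing against the pre-write `root` rejects even update 1a on its own. Comparing against the post-write state accepts 1a and then rejects 1b (or the reverse if 1b arrives first), so the loser has to re-read and retry. The model does not show whether Firebase actually evaluates concurrent multi-path updates this way.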

Will this bring consistency, or are there still problems? Could it hurt performance at scale?

Are multi-path updates and rules atomic together? That is, for multi-path updates like 1a and 1b above, are the rules evaluated separately, in isolation?

Frank van Puffelen
Abhijit-K
  • One of the consequences of duplicating the title is that you'll have to either ensure its referential integrity in your security rules, run the updates from a single place (a server) or accept that they can get out of sync (and consider which value is the "master" so that you can implement a clean-up process that guarantees eventual consistency). There is no single correct answer, but here are the options I listed a while ago: http://stackoverflow.com/questions/30693785/how-to-write-denormalized-data-in-firebase/30699277#30699277 – Frank van Puffelen Feb 14 '17 at 15:54
  • Thanks. What I thought is that, with the two rules I mentioned, even if both multi-path updates run concurrently on the server (rather than serially), the rules overlap and their conditions conflict, so both will fail, or at least one (1b) will. What do you think? How is atomicity handled for rules, especially when rules reference paths in different locations? I can't find any documentation on this. – Abhijit-K Feb 14 '17 at 16:49
  • Sorry, it's hard to parse complex rules (which are unfortunately necessary for scenarios like this, unless you use a single writer for all the dependent data). This bit `newData.child('updatedAt').val() === root.child('posts').child('$postid').val().child('updatedAt').val())”` looks suspicious, because you refer to `root` - which has the old data. If you want the root-as-it-would-exist-after-this-write, traverse to it through `newData.parent().parent()....`. – Frank van Puffelen Feb 14 '17 at 18:08
  • So that means `newData.parent().parent()` could (by chance) point to the latest version of the source doc at `/posts/$postid` (if another concurrent update changed it), whereas `root.child('posts').child('$postid')` always points to the older version of $postid from before the multi-path update started. However, there is still no guarantee, right? – Abhijit-K Feb 14 '17 at 18:19
  • This also bothers me for single-path updates without a transaction. With the first rule in place, which validates `data.child('updatedAt').val() === newData.child('lastUpdated').val()`, if two concurrent updates hit `/posts/$postid`, will one of them fail? In code, when I update a post I set `post["lastUpdated"] = post["updatedAt"]` and `post["updatedAt"] = firebase.database.ServerValue.TIMESTAMP`. This is to achieve optimistic concurrency control. – Abhijit-K Feb 14 '17 at 18:40
  • By definition `data` and `root` are the data as it exists before the update, `newData` is the data as it exists after the update (if the update is allowed). – Frank van Puffelen Feb 14 '17 at 19:14

1 Answer

Unfortunately, Firebase does not provide any guarantees or mechanisms for the level of determinism you're looking for. I have had the best luck putting an API stack in front of such updates (Google Cloud Functions and AWS Lambda are both easy, serverless ways to do this). The updates can be made in that layer, and even serialized if absolutely necessary. But there isn't a safe way to do this in Firebase itself.
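A minimal sketch of serializing in such a layer, assuming a single Node.js process. `KeyedQueue` is a hypothetical helper that chains operations on the same key (here a post id), and `read`/`write` are in-memory stand-ins for database calls with an artificial delay. Because each operation reads the post inside its own turn, the second operation sees the result of the first:

```javascript
// Hypothetical helper: runs async tasks that share a key strictly one after another.
class KeyedQueue {
  constructor() { this.tails = new Map(); }
  run(key, task) {
    const previous = this.tails.get(key) ?? Promise.resolve();
    const result = previous.then(() => task());
    const tail = result.catch(() => {}); // a failed task must not block later ones
    this.tails.set(key, tail);
    tail.then(() => { if (this.tails.get(key) === tail) this.tails.delete(key); });
    return result;
  }
}

// In-memory stand-ins for database reads and writes, each taking a few milliseconds.
const db = { "/posts/postid1/Title": "Title 1", "/posts/postid1/tags": ["tag1", "tag2"] };
const tick = () => new Promise((resolve) => setTimeout(resolve, 5));
async function read(path) { await tick(); return db[path]; }
async function write(update) { await tick(); Object.assign(db, update); }

// Each operation reads the current post inside its turn, then fans out from what it read.
async function renamePost(id, title) {
  const tags = await read(`/posts/${id}/tags`);
  const update = { [`/posts/${id}/Title`]: title };
  for (const t of tags) update[`/tags/${t}/${id}/Title`] = title;
  await write(update);
}
async function addTag(id, tag) {
  const tags = await read(`/posts/${id}/tags`);
  const title = await read(`/posts/${id}/Title`);
  await write({ [`/posts/${id}/tags`]: [...tags, tag], [`/tags/${tag}/${id}/Title`]: title });
}

const queue = new KeyedQueue();
Promise.all([
  queue.run("postid1", () => renamePost("postid1", "Title 1 modified")),
  queue.run("postid1", () => addTag("postid1", "tag3")),
]).then(() => console.log(db["/tags/tag3/postid1/Title"])); // prints: Title 1 modified
```

This only serializes within one process. With several API instances or workers, as mentioned in the comments, the same key would have to be routed to a single worker or coordinated some other way.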

There are numerous "hack" options you could apply. For example, you could build a simple lock mechanism using a dedicated collection for tracking write locks: clients post to the lock collection, then verify that their key is the only member of that collection before performing a write. But I hope you'll agree with me that such cooperative systems have too many potential edge cases, security issues, and so on. In Firebase, it is best to design things so that this component is not required in the first place.
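A rough model of the lock-collection idea, with a `Set` standing in for the collection and all names hypothetical, shows one of those edge cases. If two clients add their keys before either reads the collection, both see two members and both back off; and a client that crashed between adding and removing its key would block everyone else indefinitely.

```javascript
// A Set stands in for the dedicated lock collection (hypothetical model).
const locks = new Set();

// A client may proceed only if the snapshot it read contains nothing but its own key.
function decide(clientId, snapshot) {
  return snapshot.length === 1 && snapshot[0] === clientId;
}

// Step 1: both clients add their keys.
locks.add("client1");
locks.add("client2");
// Step 2: both read the collection after both keys have landed.
const seenBy1 = [...locks];
const seenBy2 = [...locks];
// Step 3: each decides, and a client that may not proceed removes its key and backs off.
const go1 = decide("client1", seenBy1);
const go2 = decide("client2", seenBy2);
if (!go1) locks.delete("client1");
if (!go2) locks.delete("client2");
console.log(go1, go2, locks.size); // false false 0: nobody writes, both must retry
```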

Chad Robinson
  • Yes, I am NOT looking for a timeout-based, multi-phase-commit cooperative system. I am fine if both multi-path updates fail. Or eventual consistency: define a source path, and the other places sync from it eventually. I will be using a server for some tasks, but the API server will be scalable (many instances), and worker servers will read from a Firebase queue, so two workers could end up working on the same set of paths. – Abhijit-K Feb 14 '17 at 18:31
  • Sorry, Firebase does not provide any internal mechanics either to provide sync-from-source or coordinated updates. It DOES support transactions, but they wouldn't be very useful for what you're describing. I stand by my statement that coordinating this externally from Firebase itself is the best answer. – Chad Robinson Feb 14 '17 at 18:46
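For the eventual-consistency route (a source path that the other copies sync from, as raised in the comments), a clean-up pass could rebuild the copies from `/posts/$postid`. `buildRepairUpdate` is a hypothetical helper that works on a plain object shaped like the question's data and returns a multi-path update; writing `null` to a path deletes it in the Realtime Database. Because the pass can itself race with other writers, it only converges once writes to the post stop.

```javascript
// Hypothetical clean-up pass: /posts/$postid is the source of truth. Returns a multi-path
// update that rewrites stale /tags/$tag/$postid copies and deletes copies for removed tags.
function buildRepairUpdate(db, postId) {
  const post = db.posts[postId];
  const update = {};
  const wanted = new Set(Object.keys(post.tags || {}));
  // Rewrite missing or stale copies for every tag the post currently has.
  for (const tag of wanted) {
    const copy = db.tags?.[tag]?.[postId];
    if (!copy || copy.Title !== post.Title || copy.body !== post.body) {
      update[`/tags/${tag}/${postId}`] = { Title: post.Title, body: post.body };
    }
  }
  // Delete copies left under tags the post no longer carries.
  for (const tag of Object.keys(db.tags || {})) {
    if (!wanted.has(tag) && db.tags[tag][postId] !== undefined) {
      update[`/tags/${tag}/${postId}`] = null; // writing null deletes the node
    }
  }
  return update;
}

const db = {
  posts: { postid1: { Title: "Title 1 modified", body: "About Firebase", tags: { tag1: true, tag2: true, tag3: true } } },
  tags: {
    tag1: { postid1: { Title: "Title 1 modified", body: "About Firebase" } },
    tag2: { postid1: { Title: "Title 1 modified", body: "About Firebase" } },
    tag3: { postid1: { Title: "Title 1", body: "About Firebase" } },
  },
};
console.log(JSON.stringify(buildRepairUpdate(db, "postid1")));
// prints {"/tags/tag3/postid1":{"Title":"Title 1 modified","body":"About Firebase"}}
```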