Firebase: Making sure an action is performed only once using multiple workers

Question

I am matching items from two lists, e.g. here element c of A and element c of B would match. I then do some processing and add the matched pair to another list.

 - List A
  - a
  - b
  - c

 - List B
  - c
  - d

To do this I watch for additions on both list A and list B, and check whether a match exists when something is added.

This works well, but I have too many inserts for a single client.

So I need to run my matcher on multiple machines to speed things up.

But I want each match to happen on only one machine, i.e. if machine 1 finds a match there's no point in machine 2 processing it as well.

I tried using atomic commits, but while this prevents multiple matches from interfering with each other, the matching is still done twice.

How could I "lock" elements to make sure other machines don't consider them once the matching process has started?

score 1 · Accepted Answer · answered Jan 03 '17 at 17:26

Firebase does not provide native support for something like this, and additionally I would be concerned by the lack of idempotency in the streaming protocol itself. If you subscribe to updates on a topic, but the node itself dies, on your next server start you will get a VALUE update, not a collection of all the INCREMENTAL updates that occurred while your node was down.
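To make the restart concern concrete, here is a minimal sketch of recovering missed additions from a full VALUE snapshot. It assumes the worker keeps its own record of the keys it has already processed (the `seen` set below); the function name, keys and values are all hypothetical and nothing here calls Firebase itself.

```python
from typing import Dict, Set


def missed_additions(snapshot: Dict[str, dict], seen_keys: Set[str]) -> Dict[str, dict]:
    """Return the children of a full snapshot that this worker never processed."""
    return {key: value for key, value in snapshot.items() if key not in seen_keys}


# Keys this worker handled before it went down (hypothetical values).
seen = {"k1", "k2"}

# Full snapshot delivered after the restart (hypothetical values).
snapshot = {
    "k1": {"letter": "a"},
    "k2": {"letter": "b"},
    "k3": {"letter": "c"},  # added while the worker was down
}

recovered = missed_additions(snapshot, seen)
print(recovered)  # {'k3': {'letter': 'c'}}
```

In practice the processed-key record would have to survive the crash too, so it would need to live somewhere durable rather than in process memory.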

With good data structures you could "roll your own" facilities like this. After all, cluster-aware task processors with task idempotency and worker locking, like Resque and Celery, do exactly that with not much more in the way of base resources (Redis, a DB, etc.). You will need to add data sets to manage worker locking, job ID locking by workers, recovery/error handling facilities, etc. However, if you review the code they use to do this, you will quickly see that it takes more work than a simple StackOverflow post can cover.
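As a rough illustration of "job ID locking by workers" plus recovery, the sketch below uses a lease that expires, so a job claimed by a crashed worker can be picked up again. This is not how Resque or Celery implement it; `LeaseTable` is a hypothetical in-memory stand-in for a shared store with an atomic check-and-set, and the explicit `now` argument only exists to keep the example deterministic.

```python
import threading
import time
from typing import Dict, Optional, Tuple


class LeaseTable:
    """In-memory stand-in for a shared store offering atomic check-and-set."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        # job_id -> (worker_id, expires_at)
        self._leases: Dict[str, Tuple[str, float]] = {}

    def try_claim(self, job_id: str, worker_id: str, ttl: float,
                  now: Optional[float] = None) -> bool:
        """Claim the job unless another live lease exists."""
        now = time.monotonic() if now is None else now
        with self._guard:
            holder = self._leases.get(job_id)
            if holder is not None and holder[1] > now:
                return False  # a lease is still live, back off
            self._leases[job_id] = (worker_id, now + ttl)
            return True

    def release(self, job_id: str, worker_id: str) -> None:
        """Drop the lease, but only if this worker still holds it."""
        with self._guard:
            holder = self._leases.get(job_id)
            if holder is not None and holder[0] == worker_id:
                del self._leases[job_id]


table = LeaseTable()
first = table.try_claim("match-c", "worker-1", ttl=30.0, now=0.0)
second = table.try_claim("match-c", "worker-2", ttl=30.0, now=10.0)
after_expiry = table.try_claim("match-c", "worker-2", ttl=30.0, now=31.0)
print(first, second, after_expiry)  # True False True
```

Across real machines the clock would have to come from the shared store (a server-side timestamp, for instance) rather than each worker's local time, and the processing itself still needs to be idempotent in case a lease expires mid-job.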

As an alternative, why not consider using a stack such as ActionHeroJS as a cluster-aware API layer? It has Redis-backed cluster mechanics and Resque-based task management with all of your requirements covered and it pairs really well with Firebase...

score 0 · Answer 2 · answered Jan 03 '17 at 19:10

I'll take a shot at this.

Our app has users and some of the user data can be edited. However, we don't want the user data to be edited on multiple clients at the same time, so we implemented a simple locking mechanism that notifies the clients that the user is locked while it's being edited.

Applying this to your use case

 - List A
  -Yiuiaisida9  //node names created with childByAutoId
     letter: "a"
     isLocked: false
  -YJI99s9ajsl
     letter: "b"
     isLocked: false
  -YE9jsiakskk
     letter: "c"
     isLocked: true

 - List B
  -YJ0a0s0kdka
     letter: "c"
     isLocked: true
  -YM0s09s0ksk
     letter: "d"
     isLocked: false

So your clients all observe List A and List B. When a child node is added to List A, that node is initially set to isLocked: true. Search List B and, if a match is found, lock it as well and begin processing.

The other clients will be notified that 'c' is now locked and will simply ignore it, as their code does not process locked nodes.
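One caveat with the isLocked flag: two clients could both read false before either writes true, so the check and the write have to happen as a single atomic step. Here is a minimal sketch of that idea, where a local lock imitates what the database would need to guarantee (for example through a transaction). The `Store` class and `claim_pair` are hypothetical names, and the threads stand in for separate machines.

```python
import threading
from typing import Dict, List


class Store:
    """In-memory stand-in for the database; the lock mimics a transaction."""

    def __init__(self, nodes: Dict[str, dict]) -> None:
        self._guard = threading.Lock()
        self.nodes = nodes

    def claim_pair(self, key_a: str, key_b: str) -> bool:
        """Lock both nodes together, or neither if one is already locked."""
        with self._guard:
            a, b = self.nodes[key_a], self.nodes[key_b]
            if a["isLocked"] or b["isLocked"]:
                return False
            a["isLocked"] = True
            b["isLocked"] = True
            return True


store = Store({
    "A/-YE9jsiakskk": {"letter": "c", "isLocked": False},
    "B/-YJ0a0s0kdka": {"letter": "c", "isLocked": False},
})

winners: List[str] = []


def worker(name: str) -> None:
    # Only the worker whose claim succeeds goes on to process the match.
    if store.claim_pair("A/-YE9jsiakskk", "B/-YJ0a0s0kdka"):
        winners.append(name)


threads = [threading.Thread(target=worker, args=(f"machine-{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(winners))  # 1
```

Which machine wins is not deterministic, only that exactly one does. The losers see the failed claim and skip the pair, which is the "ignore locked nodes" behaviour described above.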

Just a thought...
