
I have a large collection of one million documents in Firebase that I treat as a stack: the first element gets read and removed from the stack. My main problem is that I have over a thousand connections trying to access the collection, and connections keep receiving the same document. To prevent duplicate results, I've resorted to using a mutex, as described in the post below:

Cloud Firestore document locking

I am using a mutex to lock each document before removing it from the collection. I use transactions to ensure that the mutex owner is not overwritten by another connection and that the document has not already been removed.
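For reference, this is roughly what each lock attempt looks like (a minimal sketch using the firebase-admin SDK; the `queue` collection and the `lockedBy`/`lockedAt` field names are placeholders for my actual schema):

```typescript
import { initializeApp } from "firebase-admin/app";
import { getFirestore, Timestamp } from "firebase-admin/firestore";

initializeApp();
const db = getFirestore();

// Attempt to take the mutex on one queue document inside a transaction.
// Returns true if this worker now owns the lock, false if the document
// was already locked or has already been removed.
async function tryLockDocument(docId: string, workerId: string): Promise<boolean> {
  const ref = db.collection("queue").doc(docId);
  return db.runTransaction(async (tx) => {
    const snap = await tx.get(ref);
    if (!snap.exists || snap.get("lockedBy")) {
      return false; // removed, or another connection beat us to it
    }
    tx.update(ref, { lockedBy: workerId, lockedAt: Timestamp.now() });
    return true;
  });
}
```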

The problem with this solution is that as we scale up, more connections fight over acquiring a mutex lock. Each connection spends a long time retrying until it successfully locks a document. Avoiding long retries would give faster response times and fewer reads.

So in summary: a connection tries to retrieve a document. It retrieves the document but fails to create a lock, because another incoming connection just locked it. So it looks for another document and fails again. It keeps retrying until it finally beats another connection to locking a document.

Is it possible to increase throughput and keep read costs low as I scale up?

C O

1 Answer


Yeah, I doubt such a mutex is going to help your throughput.

How important is it that documents are processed in the exact order that they are in the queue? If it is not crucial, you could consider having each client request the first N documents, and then picking one at random to lock-and-process. That would improve your throughput up to N times.
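For illustration, a minimal sketch of that approach, reusing the transaction-based tryLockDocument() helper sketched in the question (the `queue` collection name and N = 10 are assumptions):

```typescript
// Fetch the first N documents, pick one at random, and try to lock it.
// Returns the claimed document id, or null if this worker lost the race.
async function claimRandomDocument(workerId: string, n = 10): Promise<string | null> {
  const snap = await db.collection("queue").limit(n).get(); // N reads
  if (snap.empty) return null; // queue is drained

  // Random choice spreads concurrent workers across the N candidates,
  // so most lock attempts succeed on the first try.
  const doc = snap.docs[Math.floor(Math.random() * snap.docs.length)];
  return (await tryLockDocument(doc.id, workerId)) ? doc.id : null;
}
```

A worker that loses the race simply calls the function again; since contention is spread across N documents instead of one, retries should be rare. If locked documents linger before removal, adding a filter such as `where("lockedBy", "==", null)` (assuming the field is initialized to null when a document is enqueued) would keep already-locked candidates out of the fetch.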

Frank van Puffelen
  • I originally sorted the documents by creation time, but I am willing to sacrifice that for better throughput. That was actually one of the first solutions I pondered, but I'm worried the read costs may get expensive over time as I continue to scale up. – C O Nov 26 '19 at 04:40
  • I tested this method and it's working wonderfully, so I edited my question a little for the sake of cost savings. Right now I have each request pulling 100 documents, and I have on average 10,000 tasks running to pull a document. 100,000 document reads cost $0.06, so all 10,000 tasks together (1,000,000 reads) should cost only about $0.60, which isn't so bad actually. – C O Nov 26 '19 at 07:24