
I read in a Stack Overflow post (link here) that:

By using predictable (e.g. sequential) IDs for documents, you increase the chance you'll hit hotspots in the backend infrastructure. This decreases the scalability of the write operations.

Could anyone explain in more detail the limitations that can occur when using sequential or user-provided IDs?

Frank van Puffelen
pariola
  • Did you read the answer there and also click through to the discussion on Google's google-cloud-firestore-discuss mailing list? I don't think it's going to get any more detailed than that. – Doug Stevenson Dec 22 '18 at 19:15
  • Oh, I unintentionally skipped that! Thanks. – pariola Dec 22 '18 at 19:45

1 Answer


Cloud Firestore scales horizontally by allocating key ranges to machines. As load on a single machine increases beyond a certain threshold, the system splits the range served by that machine in two and assigns each half to its own machine.
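
To make the range idea concrete, here is a toy Python model. It is our own illustration, not Firestore's actual implementation: each machine serves one contiguous slice of the sorted key space, and a split simply inserts a new boundary key. The class and method names are hypothetical.

```python
import bisect


class RangeRouter:
    """Toy model of range-based sharding (not Firestore's real internals).

    Machines are numbered by position in key order. split_points holds the
    sorted boundary keys: machine 0 serves keys < split_points[0], machine i
    serves split_points[i-1] <= key < split_points[i], and the last machine
    serves everything from the final boundary upward.
    """

    def __init__(self):
        self.split_points = []  # no boundaries yet: one machine owns every key

    def machine_for(self, key):
        # The number of boundaries <= key is the index of the owning machine.
        return bisect.bisect_right(self.split_points, key)

    def split(self, at_key):
        # Split whichever range contains at_key into two at that key.
        if at_key not in self.split_points:
            bisect.insort(self.split_points, at_key)

    def machine_count(self):
        return len(self.split_points) + 1


router = RangeRouter()          # a single machine handles the entire range
router.split("m")               # load grew, so the range is split in two
print(router.machine_count())   # prints 2
print(router.machine_for("apple"), router.machine_for("zebra"))  # prints 0 1
```

Whether a split helps depends entirely on where new keys fall relative to the boundary, which is the point of the next two cases.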

Let's say you have just started writing to Cloud Firestore, which means a single server is currently handling the entire range.

When you write new documents with random IDs and we split the range in two, each machine ends up with roughly the same load, because a new random ID is about equally likely to fall on either side of the split point. As load increases, we continue to split across more machines, each one getting roughly the same load. This scales well.
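
A quick simulation of the random-ID case, under our own assumptions: IDs are 20 characters drawn uniformly from [A-Za-z0-9] (similar in shape to Firestore's auto-generated IDs), and the range is split at the median existing key. The helper names are hypothetical.

```python
import random
import string

# Assumption: 20-character IDs from [A-Za-z0-9]; any uniformly random key behaves alike.
ALPHABET = string.ascii_letters + string.digits


def random_id(rng, length=20):
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def split_shares(existing, new_keys):
    """Split the key range at the median existing key and return the
    fraction of new writes landing in the lower and upper halves."""
    split_key = sorted(existing)[len(existing) // 2]
    upper = sum(1 for k in new_keys if k >= split_key)
    return (len(new_keys) - upper) / len(new_keys), upper / len(new_keys)


rng = random.Random(42)                                 # fixed seed for repeatability
existing = [random_id(rng) for _ in range(1_000)]       # documents written so far
new_writes = [random_id(rng) for _ in range(10_000)]    # load after the split
lower, upper = split_shares(existing, new_writes)
print(f"lower half: {lower:.0%}, upper half: {upper:.0%}")
```

Both halves receive close to half of the new writes, so each machine after the split carries roughly half the load.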

When you write new documents with sequential IDs and exceed the write rate a single machine can handle, the system will try to split the range in two. Unfortunately, every new ID sorts after all the existing ones, so one half gets no new writes and the other half gets the full load! This doesn't scale well, as you can never get more than a single machine to handle your write load.
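
The same toy split applied to sequential IDs. The zero-padded `doc-000000042` format is a hypothetical scheme chosen so that string order matches numeric order; any monotonically increasing key behaves the same way.

```python
def sequential_id(n):
    # Hypothetical scheme: zero-padded counter, so later IDs always sort later.
    return f"doc-{n:09d}"


def split_shares(existing, new_keys):
    """Split the key range at the median existing key and return the
    fraction of new writes landing in the lower and upper halves."""
    split_key = sorted(existing)[len(existing) // 2]
    upper = sum(1 for k in new_keys if k >= split_key)
    return (len(new_keys) - upper) / len(new_keys), upper / len(new_keys)


existing = [sequential_id(n) for n in range(1_000)]            # IDs written so far
new_writes = [sequential_id(n) for n in range(1_000, 11_000)]  # the next 10,000
lower, upper = split_shares(existing, new_writes)
print(f"lower half: {lower:.0%}, upper half: {upper:.0%}")
# prints: lower half: 0%, upper half: 100%
```

Splitting again at a later key does not help either: the next batch of IDs sorts past that boundary too, so the newest range always takes every write.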

When a single machine is running more load than it can optimally handle, we call this "hot spotting". Sequential IDs mean we cannot scale to handle more load. Incidentally, the same concept applies to index entries, which is why we also warn against sequential index values, such as timestamps set to the current time.
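
To see why timestamps hot-spot an index even when document IDs are random, here is a toy sketch that models an index entry as a `(created_at, doc_id)` tuple ordered by the indexed value first. That ordering is an assumption for illustration, not Firestore's actual on-disk key format.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits
rng = random.Random(7)


def random_doc_id():
    # Random 20-character document ID: the documents themselves are well spread.
    return "".join(rng.choice(ALPHABET) for _ in range(20))


def index_entry(created_at, doc_id):
    # Toy index key: sorted by the indexed value, then by document ID.
    return (created_at, doc_id)


# Entries written at t = 0..999 seconds, then new ones at t = 1000..1999.
existing = [index_entry(t, random_doc_id()) for t in range(1_000)]
new_entries = [index_entry(t, random_doc_id()) for t in range(1_000, 2_000)]

split_key = sorted(existing)[len(existing) // 2]
upper = sum(1 for e in new_entries if e >= split_key)
print(f"{upper} of {len(new_entries)} new index entries land in the upper half")
# prints: 1000 of 1000 new index entries land in the upper half
```

The random document IDs never get a chance to spread the load, because the timestamp dominates the sort order.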

So, how much is too much load? We generally say a single machine can handle about 500 writes/second, although this naturally varies with a lot of factors, such as the size of the documents you are writing, the number of transactions, etc.
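
Back-of-the-envelope arithmetic using that 500 writes/second rule of thumb. The function names are hypothetical, and the model assumes random IDs split perfectly evenly while sequential IDs are pinned to one machine; real capacity varies with the factors listed above.

```python
import math

# Rule-of-thumb figure from the answer; not a guaranteed limit.
WRITES_PER_MACHINE = 500


def machines_needed(writes_per_second):
    """Approximate number of ranges needed if writes spread evenly (random IDs)."""
    return max(1, math.ceil(writes_per_second / WRITES_PER_MACHINE))


def sustainable_rate(writes_per_second, sequential):
    """Write rate the toy model can absorb. With sequential keys every new write
    hits the same range, so throughput is capped at one machine's capacity."""
    if sequential:
        return min(writes_per_second, WRITES_PER_MACHINE)
    return writes_per_second


for rate in (200, 2_000):
    print(f"{rate} writes/s: ~{machines_needed(rate)} machine(s); "
          f"random IDs sustain {sustainable_rate(rate, sequential=False)}, "
          f"sequential IDs sustain {sustainable_rate(rate, sequential=True)}")
```

At 200 writes/s both schemes fit on one machine. At 2,000 writes/s the model calls for about four ranges with random IDs, while sequential IDs stay capped at 500.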

With this in mind, you can see that smaller, more consistent workloads aren't a problem, but if you want something that scales with traffic, sequential document IDs or index values will naturally limit you to what a single machine in the database can keep up with.

Dan McGrath
  • This should really be in the docs. I just spent quite some time refactoring my implementation to use sequential IDs (`App instance ID + "_" + local SQLite rowId`) instead of generated ones, only to stumble upon this question now. – Actine May 22 '19 at 21:09
  • Sorry to hear that @Actine. We mention it in best practices: https://cloud.google.com/firestore/docs/best-practices - had you stumbled upon that in the docs and it wasn't clear or did you not find that section at all? – Dan McGrath May 22 '19 at 21:24
  • I never looked at the cloud.google.com docs, only the Firebase docs. – Actine May 23 '19 at 22:17
  • @Actine - also there: https://firebase.google.com/docs/firestore/best-practices. Dropping a note to our tech writing team as it is a bit hidden in a collapsible menu. – Dan McGrath May 23 '19 at 23:20
  • Oh yeah, sorry, somehow missed that. Anyway, it wasn't a problem to revert. I don't think I'll ever run into those scaling problems, though. But this also makes me wonder what the performance implications could be if I structure my data into per-user subcollections as /users//tasks/, where each user can basically only access their own tasks. Will Firestore split data between users first, or split each user's subcollection when it comes to that? – Actine May 25 '19 at 12:21