Defining acceptable lexicographic similarity of Firestore document IDs

Question

I've seen in the Firebase Firestore documentation's 'Best Practices' that you should:

Avoid high read or write rates to lexicographically close documents, or your application will experience contention errors.

An example given of how not to write document IDs is:

Customer1, Customer2, Customer3, ...

I'm mapping data from an external service into a Firestore collection, and I want to keep their original ID names. They are prefixed with entry_, but then suffixed with a random / unique string as follows:

entry_{Unique_String}, entry_{Unique_String}, ... entry_{Unique_String}

Does prefixing each document ID with entry_, followed by a random string, make the documents lexicographically close and therefore prone to hotspotting?

Or, would it only be classed as such if they were indeed named:

entry_1, entry_2, entry_3, entry_4 ... <and so on>

I could of course strip / add entry_ to the IDs when reading / writing, but this would add more complexity to the server / client.*

*Edit to clarify as per Alex Mamo's comment:

Complexity would increase due to the following examples:

Introduction of strip / prepend "entry_" function wherever docs are being read / written in context of original dataset or need to be sent back to external service.
May require creation of document fields to track (e.g. type = "entry") where multiple categories of document ID are used in the same collection -- This may not be a disadvantage depending on use-case, e.g. if performing type comparisons.
Tedious to reimplement the above for other category types (e.g. foo_, bar_) that originate from the same external service, with the same prefixed unique strings.
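
For illustration, the strip / prepend step described above could be as small as the following sketch. The names PREFIX, strip_prefix and add_prefix are hypothetical and not part of any Firestore SDK; the sketch only assumes that every external ID starts with a known constant prefix.

```python
PREFIX = "entry_"  # assumed constant prefix used by the external service


def strip_prefix(external_id: str, prefix: str = PREFIX) -> str:
    """Return the external ID without its prefix, e.g. for use as a document ID."""
    if not external_id.startswith(prefix):
        raise ValueError(f"expected an ID starting with {prefix!r}: {external_id!r}")
    return external_id[len(prefix):]


def add_prefix(doc_id: str, prefix: str = PREFIX) -> str:
    """Rebuild the external service's ID from a stored document ID."""
    return prefix + doc_id
```

Passing a different prefix (e.g. "foo_") reuses the same two helpers for the other categories, although every read / write path still has to remember to call them.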

Why do you say "would add more complexity to the server / client"? — Alex Mamo, Aug 15 '19 at 11:38
Good catch! That was a flimsy part of my question. Have updated at end. — Sarreph, Aug 15 '19 at 12:26

Alex Mamo · Accepted Answer · 2019-08-19T13:07:20.670


Firestore scales by spreading documents out over its storage layer. Put simply, sequential IDs produce more hashing collisions, which means you can hit write limits sooner. IDs that are more random ensure that writes are spread evenly across the storage layer. I advise you not to use 1, 2, 3, 4, or combinations of them, as keys for your documents. Using sequential IDs is an anti-pattern in Firestore, since it will cause scalability problems. So I strongly recommend using random document IDs.

For more information, I recommend reading Dan McGrath's answer in the following post:

Limitations of using sequential IDs in Cloud Firestore

Edit:

Random IDs prefixed with a constant, like the ones you showed in one of your comments, can behave as if they were sequential.

Why do I say that?

The built-in unique ID generator that Firestore uses when you call CollectionReference's add() method, or CollectionReference's document() method without passing any parameters, generates random and highly unpredictable IDs, which prevents hitting certain hotspots in the backend infrastructure. Simply using a prefix with a random 6-digit number may increase the chance of hitting them, so ID collisions become more likely at larger scale. Besides that, I recommend checking Frank van Puffelen's answer in this post to see how those unique document IDs are generated. IMHO, you don't have to be concerned in any way about the random document IDs generated by that algorithm.
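
As a rough idea of what such a generator looks like, here is a minimal sketch. It is not the SDK's code; it assumes, as the client SDKs appear to do, that an auto-generated ID is 20 characters drawn at random from the 62 ASCII letters and digits. ALPHABET and random_doc_id are hypothetical names.

```python
import secrets
import string

# Assumed alphabet: A-Z, a-z, 0-9 (62 characters).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def random_doc_id(length: int = 20) -> str:
    """Return a random alphanumeric ID; length 20 is an assumption, not a spec."""
    # secrets.choice draws from the OS's cryptographically strong random source.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

With 62 possible characters in each of 20 positions there are 62^20 possible IDs, far more than the 16^6 values a 6-hex-digit suffix such as f48024 can take.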

edited Aug 19 '19 at 13:07

answered Aug 15 '19 at 14:07

Alex Mamo



Thank you for linking me to that other question, Alex, and for your answer. The Google Groups discussion in there was very interesting as well. However, perhaps I wasn't clear enough: in my original question I propose using non-sequential keys that are prefixed with the string `entry_`. I am already aware that sequential numeric keys are bad, but I wanted to know if it's still bad to use random keys that are prefixed with a constant string... Please let me know if you need me to clarify further! – Sarreph Aug 16 '19 at 16:54
Yes, that's the exact same thing. So I still recommend you to use those random ids. – Alex Mamo Aug 17 '19 at 10:51
Sorry, you are saying that `entry_f48024` and `entry_0195ff` are the exact same thing as `entry_1` and `entry_2` as far as "sequential" ids are concerned? That was/is my question — appreciate your time on this! – Sarreph Aug 19 '19 at 09:49
Yes, that's right, this is what I'm saying. You can use that naming but only for small data sets. If you need an app that should scale massively, then use those random document ids provided by Firestore. – Alex Mamo Aug 19 '19 at 09:57
Okay great, thank you! Would you mind amending your answer to acknowledge that random ids prefixed with a constant are classed as sequential, please? (Since that was really my original question). Then I can accept it :) – Sarreph Aug 19 '19 at 12:43
Let me try to explain once again, please see my updated answer. Is it ok now? – Alex Mamo Aug 19 '19 at 13:09
Great thanks for clearing that up! Thank you for your help, too – Sarreph Aug 19 '19 at 15:20
