1

Meteor uses it's internal Random package to generate Mongo-Ids for the documents, where the used set of characters is defined as:

var UNMISTAKABLE_CHARS = "23456789ABCDEFGHJKLMNPQRSTWXYZabcdefghijkmnopqrstuvwxyz";

The method description for Random.id also states:

Return a unique identifier, such as "Jjwjg6gouWLXhMGKW", that is likely to be unique in the whole world.

which is defined for the default length of an Id (17 chars; each one of UNMISTAKABLE_CHARS).

Now I would like to use only the first N characters of the Id to shorten my URLs (which include the Ids to dynamically load pages that require a specific document, which is determined by the Id).

So if my original Id is

`v5sw59HEdX9KM5KQE`

I would like to use for example (consider a totally random-picked N=5 here):

{
  _id:"v5sw59HEdX9KM5KQE",
  short: "v5sw5"
}

as document schema and fetch the respective document by this Id using { short } as query in my Mongo.Collection.

Now my question is how many characters are satisfactory to prevent collision if an amount of documents (thus Ids) between 5000 to 10000 are to be considered.

Note: I have some tools on entropy calculation and all these values (character set, length of the original Ids, number of documents) in front of me but I don't know how to wire this all up to safely calculate N.

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
Jankapunkt
  • 8,128
  • 4
  • 30
  • 59

3 Answers3

3

If I understand correctly, besides the normal 17 chars long id generated for your documents _id, you would like a shorter id so that typically url's look less scary when they contain that id.

In your example you truncate the id, hence creating an explicit association between your shorter id and the original document id.

This sounds like git shorten commit hash: How does Git(Hub) handle possible collisions from short SHAs?

You could follow a similar path, i.e. first determine an initial default length that is reasonable to avoid probable collision (as explained in Peter O.'s answer), but explicitly check for uniqueness server-side and increase the length of any new shorten version in case of collision, until it becomes unique again.

ghybs
  • 47,565
  • 6
  • 74
  • 99
2

Generating identifiers at random already runs the risk, at least in theory, of generating a duplicate identifier. For the default length of MongoIDs (assuming there are 5517 of them), the chance of having a duplicate MongoID reaches 50% after generating almost 731156 billion random MongoIDs (see "Birthday problem"), so the chance of a duplicate is negligible in practice for most applications.

Shortening a random identifier will make the collision problem even worse. In this case, for an ID length of 5 characters (resulting in 555 or 503284375 different IDs), the chance of having a duplicate MongoID reaches 50% after generating only about 26415 random IDs.

Since it appears that you can't control how MongoIDs are generated as easily as you can control how shortened "unique IDs" are generated, one thing you can do is the following:

  • Create a document that assigns each MongoID to a uniquely assigned number (such as a monotonically increasing counter).
  • To make the numbers assigned this way "look random", feed each number to a so-called "full-period" linear congruential generator to get a unique but "randomized" number within the generator's period.
  • The numbers (encoded similarly to MongoID strings) can then serve as short identifiers for your purposes.

But consider whether you really want the short identifiers created this way to be predictable. Using short identifiers hardly achieves this predictability goal.

If you wish to go the route of using shortened MongoIDs, see "Birthday problem" for formulas you can use to estimate how many random numbers it takes for the risk of collision to remain tolerable.


For further information on how Meteor generates MongoIDs, see also this question; one of its answers includes a way you can have MongoDB generate MongoIDs on the server rather than have Meteor do so on the client. It appears, too, that Meteor doesn't check the MongoIDs it generates for uniqueness before inserting them into a document.

Peter O.
  • 32,158
  • 14
  • 82
  • 96
2

I would argue that if you want to avoid collisions on a small collection then you don't want to use random ids, but either go with fully deterministic IDs or at least reduce the randomness to something more controlled. Along those lines, another option for you to consider would be to use MONGO for idGeneration in your collection. Those IDs are generated following a known recipe. Accordingly you could take characters 1-4 and 12 of that ID and would get a guarantee for no hash collisions as long as no more than N documents are created in the same second, where N is the number of characters used in MongoIDs (which I don't know off hand).

Christian Fritz
  • 20,641
  • 3
  • 42
  • 71