
Which is more likely to collide: a MongoDB ObjectId or an MD5 hash? I am building a website and am looking for a way to index my products.

Thanks in advance.

user2002692
  • Assuming they're both "reasonable", then the answer is: whichever is the smaller hash type. – Oliver Charlesworth May 24 '14 at 18:31
  • possible duplicate of [Possibility of duplicate Mongo ObjectId's being generated in two different collections?](http://stackoverflow.com/questions/4677237/possibility-of-duplicate-mongo-objectids-being-generated-in-two-different-colle) – Neil Lunn May 25 '14 at 05:13

2 Answers


MongoDB's ObjectId is unlikely to collide. It combines a timestamp, an incrementing counter, and machine- or process-specific (or random) bytes, so two generated ids rarely match. MD5 hashes depend only on the value of the input: if you pass two inputs that have the same value, the hashes will be the same.

I would need to know more about how you hash your products. If you are sure no two products will ever have the same values, then you can use either. But I would use ObjectId, because then you don't need to worry about product values and hashing at all. An ObjectId (12 bytes) is also smaller than an MD5 hash (16 bytes), which is better for indexing.
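To illustrate the point about MD5 depending on the input, here is a minimal Python sketch. The helper `product_key` and the sample product fields are hypothetical, invented only for this example; how you actually combine product fields into a hash input is up to you.

```python
import hashlib


def product_key(name: str, sku: str) -> str:
    """Hypothetical helper: derive an MD5 hex key from two product fields."""
    payload = f"{name}|{sku}".encode("utf-8")
    return hashlib.md5(payload).hexdigest()


a = product_key("Blue Mug", "MUG-001")
b = product_key("Blue Mug", "MUG-001")  # same field values as a
c = product_key("Blue Mug", "MUG-002")  # differs only in the SKU

# Identical inputs always give the identical key, so two products with
# the same values would end up with the same "id".
same_input_same_key = (a == b)

# Size comparison: an MD5 digest is 16 bytes, an ObjectId is 12 bytes.
MD5_BYTES = hashlib.md5().digest_size
OBJECTID_BYTES = 12

print(same_input_same_key, a == c, MD5_BYTES, OBJECTID_BYTES)
```

A generated id avoids the first issue entirely, because it does not depend on the product's values.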

Gergo Erdosi
  • But I am not using MongoDB; I am using MySQL. Is it worth finding a way to generate a Mongo-like ObjectId? – user2002692 May 24 '14 at 19:03
  • You could use `UUID()` in case of MySQL: http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_uuid – Gergo Erdosi May 24 '14 at 19:07
  • The uuid module in Python seems to produce a string (without the "-" characters) of the same length as an MD5 hexdigest (both 32 characters). Is this something I should worry about? – user2002692 May 25 '14 at 02:11
  • Not really. UUIDs are widely used in MySQL to identify resources. A UUID requires more space than MongoDB's ObjectId, but it's not so long that it would cause a performance issue. – Gergo Erdosi May 25 '14 at 02:15
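The equal length mentioned in the comments follows from both values being 128 bits wide, which is 32 hex characters. A small Python sketch of this, assuming `uuid.uuid4()` (random UUIDs) is the function in question, since the comment does not say which one; storing the raw bytes in a `BINARY(16)` column is likewise only an assumption about the schema:

```python
import hashlib
import uuid

# A random (version 4) UUID; it does not depend on any product values.
product_id = uuid.uuid4()

hex_id = product_id.hex    # 32 hex characters, no dashes
raw_id = product_id.bytes  # 16 raw bytes, e.g. for a BINARY(16) column

# An MD5 hexdigest is also 128 bits, hence the same 32-character length.
md5_hex = hashlib.md5(b"example product").hexdigest()

# The raw bytes round-trip back to the same UUID.
restored = uuid.UUID(bytes=raw_id)

print(len(hex_id), len(raw_id), len(md5_hex), restored == product_id)
```

The matching length is a coincidence of width, not of meaning: the UUID is generated, while the MD5 value is computed from its input.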

If your product model relates to real-world products, there are existing ways to index them: use article numbers or EAN (IAN) codes. They can complement MySQL's natural auto-increment ids. UUIDs have pros (mainly for distributed databases) and cons; read https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439