2

I'm looking for the name of a certain kind of algorithm, which I think should have a name.

The algorithm is the one that figures out the shortest possible string in order to make it identifiably unique from the rest.

Like, in JS, a function given an array like this one:

[ '1c625b142483629db0a8063cfe5cd418e897154c',
  '28d9bf0ecac10311507b339e5d1324412d25cc3e',
  '4f3a202a34016cbdf1fc05c3efaaa06f72d3faa3',
  '2080d7f2b572196343695a7c60a6f3c6b747246c',
  '1903250de6c2a59e6c53dfa907188f2a7204ce76',
  'f8227a5a0e8eeea2fd7b47588d95d05755d0eb5b',
  '86aed9bd91eee88bb17382fe278a5fdc6f16d583' ]

Would return something like:

[ '1c',
  '28',
  '4',
  '20',
  '19',
  'f',
  '8' ]

Notice how all of the strings it returned are the first characters in the long hashes, just shortened in order to return only what is necessary to make them different from each other.

I'm going to be using this for matching a hash to a value. I'm making a todo app and I decided to use hashing in order to edit and/or remove values. So the user would refer to the todo item by it's hash, but I don't want to give the user a super long hash, only what the system needs to know which one the user is referring to.

So what would this kind of algorithm be called, if it does have a name?

Any help or clues are appreciated. :)

EDIT:

Seems like there is a discussion of how I'm going to use this. I just want to clear up that I am not going to use this to store stuff. The full hash is going to be used as the key for the todo task, the shortened hash (a.k.a. shortest unique prefix) is just for the UI. My question, which is what the name is, has been answered by @source.rar and @Paul, I'm now looking into the implementations. Will accept answer shortly...

EDIT 2:

All right. Being a JS newbie I spent a TON of time trying to figure it out by myself and couldn't, in the end my friend came along and gave me the following solution: https://gist.github.com/BruceCaldwell/70e53a456fd858bb03cc

He did, however, say it was not perfect and could probably do with some refactoring, but that's up to me to figure out. ;)

greduan
  • 4,770
  • 6
  • 45
  • 73
  • Obfuscation? Encryption? – Karl-André Gagnon May 30 '14 at 17:27
  • To me this is just some kind of hashing. – Sirko May 30 '14 at 17:27
  • You might want to fix the order on your expected output. Having them mismatched makes the question confusing. – David Ehrmann May 30 '14 at 17:30
  • This is nearly (but not quite) "shortest unique substring", that turns up in genome processing. Googling shows lots of hits for that - maybe something can be adapted to your needs? – The Archetypal Paul May 30 '14 at 17:30
  • " all of the strings it returned are the first characters in the long hashes," - ah! You mean you want the shortest prefix, not the shortest substring? Here's a link to how to do it - although since you're asking if it has a name, it's not a duplicate http://stackoverflow.com/questions/3612344/how-to-compute-shortest-unique-prefixes-of-a-set-of-strings – The Archetypal Paul May 30 '14 at 17:31
  • @Paul Indeed. Thank you! :) I will be Googleing for those terms while somebody comes up with a definitive answer. :) – greduan May 30 '14 at 17:33
  • 1
    What is the purpose of this? Is it an attempt to optimize performance? – cookie monster May 30 '14 at 17:33
  • @cookiemonster I clarified the question so as to answer your question. :) – greduan May 30 '14 at 17:36
  • Why couldn't the Array index be used as the unique identifier? Or is the Array going to be modified? And if so, how will you know that the shortened version is still unique? Just curious. – cookie monster May 30 '14 at 17:37
  • I'm going for "shortest unique prefix" as the name, based on a bit of googling. – The Archetypal Paul May 30 '14 at 17:37
  • @pasty, while a hash table might work for indexing the strings, it doesn't fit the definition - the OP wants a substring (actually a prefix), not just a unique hash code – The Archetypal Paul May 30 '14 at 17:44
  • @Paul in this case the `key` is the shortest possible non-repeatable string = the key for the dictionary and if the key is already taken (collision) the key is made longer ... so it looks like a naive hash with *non calculated*, but *generated(just taken from the real key)* key (and also the shorter key would need to point to the *real* key). – keenthinker May 30 '14 at 17:49

1 Answers1

4

googling for "shortest unique prefix" gave me this, How to compute shortest unique prefixes of a set of strings?, which led me to https://en.wikipedia.org/wiki/Patricia_trie, which seems to be what you are looking for?

source.rar
  • 8,002
  • 10
  • 50
  • 82
  • I don't understand how that's related. Isn't that just for compact storage of multiple strings? It doesn't tell you the shortest unique string, does it? – cookie monster May 30 '14 at 17:43
  • ...so what's your answer? What's the name of the algorithm? – cookie monster May 30 '14 at 18:06
  • The algorithm doesn't seem to have a name AFAIK (unless you count PATRICIA which is for the retrieval). If you find a name for it though feel free to share it here. – source.rar May 30 '14 at 18:28
  • I have no idea what the name is. I just wanted to be clear about what your answer was since it's getting upvotes. – cookie monster May 30 '14 at 18:35