2

I have an array of string unique ids from a Firebase database, one of which looks like this: QXgZI3JB72Zf1qzeawIdxHSsPa62

I want to print these ids to the user, but first I'd like to first shorten the ids to around 7 characters long minimum, but have it go over that length in case two shortened ids end up becoming equal. That way, if somebody wants to refer to a specific id, they can use the shortened version without accidentally referring to multiple things.

Is there any way to accomplish this?


To get a better idea of what I'm talking about, refer to the git command git rev-parse (docs), which has the --short flag that cuts a commit hash to at least 7 characters long (but will allow it to be longer in case it's no longer unique):

Instead of outputting the full SHA-1 values of object names try to abbreviate them to a shorter unique name.

octopod
  • 824
  • 2
  • 10
  • 23
  • Yes, there is a way to accomplish this (actually, many different ones). What have you tried? Where are you stuck? – Bergi Sep 12 '16 at 15:24
  • I was thinking of slicing each string of the array, checking for equality on all of them, then slicing the equal ones again. Before I do that though, I was just wondering if there was a name for what I'm trying to do and if there's a library that already does this a lot better than what I just described. (of course, unless the answer is so simple that it's preferable to just do it myself.) – octopod Sep 12 '16 at 15:35
  • Do you want to ensure unicity forever (across multiple invocations), or just on a single page? Is there any reason you actually need to list IDs for the user, rather than have them click on a link or something similar? – jcaron Sep 12 '16 at 15:46
  • @octopod I guess your search terms should include "unique prefix". – Bergi Sep 12 '16 at 15:53
  • 1
    @jcaron: preferably, and yes, because it's actually a node.js process (apologies for mis-tagging) and the only way of referring to a _specific_ object in an Firebase array to my knowledge is knowing the entire identifier (unless I read it from the database into an array and refer to the objects by their index, but what if I change the order of it later?). Although, you do have a point once I do use them for a website, I should probably use links instead. – octopod Sep 12 '16 at 15:54
  • @octopod Your approach seems to be fine. Use a `Set` or `Map` for efficient lookup of the equal items. There are even more efficient approaches, but they tend to get complex; you shouldn't look into them until performance actually becomes an issue. – Bergi Sep 12 '16 at 15:56
  • Sure it's "possible": just pretend you're a [Hash Table](https://en.wikipedia.org/wiki/Hash_table). Compute a shortened "hash" output of the input (using whatever means necessary, including truncation) - note that this is *not* a bijective function - and keep track of the shortened output. Keep doing this for all the inputs and, if there is an input for which there is a collision, apply a secondary function repeatedly to make the output unique.. Also note that the *ordering and data of the input* will determine the final results. – user2864740 Sep 12 '16 at 16:36

2 Answers2

0

I think there is no safe method to shorthen unique ids after they are produced (CMIIAW).

The ids unqueness is assured by a central server, which guarantees for their uniqueness: in the case of a conflict, it will recalculate a different id, until it's unique.

If you shorten ids after they are emitted, you can't be 100% guaranteed they will be unique, if you do not want to replay server's job, keeping a database of ids...

As you said git --short prints a shortened version of commit ids, but will allow it to be longer in case it's no longer unique ...

MarcoS
  • 17,323
  • 24
  • 96
  • 174
  • In this case, safety's not an issue. These identifiers are simply referring to indexes of an array because the `.push()` function in Firebase doesn't assign it an array index, it assigns it a unique identifier (with after some work, doesn't really matter because `.forEach()` ensures array order anyway). The information inside isn't sensitive at all; anyone can see it. – octopod Sep 12 '16 at 15:40
  • I used 'safe' to mean '100% sure'... I did not imply anything about security... :-( Sorry, I'm not English native speaker... – MarcoS Sep 12 '16 at 15:45
0

The thing that is missing from the question is whether you are willing to retain a lookup table, or if you are seeking a two-way encoding method where the long string is algorithmically encoded into a shorter string which can then be decoded into the original string.

If the latter is the case, then good luck and please post your answer here if/when you find it!

Otherwise, just make a hash of key/value pairs:

var dbKeyHash = { "xyz":"QXgZI3JB72Zf1qzeawIdxHSsPa62",
   ...
}

The key could be an incrementing number, or any other short but unique string. Going through your list of database keys, you could generate a short string at random, and if it's not already in the hash, use it, otherwise, generate another and continue.

Then, when a user references one of the short keys you display, you just look up the value in the hash:

var dbKey = dbKeyHash['xyz'] // QXgZI3JB72Zf1qzeawIdxHSsPa62

You might find something good on this thread as far as generating short strings: How to code a URL shortener?

Community
  • 1
  • 1
Cliff Hall
  • 51
  • 4