
From a list of integers in C#, I need to generate a list of unique values. I thought of MD5 or similar, but they generate too many bytes.

Integer size is 2 bytes.

I want a one-way correspondence, for example:

0 -> ARY812Q3
1 -> S6321Q66
2 -> 13TZ79K2

So, given the hash, the user cannot determine the integer or infer a sequence behind a list of hashes.

For now, I tried MD5(my number) and kept the first 8 characters. However, I found the first collision at 51389. Which other alternatives I could use?

As I said, I only need one direction. It is not necessary to be able to calculate the integer from the hash. The system uses a dictionary to look them up.

UPDATE:

Replying to some suggestions about using GetHashCode(): GetHashCode returns the same integer. My purpose is to hide the integer from the end user. In this case, the integer is the primary key of a database. I do not want to give this information to users because they could deduce the number of records in the database or the weekly growth in records.

Hashes are not unique, so maybe I need to use encryption such as TripleDes, but I wanted something fast and simple. Also, TripleDes returns too many bytes.

UPDATE 2: I was talking about hashes, and that was a mistake. In reality, I am trying to obfuscate the integer. I tried to do it with a hash algorithm, which is not a good idea because hashes are not unique.

Dabiel Kabuto
  • "Which other alternatives I could use?", use the same number for `HashCode` ??? – Habib Nov 27 '14 at 18:06
  • or int.GetHashCode()? – Uwe Hafner Nov 27 '14 at 18:06
  • What kind of problem do you want to solve? – Hamlet Hakobyan Nov 27 '14 at 18:06
  • @Uwe, for integers the `GetHashCode()` returns the same integer. – Hamlet Hakobyan Nov 27 '14 at 18:08
  • A [hash](https://en.wikipedia.org/wiki/Hash_function) is not unique. Are you looking for an encryption? – HABO Nov 27 '14 at 18:12
  • What is it you are trying to accomplish here? If you use a known hash algorithm (e.g. MD5), your attacker can just generate the hash codes themselves and reverse the mapping. Plus, hash values are inherently not unique. – Peter Duniho Nov 27 '14 at 18:15
  • Hash-type encryption (which is what you want to do, not "get a hash"), doesn't make the hash unique. If you want two-way encryption (from which you can decode the integer), then use AES or something alike... but any form of two-way encryption where only the encrypted data is needed to decode is by definition not secure (much less if you use little amount of bytes). – Jcl Nov 27 '14 at 18:18
  • You could use a 32 bit blockcipher like Skip32 (simple, but an unusual crypto primitive). Or use format preserving encryption like AES in FFX mode (complicated). Or accept that the output is longer and use a 64 bit block cipher. – CodesInChaos Nov 27 '14 at 18:18
  • If by simply having a record number your users could modify the database in unwanted ways, then you have a more fundamental security problem. You could generate a table with a mapping from the integer PK to something else (e.g. GUID), and then use the something else for user scenarios, mapping back to the integer PK internally. – Peter Duniho Nov 27 '14 at 18:19
  • Where is your size limit coming from? It sounds like you expect to apply some algorithm repeatedly rather than computing a moderately cryptic value once and storing it in the row for reuse as needed. Why recompute the value? – HABO Nov 27 '14 at 18:29
  • What character encoding? Another option: just create a map holding 64 Ki worth of 8 random character strings, tested for uniqueness. – Maarten Bodewes Nov 27 '14 at 18:48
  • @Hamlet Hakobyan I didn't look :-) but then he does not have to think of his own implementation if the framework provides an implementation. – Uwe Hafner Nov 27 '14 at 18:51
  • A md5 hash generator in c# maybe? It might be overkill maybe..... – Bsearching Nov 27 '14 at 18:24

4 Answers


Update May 2017

Feel free to use (or modify) the library I developed, installable via Nuget with:

Install-Package Kent.Cryptography.Obfuscation

This converts a non-negative id such as 127 to an 8-character string, e.g. xVrAndNb, and back (with some available options to randomize the sequence each time it's generated).

Example Usage

var obfuscator = new Obfuscator();
string maskedID = obfuscator.Obfuscate(15);

Full documentation at: Github.


Old Answer

I came across this problem a while back and couldn't find what I wanted on StackOverflow. So I made this obfuscation class and shared it on GitHub.

Obfuscation.cs - Github

You can use it by:

Obfuscation obfuscation = new Obfuscation();
string maskedValue = obfuscation.Obfuscate(5);
int? value = obfuscation.DeObfuscate(maskedValue);

Perhaps it can be of help to future visitors :)

kent-id

Encrypt it with Skip32, which produces a 32 bit output. I found this C# implementation but can't vouch for its correctness. Skip32 is a relatively uncommon crypto choice and probably hasn't been analyzed much. Still it should be sufficient for your obfuscation purposes.
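
To illustrate why a small block cipher gives unique outputs where a truncated hash does not, here is a minimal sketch of a keyed 32-bit permutation. It is not Skip32 and has had no security analysis; it is a toy balanced Feistel network whose round function (an assumption of mine) is the first two bytes of HMAC-SHA256. The class name `FeistelObfuscator` and the round count are hypothetical. Any Feistel network is a bijection, so two different ids can never collide.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper, NOT Skip32: a toy 4-round balanced Feistel network
// over 32-bit values. Intended for obfuscation only.
public static class FeistelObfuscator
{
    private const int Rounds = 4;

    // Round function: first two bytes of HMAC-SHA256(key, round || half).
    private static ushort Round(byte[] key, int round, ushort half)
    {
        using (var hmac = new HMACSHA256(key))
        {
            byte[] input = { (byte)round, (byte)(half >> 8), (byte)half };
            byte[] digest = hmac.ComputeHash(input);
            return (ushort)((digest[0] << 8) | digest[1]);
        }
    }

    public static uint Encrypt(uint value, byte[] key)
    {
        ushort left = (ushort)(value >> 16);
        ushort right = (ushort)value;
        for (int i = 0; i < Rounds; i++)
        {
            // (L, R) -> (R, L xor F(R))
            ushort next = (ushort)(left ^ Round(key, i, right));
            left = right;
            right = next;
        }
        return ((uint)left << 16) | right;
    }

    public static uint Decrypt(uint value, byte[] key)
    {
        ushort left = (ushort)(value >> 16);
        ushort right = (ushort)value;
        for (int i = Rounds - 1; i >= 0; i--)
        {
            // Undo one round: (L', R') -> (R' xor F(L'), L')
            ushort prev = (ushort)(right ^ Round(key, i, left));
            right = left;
            left = prev;
        }
        return ((uint)left << 16) | right;
    }
}
```

Something like `FeistelObfuscator.Encrypt((uint)id, key).ToString("X8")` would then give an 8-character hex string, and `Decrypt` recovers the id on the server as long as the key stays secret.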

The strong choice would be format preserving encryption using AES in FFX mode. But that's pretty complicated and probably overkill for your application.

When encoded with Base32 (case insensitive, alphanumeric) a 32 bit value corresponds to 7 characters. When encoded in hex, it corresponds to 8 characters.
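
As a rough sketch of the Base32 step: 7 characters of 5 bits each hold 35 bits, which is enough for a 32-bit value. The alphabet below is a Crockford-style one (digits plus uppercase letters without I, L, O, U); that choice and the class name `Base32Id` are my assumptions, not something the Skip32 implementation prescribes.

```csharp
using System;

// Hypothetical helper: fixed-width, case-insensitive Base32 for 32-bit values.
public static class Base32Id
{
    // Assumed alphabet: 32 symbols, digits first, no I, L, O or U.
    private const string Alphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    public static string Encode(uint value)
    {
        var chars = new char[7];
        // Fill from the right, 5 bits per character.
        for (int i = 6; i >= 0; i--)
        {
            chars[i] = Alphabet[(int)(value & 31)];
            value >>= 5;
        }
        return new string(chars);
    }

    public static uint Decode(string text)
    {
        if (text == null || text.Length != 7)
            throw new ArgumentException("Expected exactly 7 characters.", "text");

        ulong value = 0;
        foreach (char c in text)
        {
            int digit = Alphabet.IndexOf(char.ToUpperInvariant(c));
            if (digit < 0)
                throw new FormatException("Invalid Base32 character: " + c);
            value = (value << 5) | (uint)digit;
        }
        // 7 characters can carry 35 bits, so reject anything above 32 bits.
        if (value > uint.MaxValue)
            throw new OverflowException("Value does not fit in 32 bits.");
        return (uint)value;
    }
}
```

Feeding the 32-bit cipher output through `Encode` keeps the result within the 8-character limit from the question.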


There is also the non cryptographic alternative of generating a random value, storing it in the database and handling collisions.

CodesInChaos

For what you want, I'd recommend using GUIDs (or other kind of unique identifier where the probability of collision is either minimal or none) and storing them in the database row, then just never show the ID to the user.

IMHO, it's kind of bad practice to ever show the primary key in the database to the user (much less to let users do any kind of operations on them).

If they need to have raw access to the database for some reason, then just don't use ints as primary keys, and make them GUIDs (but then your requirement loses importance, since they can just look up the number of records).

Edit

Based on your requirements, if you don't mind that the algorithm is potentially computationally expensive, then you can just generate a random 8 byte string every time a new row is added, and keep generating random strings until you find one that is not already in the database.

This is far from optimal, and -can- be computationally expensive, but given that you use a 16-bit id and the maximum number of rows is 65536, I'd not care too much about it (the chance of a random 8 byte string already being in a list of at most 65536 entries is minimal, so you'll probably succeed on the first or at most the second try, if your pseudo-random generator is good).
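
A minimal sketch of that retry loop, assuming the 8 characters are drawn from a 32-symbol alphabet. The class `RandomCodeRegistry` and its in-memory dictionaries are hypothetical stand-ins for a database column with a unique index.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical stand-in for a table with a unique "code" column.
public sealed class RandomCodeRegistry
{
    // Assumed alphabet: 32 symbols, so (byte & 31) picks each one uniformly.
    private const string Alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

    private readonly Dictionary<string, int> _idByCode = new Dictionary<string, int>();
    private readonly Dictionary<int, string> _codeById = new Dictionary<int, string>();

    // Returns the existing code for an id, or assigns a fresh random one.
    public string GetOrCreate(int id)
    {
        string code;
        if (_codeById.TryGetValue(id, out code))
            return code;

        // Keep drawing until the code is not already taken.
        do
        {
            code = NewCode();
        } while (_idByCode.ContainsKey(code));

        _idByCode[code] = id;
        _codeById[id] = code;
        return code;
    }

    public bool TryResolve(string code, out int id)
    {
        return _idByCode.TryGetValue(code, out id);
    }

    private static string NewCode()
    {
        var bytes = new byte[8];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        var chars = new char[8];
        for (int i = 0; i < chars.Length; i++)
            chars[i] = Alphabet[bytes[i] & 31];
        return new string(chars);
    }
}
```

With this alphabet there are 32^8 = 2^40 possible codes, so even with all 65536 rows assigned, a single draw collides with probability 2^16 / 2^40 = 2^-24, and the loop will almost never run twice.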

Jcl
  • The GUID size is too long for my purposes. I am looking for an algorithm to convert 2 bytes to a string of at most 8 chars. – Dabiel Kabuto Nov 27 '14 at 18:24
  • A GUID is typically 16 bytes, I'd not say that's "too big", but if those are your size requirements, then ok. Check some of the answers on this question to generate a smaller GUID if needed: http://stackoverflow.com/questions/5678177/how-to-generate-8-bytes-unique-id-from-guid – Jcl Nov 27 '14 at 18:26
  • Btw, no algorithm which can't be easily found will generate any *unique* id from a 32 byte int. You either pseudo-randomize it like GUIDs do (in which case I'd say 8 bytes is not enough to guarantee a low collision risk) or put up with something that can be easily reverse engineered if there's a need for it (and someone smart doing it) – Jcl Nov 27 '14 at 18:31
  • Updated answer with possible (yet not elegant or efficient) solution – Jcl Nov 27 '14 at 18:49

Xor the integer, maybe with a random key that is generated per user (stored in session). While it's not strictly a hash (as it is reversible), the advantages are that you don't need to store it anywhere, and the size will be the same.
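
A small sketch of what this amounts to, with a hypothetical `XorMask` helper. Applying the same key twice gives back the original id. It also shows the limitation raised in the comments: for a fixed key, the xor of two masked values equals the xor of the two original ids, so the relationship between consecutive ids stays visible.

```csharp
using System;

// Hypothetical helper illustrating the XOR suggestion on 16-bit ids.
public static class XorMask
{
    // Masking and unmasking are the same operation.
    public static ushort Apply(ushort id, ushort key)
    {
        return (ushort)(id ^ key);
    }

    // What an observer can compute from two masked values without the key:
    // (a ^ key) ^ (b ^ key) == a ^ b.
    public static ushort LeakedDifference(ushort maskedA, ushort maskedB)
    {
        return (ushort)(maskedA ^ maskedB);
    }
}
```

For example, `XorMask.Apply(XorMask.Apply(id, key), key)` returns `id` for any key, which is why nothing has to be stored besides the key itself.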

CheloXL
  • How does that produce a _unique_ value with a per-user random key? And how would you reverse it without knowing the user to get the key? – HABO Nov 27 '14 at 18:42
  • You need a unique value per user. You don't need a unique value for the application, you already have one: The ID you are trying to encrypt. Based on your requirements, the above accomplishes your two points: The user cannot know the integer nor infer a sequence behind a list of hashes. You can use a unique key to encrypt, but that would weaken the encryption as two users accessing the same page will see the same ID. – CheloXL Nov 27 '14 at 18:46
  • That's so weak that it doesn't even address the sequence hiding issues the OP wants to address. – CodesInChaos Nov 27 '14 at 18:50
  • You can even include the xor key in the encrypted Id and don't even have to store it in memory. I just tried to give an easy to implement answer. You can complicate it as far as you want. If you need cryptographic strength, use crypto functions and pay the price. – CheloXL Nov 27 '14 at 19:10
  • Sorry, I thought you were suggesting that Alphonse XOR their ID (1) with their per-user magic number (8) resulting in an obfuscated value of 9. Gwendolynn's ID (7) and magic number (14) would result in 9. The OP could use the obfuscated value (9) in, say, a URL and then deduce the correct user therefrom. – HABO Nov 27 '14 at 19:12