
I have written a method to convert plain text into its hash code using the MD5 algorithm. Here is the code I used:

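import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public static String convertToMD5Hash(final String plainText) {
    final MessageDigest messageDigest;
    try {
        messageDigest = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
        // Every Java platform is required to support MD5, so this should not happen.
        // LOGGER is a logger field of the enclosing class (not shown).
        LOGGER.warn("For some weird reason the MD5 algorithm was not found.", e);
        // Rethrow instead of continuing with a null digest (which would throw a NullPointerException).
        throw new IllegalStateException(e);
    }

    messageDigest.reset();
    // getBytes() uses the platform default charset.
    messageDigest.update(plainText.getBytes());
    final byte[] digest = messageDigest.digest();
    // Interpret the 16-byte digest as a non-negative 128-bit number.
    final BigInteger bigInt = new BigInteger(1, digest);
    // Radix 8 renders the number in octal digits (up to 43 of them).
    String hashtext = bigInt.toString(8);

    return hashtext;
}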

This method works, but it returns a long hash. I need to limit this hash text to 8 characters. Is there any way to set the length of the hash code in Java?

нαƒєєz
  • It's a hash; simply use 8 characters of the generated String (whichever you want; every choice should be as good as any other). – MrSmith42 Nov 11 '13 at 12:00
  • I need something like shrinking/compressing the hashcode rather than taking a part from it. :) – нαƒєєz Nov 11 '13 at 12:27
  • @zulox, MD5 is a cryptographic hash and there is no point in specifically shrinking or compressing it, you can just take the first 8 hex nibbles or take a prefix in some other way. You don't win anything e.g. by XORing all the 8-nibble blocks together, it doesn't add to security or randomness. – Antti Huima Nov 11 '13 at 12:53
  • `messageDigest.update(plainText.getBytes());` will not work if your platform's default encoding changes. Use `getBytes(StandardCharsets.UTF_8)` instead. – artbristol Nov 11 '13 at 13:04
  • 8 hex encoded characters? – Marcus Adams Nov 11 '13 at 13:24
  • 8 characters at four bits per character gives 32 bits. I guess an average laptop could find more than one match for a given value per hour on a shortish message. (Collisions are very much easier than that.) So this isn't going to be very secure. – Tom Hawtin - tackline Nov 11 '13 at 15:04
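A minimal sketch of the question's method with the comments' suggestions applied: an explicit UTF-8 encoding instead of the platform default, and hex output instead of octal. It assumes Java 7+ (for `StandardCharsets`); the class name `Md5Hex` and method name `md5Hex` are hypothetical. The input "a" is a handy check because its digest starts with a zero nibble, which `BigInteger.toString(16)` would silently drop.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Hex {

    // Returns the MD5 digest of plainText as 32 lowercase hex digits.
    public static String md5Hex(final String plainText) {
        final MessageDigest messageDigest;
        try {
            messageDigest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform must support MD5, so this is not expected to happen.
            throw new IllegalStateException("MD5 not available", e);
        }
        // Encode with an explicit charset instead of the platform default.
        final byte[] digest = messageDigest.digest(plainText.getBytes(StandardCharsets.UTF_8));
        // %032x zero-pads to 32 digits; BigInteger.toString(16) would drop leading zeros.
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("a")); // 0cc175b9c0f1b6a831c399e269772661
    }
}
```

With a fixed 32-digit hex string, any later truncation (first 8 digits, last 8 digits) always cuts the same way.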

4 Answers


Yes and no. You can use a substring of the original hash, as long as you always cut the original hash string the same way (i.e. always the first 8 or always the last 8 characters). What you are going to do with that "semi-hash" is another matter.
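A minimal sketch of the "always cut the same way" idea: hex-encode the full MD5 digest, then always keep the first 8 hex digits. The class `ShortHash`, the helper `toHex` and the choice of UTF-8 are assumptions for illustration, not something the answer specifies.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ShortHash {

    // Hex-encodes bytes, two zero-padded lowercase digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Always keeps the FIRST 8 hex digits (32 bits), so the cut is reproducible.
    public static String shortMd5(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            return toHex(digest).substring(0, 8);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(shortMd5("abc")); // 90015098
    }
}
```

Taking the last 8 digits instead (`substring(24, 32)`) would be just as valid, provided the same choice is used everywhere.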

Whatever it is you're going to do, be sure it has nothing to do with security.

Here's why: MD5 is a 128-bit hash, so there are 2^128 ≈ 340,000,000,000,000,000,000,000,000,000,000,000,000 possible hash values. That astronomical number of possibilities is what makes brute-forcing this kind of string virtually impossible. By cutting it down to 8 characters, you end up with a 32-bit hash, because a single hex digit represents 4 bits (the full 128-bit hash is 128 / 4 = 32 hex digits, so 8 of them carry 8 × 4 = 32 bits). A 32-bit hash has only 2^32 = 4,294,967,296 possible values. That's 2^96 = 79,228,162,514,264,337,593,543,950,336 times fewer than the original 128-bit hash, and it can be broken in a matter of seconds on any ordinary computer.
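The numbers above can be checked with `BigInteger`. This is only an arithmetic sketch; the class `HashSpace` and method `valuesFor` are hypothetical names. Each hex digit contributes 4 bits, so `n` hex digits give `2^(4n)` possible values.

```java
import java.math.BigInteger;

public class HashSpace {

    // Number of distinct values representable by the given number of hex digits: 2^(4 * hexDigits).
    static BigInteger valuesFor(int hexDigits) {
        return BigInteger.ONE.shiftLeft(4 * hexDigits);
    }

    public static void main(String[] args) {
        BigInteger full = valuesFor(32);     // full MD5: 32 hex digits = 128 bits
        BigInteger truncated = valuesFor(8); // truncated: 8 hex digits = 32 bits
        System.out.println(full);                     // 340282366920938463463374607431768211456
        System.out.println(truncated);                // 4294967296
        System.out.println(full.divide(truncated));   // 79228162514264337593543950336 (2^96)
    }
}
```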

Simo Erkinheimo
  • For clarification, "broken" here means finding a collision. – Marcus Adams Nov 11 '13 at 13:29
  • Where are your -1s coming from? – Tom Hawtin - tackline Nov 11 '13 at 15:08
  • @TomHawtin-tackline Good point. I guess I mixed that one with the min/max -value calculations of integers. – Simo Erkinheimo Nov 11 '13 at 22:25
  • Suppose I take the first 8 chars from the hashcode: could I be assured that I will not get the same first 8 chars for different plain texts? What I meant was, will the first 8 characters of a hashcode be unique? – нαƒєєz Nov 12 '13 at 05:55
  • @zulox No. Hashes may have collisions, i.e. different strings may have the same hash. However, the possibility of a collision is greatly diminished if a 128-bit hash is used ("astronomically minimal possibility"). In your 8-character = 32-bit case, collisions might easily become a likely event (depending on the number of hashes, of course). – Simo Erkinheimo Nov 12 '13 at 09:41
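To see how quickly the first 8 hex characters stop being unique, here is a hedged experiment: hash the strings "0", "1", "2", … and stop at the first pair whose truncated MD5 values match. The input scheme and the names `TruncatedCollision`, `first8Hex` and `findCollision` are arbitrary choices for illustration. By the birthday bound, a collision among 2^32 possible values is expected after roughly 2^16 (tens of thousands of) inputs, so this typically finishes in well under a second; the exact pair found is not predicted here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class TruncatedCollision {

    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // First 4 digest bytes = first 8 hex digits = 32 bits. digest() also resets md for reuse.
    static String first8Hex(MessageDigest md, String text) {
        byte[] d = md.digest(text.getBytes(StandardCharsets.UTF_8));
        return String.format("%02x%02x%02x%02x", d[0], d[1], d[2], d[3]);
    }

    static String first8Hex(String text) {
        return first8Hex(newMd5(), text);
    }

    // Returns {earlierInput, laterInput, sharedPrefix} for the first collision found.
    static String[] findCollision() {
        MessageDigest md = newMd5();
        Map<String, String> seen = new HashMap<>();
        for (long i = 0; ; i++) {
            String input = Long.toString(i);
            String prefix = first8Hex(md, input);
            String earlier = seen.put(prefix, input);
            if (earlier != null) {
                return new String[] {earlier, input, prefix};
            }
        }
    }

    public static void main(String[] args) {
        String[] c = findCollision();
        System.out.println("\"" + c[0] + "\" and \"" + c[1] + "\" both start with " + c[2]);
    }
}
```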

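No. MD5 is defined to return 128-bit values. You could use Base64 to encode them as ASCII and truncate the result using String#substring(0, 8). Since each Base64 character carries 6 bits, those 8 characters keep 48 bits of the hash (8 hex characters keep only 32).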

In Java 8 (not officially released at the time of writing), you can encode a byte[] to Base64 as follows:

String base64 = Base64.getEncoder().encodeToString(digest);

For earlier Java versions, see "Decode Base64 data in Java".
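A minimal sketch of the Base64 approach end to end, assuming Java 8's `java.util.Base64` and UTF-8 input; the class `Base64ShortHash` and method `shortMd5Base64` are hypothetical names. The empty string is a convenient check because its MD5 digest in Base64 is the well-known `1B2M2Y8AsgTpgAmY7PhCfg==`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class Base64ShortHash {

    // Base64-encodes the MD5 digest and keeps the first 8 characters (8 x 6 = 48 bits).
    public static String shortMd5Base64(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest).substring(0, 8);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(shortMd5Base64("")); // 1B2M2Y8A
    }
}
```

Note that standard Base64 output can contain `+` and `/`; if the short hash ends up in URLs or file names, `Base64.getUrlEncoder()` (which uses `-` and `_`) may be the better fit.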

isnot2bad

All cryptographic hash algorithms (MD5 included) should change bits across the whole hash, seemingly at random, whenever any part of the data changes. So you can just choose 8 characters from your hash. Just don't pick them randomly: the choice must be reproducible.
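A small experiment illustrating that claim (the avalanche effect): count how many of the 128 MD5 bits differ between two inputs that differ in a single character. The class `Avalanche`, the inputs "hello"/"hellp" and the UTF-8 encoding are arbitrary choices for this sketch. For a well-behaved hash the count is typically near half of the 128 bits, spread over the whole digest, which is why any fixed 8-character slice is as good as another.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Avalanche {

    // MD5 digest of the UTF-8 bytes of text.
    static byte[] md5(String text) {
        try {
            return MessageDigest.getInstance("MD5").digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts how many of the 128 digest bits differ between the hashes of a and b.
    static int differingBits(String a, String b) {
        byte[] x = md5(a);
        byte[] y = md5(b);
        int count = 0;
        for (int i = 0; i < x.length; i++) {
            count += Integer.bitCount((x[i] ^ y[i]) & 0xff);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(differingBits("hello", "hellp") + " of 128 bits differ");
    }
}
```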

piotrek

Firstly, as everyone has mentioned, a shortened hash (even a 64-bit one) is not secure enough. Ultimately it depends on what exactly you plan to do with the hash.

If you still need to convert this to 8 characters, I suggest narrowing the BigInteger to a long (64 bits, i.e. 8 bytes) using BigInteger.longValue().

It will ensure that the long value it produces is consistent with the hash that was produced.
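A minimal sketch of the `longValue()` suggestion. Per its Javadoc, `BigInteger.longValue()` keeps only the low-order 64 bits, which for a 16-byte big-endian MD5 digest are bytes 8 to 15; reading those bytes with `ByteBuffer` gives the same long. The class `Md5Long`, its helpers and the UTF-8 encoding are assumptions for illustration.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Long {

    static byte[] md5(String text) {
        try {
            return MessageDigest.getInstance("MD5").digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // BigInteger.longValue() keeps only the low-order 64 bits (the last 8 digest bytes).
    static long viaBigInteger(byte[] digest) {
        return new BigInteger(1, digest).longValue();
    }

    // The same 64 bits, read directly as a big-endian long from bytes 8..15.
    static long viaByteBuffer(byte[] digest) {
        return ByteBuffer.wrap(digest, 8, 8).getLong();
    }

    public static void main(String[] args) {
        byte[] digest = md5("");
        System.out.println(Long.toHexString(viaBigInteger(digest))); // e9800998ecf8427e
    }
}
```

The `ByteBuffer` variant avoids allocating a `BigInteger`, but both return the same value.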

I am not sure taking the most significant 64 bits of the 128-bit hash is a good idea. I would rather take the least significant 64 bits, which is what longValue() returns. What this ensures is that

when hash(128, a) = hash(128, b) then hash(64, a) = hash(64, b) will always be true.

But we have to live with collisions in the 64-bit case, i.e. hash(64, a) = hash(64, b) does not always imply hash(128, a) = hash(128, b).

In a nutshell, we ensure that we never have a case where the 128-bit hashes of 2 texts are the same but their 64-bit hashes are different (strictly speaking, this holds for any fixed choice of 64 bits, most or least significant). It depends on what you really use the hash for, but I personally feel this approach is more correct.

Kalpak Gadre