
What is a fast hash function available for the iPhone to hash web URLs (images)?

I'd like to store each cached web image as a file named after a hash of its URL, because I suspect the raw URL could contain characters that cause problems on the file system.

The hash function doesn't need to be cryptographic, but it definitely needs to be fast.

Example:

Input: http://www.calumetphoto.com/files/iccprofiles/icc-test-image.jpg

Output: 3573ed9c4d3a5b093355b2d8a1468509

I produced this with MD5(), but since I don't know much about the topic, I'm not sure whether it is overkill (and therefore slow).

znq
  • Well, since you'll be writing the file to a file system, pretty much any hashing algorithm will be more than fast enough. Personally, I'd just replace the / characters with percent escapes. – JeremyP May 29 '10 at 18:47
  • JeremyP: That may make the pathname too long for a filename. – Peter Hosey May 30 '10 at 06:57
  • @JeremyP: Well, in most cases I will be loading from the file system. But you're right: that is still far more expensive than the hashing. – znq May 30 '10 at 09:50
  • @Peter: +1, that's a good point. I just looked up HFS+, and the file name limit is only 255 characters. – JeremyP May 30 '10 at 23:42

2 Answers


MD5 may be broken for security purposes, but it works well for the situation you describe. Here's a thread on how to implement it on iPhone. Check out Vroomtrap's post. For posterity, here's my own version of that code:

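// Method in an NSString category; requires #import <CommonCrypto/CommonDigest.h>.
- (NSString *)MD5Hash {
    const char *cStr = [self UTF8String];
    unsigned char result[CC_MD5_DIGEST_LENGTH];

    // Hash the UTF-8 bytes of the string; CC_MD5 takes its length as a CC_LONG.
    CC_MD5( cStr, (CC_LONG)strlen(cStr), result );

    // Format the 16-byte digest as 32 uppercase hex characters.
    return [NSString stringWithFormat: @"%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X",
        result[0], result[1], result[2], result[3], result[4], result[5], result[6], result[7],
        result[8], result[9], result[10], result[11], result[12], result[13], result[14], result[15] ];
}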

You'll need to import the CommonCrypto/CommonDigest.h header.

warrenm
  • I found this one here very helpful: http://www.saobart.com/md5-has-in-objective-c/ – znq May 29 '10 at 16:33
  • I'd recommend using `dataUsingEncoding:` instead of `UTF8String`. `strlen` is not cheap, since it has to walk the entire string to find the end of it to know how long it is. The NSData object knows how long the data is. – Peter Hosey May 30 '10 at 06:52
  • You're welcome to do so. My empirical testing showed that using dataUsingEncoding: performed the same as the above method on strings of moderate length (200K) and substantially worse on large strings (2M). – warrenm May 31 '10 at 03:36

I think NSObject already has a hash method (-hash), and NSURL or NSString can override it, so you could try those. In most cases it should be fast enough; it's what gets used when you put an NSString into an NSDictionary :) See the NSObject hash documentation.

vodkhang
  • As a return value I get an integer, which I can convert to a string and use as a filename. However, is that "strong" enough to distinguish between the many different URLs out there? How likely is it that two different URLs produce the same hash? – znq May 29 '10 at 14:32
  • I can't find any documentation on Google for that :(. But I think if you have a small number of URLs (10 to 100, just a guess), it may be OK. However, I found that people usually use MD5 to generate the hash, so MD5's performance is probably not a big problem. – vodkhang May 29 '10 at 14:59
  • Thanks. I actually checked the time to execute both, and MD5 is pretty much the same as [myObject hash] – znq May 29 '10 at 16:32
  • You should generally not use this for storing files as collisions are somewhat common with NSObject's -hash function. Apple doesn't make any guarantees about collisions with this method, and I introduced a severe crash using it for image cache hashing once. – Tim Johnsen Feb 23 '17 at 16:45
  • @TimJohnsen What did you end up using as a hash function to generate filenames? – Evan R Mar 19 '21 at 05:30
  • I generally use MD5 – Tim Johnsen Mar 20 '21 at 15:25