2

I asked a slightly different version of this question before, but did not get the answer I was looking for.

My question is: do I need to store md5($url) in a unique index in MySQL? I have seen this in some code, though I don't remember where. This is a large database with more than 5 million URLs, and rows are looked up by URL.

Any ideas?

BoltClock
mathew
  • I actually can't figure out what you're asking. The reason someone would hash a pattern before searching for it is to make the search faster or to reduce their storage requirements. – Borealid Jul 14 '10 at 08:34
  • A hash will help only a trivial amount on index lookup speed; certainly it will help less than the cost of maintaining and storing the hashed index. One reason to store and search on a hashed item instead of the unhashed item would be to conceal the actual item value. When the item is hashed, the user would need to already know its value to use it to look up a record. – O. Jones Jul 14 '10 at 13:34

3 Answers

2

I don't think you should hash your URLs. The only plausible reason would be to save space (if most of the URLs are longer than 32 characters, the length of a hex-encoded MD5 digest), at the cost of an increased risk of collisions.
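To make the space argument concrete, here is a minimal sketch in Python (the question uses PHP's md5(); Python's hashlib is swapped in here, and the URL is made up for illustration). It only shows that the digest has a fixed size no matter how long the URL is.

```python
import hashlib
from typing import Tuple


def url_digest(url: str) -> Tuple[str, bytes]:
    """Return the MD5 of a URL as (hex string, raw bytes)."""
    h = hashlib.md5(url.encode("utf-8"))
    return h.hexdigest(), h.digest()


# Hypothetical long URL, used only for illustration.
long_url = "http://www.example.com/some/long/path?" + "&".join(
    f"k{i}=v{i}" for i in range(20)
)

hex_form, raw_form = url_digest(long_url)

# The hex form is always 32 characters and the raw form always 16 bytes,
# regardless of the length of the input URL.
print(len(long_url), len(hex_form), len(raw_form))
```

If space is the goal, storing the 16 raw bytes (for example in a BINARY(16) column) would take half the room of the 32-character hex string, but that is a design choice the question does not settle.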

What you should do is normalize the URLs.
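A rough sketch of what normalizing could look like, in Python with only the standard library. The function name normalize_url and the specific rules (lowercase scheme and host, drop default ports and fragments, sort query parameters) are my assumptions, not a fixed standard; whether sorting parameters is safe depends on the sites you index. This sketch also drops any user:password part of the URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Ports that are implied by the scheme and can be dropped.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Return a canonical form of url (hypothetical rules, see lead-in)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # Keep the port only when it is not the default for the scheme.
    netloc = host
    if parts.port is not None and DEFAULT_PORTS.get(scheme) != parts.port:
        netloc = f"{host}:{parts.port}"

    # An empty path and "/" point at the same resource.
    path = parts.path or "/"

    # Sort query parameters so that ?b=2&a=1 and ?a=1&b=2 compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))

    # The fragment is never sent to the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, query, ""))


print(normalize_url("HTTP://Example.COM:80/a?b=2&a=1#frag"))
```

With something like this applied before insert and before lookup, two spellings of the same URL end up as one row, which is what a unique index on the URL column needs.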

Alix Axel
1

Some sites store hashes of URLs in the database because they use those hashes in their own URLs, for example to redirect users to an external URL. If that is not your case, I can't see any reason to do this.

ivan73
0

Are you saying that the URL is called like this:

www.yourdomain.com?id=89ce9250e9f469c9d1816e1cc0fb47a1

and then the id (89ce9250e9f469c9d1816e1cc0fb47a1, which is an md5() of the real URL query string) is looked up in the database to resolve the actual URL, which could be:

www.yourdomain.com?user=23&location=5&eventtype=23&year=2010

Is this the kind of usage you're referring to?
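If that is the idea, the lookup could be sketched roughly like this. It is in Python with an in-memory SQLite database standing in for MySQL, and the table name urls, the column names, and the helpers store and resolve are all made up for illustration. Since md5 cannot be reversed, the real URL has to be stored next to its hash.

```python
import hashlib
import sqlite3


def md5_hex(text: str) -> str:
    """Hex MD5 of a string, comparable to PHP's md5()."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# In-memory SQLite as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE urls ("
    " url_hash TEXT NOT NULL UNIQUE,"  # UNIQUE creates the unique index
    " url TEXT NOT NULL)"
)


def store(db: sqlite3.Connection, url: str) -> str:
    """Insert url under its hash and return the hash used as the public id."""
    h = md5_hex(url)
    # INSERT OR IGNORE is SQLite syntax. Note that it would also silently
    # skip a different URL that happened to collide on the same hash.
    db.execute("INSERT OR IGNORE INTO urls (url_hash, url) VALUES (?, ?)", (h, url))
    return h


def resolve(db: sqlite3.Connection, url_hash: str):
    """Return the stored URL for a hash, or None if it is unknown."""
    row = db.execute(
        "SELECT url FROM urls WHERE url_hash = ?", (url_hash,)
    ).fetchone()
    return row[0] if row else None


real_url = "www.yourdomain.com?user=23&location=5&eventtype=23&year=2010"
short_id = store(conn, real_url)
print(short_id, "->", resolve(conn, short_id))
```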

jim

jim tollan
  • Well, md5 is one-way, so that would not really work with md5 alone, but the idea is, IMO, the same as what mathew wants – DrColossos Jul 14 '10 at 08:44
  • Dr - yes, I'm aware that md5 is one-way. My thinking was that he'd have a unique column storing the md5 of the URL, which would be used to look up the actual value from a secondary column. Does that make sense? Not sure, of course, why he'd want to do this, but perhaps an update to the question will answer that :) – jim tollan Jul 14 '10 at 08:51