5

I have a MySQL database I am working on in PHP where It will perform address verification from a daily data feed. We would do address correction on our end, because we don't have control over the source of the feed.

I am trying to come up with a method to see if the address has been changed at the source. If it changes then an address verification would be performed in PHP on our MySQL database.

Without storing a copy of the old feed I was thinking it might be better to do a checksum of the fields from the feeds and store this with each record. Then each feed after that it would see if the checksum has changed. Is this the best method to do this? Might there been a PHP function to do all this already? What about something in MySQL? Thanks!

Edward
  • 9,430
  • 19
  • 48
  • 71
  • PHP and MySQL both have md5, sha1, etc... if you need to quickly check for changes. – Marc B Nov 14 '12 at 19:10
  • 2
    for address fields, calculating the md5 is probably more expensive than comparing the strings – ernie Nov 14 '12 at 19:12
  • To clarify, you'll still need to do a string comparison after calculating a checksum or a hash, so unless these fields are reasonably large, calculating something and comparing will be more expensive than comparing directly. – ernie Nov 14 '12 at 19:26
  • Do you have some reason for not just storing the address locally, and simply do a string comparison? – user229044 Nov 14 '12 at 19:28
  • @meagar There is our MySQL database which would be address corrected, and then there is the daily feed. To do a string compare of the fields to see if they changed would require storage of the previous daily feed, which is a duplicate database. The thing is, it doesn't matter if the street address, city, state or zip has changed just that if anything has changed it warrants for a new address verification to be run. This is why I thought about only storing the "checksum" of those fields. – Edward Nov 14 '12 at 21:05
  • @ernie It doesn't matter which hashing method is used, which ever is faster and supported. But in reality for production use it isn't a concern if it slows things down some because the number of records to process isn't huge. – Edward Nov 14 '12 at 21:12
  • @edward so, if you're not concerned about performance, it sounds like the only selling point for using the hash would be that you'd save a few bytes saving info in the db? The big disadvantage to doing this is that if the data feed goes away, you no longer have the address. Given all this, I'd say skip the hashing, and save and compare the strings directly. – ernie Nov 14 '12 at 21:40
  • @ernie I appreciate your thoughts. The daily feed won't go away in this project. – Edward Nov 14 '12 at 22:03

3 Answers3

7

crc32 is probably what you want.

In php: crc32() In Mysql CRC32()

crc32 is probably a better fit that SHA1 or MD5 for simple comparisons/data integrity: see here

Community
  • 1
  • 1
FilmJ
  • 2,011
  • 3
  • 19
  • 27
2

PHP and MySQL both support the crc32 function which is inexpensive to run; at least less so than a hash algorithm like MD or SHA.

drew010
  • 68,777
  • 11
  • 134
  • 162
  • I think you overestimate the expense of running a SHA1. – user229044 Nov 14 '12 at 19:25
  • 1
    Its not that expensive but is much more so than crc. Simple tests show crc runs at least 3-4 times as fast as SHA1 on the same string. – drew010 Nov 14 '12 at 20:44
  • 1
    My tests showed a negligible difference between the two over 100,000 iterations of hashing different length strings up to 100,000 characters in length. Regardless, the performance of either function doesn't matter at all, not even slightly. What matters is which does the job better. – user229044 Nov 14 '12 at 22:01
1

There are various hash methods you can use, either the md5 or sha ones will be ok, you will need to store in your database the hash string to compare to, Idealy you'd want to do something like

if (sha1(strtoupper($list_of_values) )=== $stored_hashstring){
    //skip
}else{
    //update
}

Depending of the data you might need to add additional parsing on the strings ie: removing spaces, etc

on_
  • 556
  • 5
  • 12