I have a fairly large set of data coming from an external source (via Excel or CSV). Individual records have no unique key; instead, each row is uniquely identified by the combination of 3-4 of its columns. I'm parsing this data and inserting it into a database.
What would be the best way to generate a hash code or some other key from these columns? The key must be unique for each combination of those columns, because I need to compare it against another set of data from yet another source.
I could just concatenate the values and use that as the key, but I'd prefer a smaller generated hash (SHA-1, MD5, whatever) to use as the key in the database when I load the data.
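One catch with plain concatenation: without a separator, different rows can produce the same string. For example, ("ab", "c") and ("a", "bc") both become "abc". Below is a minimal sketch of a delimited key. It assumes `|` as the separator and a backslash escape, both arbitrary choices. `naturalKey` is a hypothetical helper name, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

public class NaturalKey {

    // Hypothetical helper: joins the identifying column values into one string.
    // Backslashes and pipes inside a value are escaped, so a field can never
    // contain an unescaped delimiter and ("ab", "c") differs from ("a", "bc").
    static String naturalKey(String... columns) {
        List<String> parts = new ArrayList<>();
        for (String col : columns) {
            // Treats null as empty; change this if null and "" must stay distinct.
            String v = (col == null) ? "" : col;
            parts.add(v.replace("\\", "\\\\").replace("|", "\\|"));
        }
        return String.join("|", parts);
    }

    public static void main(String[] args) {
        System.out.println("ab" + "c");            // abc
        System.out.println("a" + "bc");            // abc  (same string, different row)
        System.out.println(naturalKey("ab", "c")); // ab|c
        System.out.println(naturalKey("a", "bc")); // a|bc
    }
}
```

Whatever separator you pick, both data sources have to build the key the same way (same column order, same escaping, same trimming or case rules) or the keys won't match.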
I'm leaning towards using Apache Commons Codec's DigestUtils, passing it a String of the concatenated columns to generate a SHA-1 hash, but I'm wondering if that's overkill.
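For comparison, here is a sketch of the same idea using only the JDK's `java.security.MessageDigest`, so it runs without the Commons Codec dependency. It hashes a delimited key (the escaping scheme is an assumption, as above) and returns 40 lowercase hex characters. `naturalKey`, `sha1Hex`, and `rowKey` are hypothetical names for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RowKeyHasher {

    // Delimited, escaped key so ("ab","c") and ("a","bc") don't collide.
    static String naturalKey(String... columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) sb.append('|');
            String v = (columns[i] == null) ? "" : columns[i];
            sb.append(v.replace("\\", "\\\\").replace("|", "\\|"));
        }
        return sb.toString();
    }

    // SHA-1 of the string's UTF-8 bytes, as 40 lowercase hex characters.
    static String sha1Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // negative bytes print as 80..ff
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Java platform implementations are required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }

    // Hash of the identifying columns, usable as a fixed-length database key.
    static String rowKey(String... columns) {
        return sha1Hex(naturalKey(columns));
    }

    public static void main(String[] args) {
        System.out.println(rowKey("ACME", "2023-01-15", "INV-001"));
    }
}
```

If you do use Commons Codec, `DigestUtils.sha1Hex(String)` should give the same result for the same input string, since it also hashes the UTF-8 bytes and emits lowercase hex. The important part either way is the delimiter, not the choice of library.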
Any suggestions? I'm not looking for anything cryptographically secure, just something unique I can compare against.
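Since security isn't a requirement, the real concern is accidental collisions, which depend on the hash width. A 32-bit value such as `String.hashCode()` reaches roughly even odds of at least one collision at around 77,000 rows, so it is too small for this. A 128-bit digest is plenty for most data sets. One compact JDK-only option, sketched below as an alternative rather than a recommendation, is `UUID.nameUUIDFromBytes`. It builds a version 3 (MD5-based) UUID that fits in 16 bytes or a native UUID column where the database has one. The helper names are hypothetical, and the delimiter scheme is the same assumption as above.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class RowUuid {

    // Delimited, escaped key so different column splits can't produce the same input.
    static String naturalKey(String... columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) sb.append('|');
            String v = (columns[i] == null) ? "" : columns[i];
            sb.append(v.replace("\\", "\\\\").replace("|", "\\|"));
        }
        return sb.toString();
    }

    // Version 3 UUID: an MD5 digest of the bytes with 6 bits overwritten by the
    // version/variant markers, leaving 122 hash bits. Deterministic, so the same
    // column values always yield the same UUID in both data sources.
    static UUID rowUuid(String... columns) {
        return UUID.nameUUIDFromBytes(naturalKey(columns).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        UUID a = rowUuid("ACME", "2023-01-15", "INV-001");
        UUID b = rowUuid("ACME", "2023-01-15", "INV-001");
        System.out.println(a + " equal: " + a.equals(b));
    }
}
```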