I need to generate a hash of an XML string in Java and then store it in a database table field (my DBMS is Postgres). Which hash function is best to use? Thank you in advance.
-
http://docs.oracle.com/database/121/JAXML/toc.htm – SMA Oct 25 '14 at 08:24
-
http://stackoverflow.com/questions/2624192/good-hash-function-for-strings checkout this question – rpozarickij Oct 25 '14 at 08:25
-
Thank you for the links, both very useful. – OuterSpace Oct 25 '14 at 08:49
-
@almasshaikh: how are Oracle's XML functions relevant to a question regarding Postgres? – Oct 25 '14 at 15:22
-
While I see the case for closing and my first instinct was to do that, I'm voting to leave open because I think this is valuable Q&A material for programmers, and *good* answers wouldn't be subjective, they'd be based on specific technical considerations. – Adi Inbar Oct 25 '14 at 16:48
2 Answers
It rather depends on the purpose of the hash function. If your aim is to do fast equality matching between documents, then it depends on your criteria for considering two documents to be equal. For example, do you want them to be equal if they have different whitespace, or if they have the same attributes but in a different order? If that's part of the requirement, the best approach might be to first canonicalize the XML documents, then to apply a general-purpose string hash function to the canonicalized form.
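As a rough illustration of the "normalize first, then hash" idea, here is a minimal sketch using only JDK classes. It is an assumption-laden stand-in, not W3C Canonical XML: it simply drops whitespace-only text nodes, re-serializes the DOM, and hashes the result with SHA-256. The class and method names (`NormalizedXmlHash`, `normalize`, `hashOf`) are hypothetical, and how attribute order is treated depends on the parser, so check that against your own definition of equality.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: a simple whitespace normalization, NOT W3C Canonical XML.
final class NormalizedXmlHash {

    // Recursively removes text nodes that contain only whitespace.
    // Caveat: this also removes whitespace-only text that is meaningful in mixed content.
    static void stripBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripBlankText(child);
            }
            child = next;
        }
    }

    // Parses the XML, strips blank text nodes, and serializes it without a declaration.
    static String normalize(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            stripBlankText(doc.getDocumentElement());
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not normalize XML", e);
        }
    }

    // SHA-256 of the normalized form, as lowercase hex.
    static String hashOf(String xml) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(normalize(xml).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // Two sample documents that differ only in insignificant whitespace.
        String compact = "<a><b x=\"1\"/></a>";
        String spaced = "<a>\n  <b x=\"1\" />\n</a>";
        System.out.println(hashOf(compact).equals(hashOf(spaced)));
    }
}
```

If the W3C definition of equivalence is what you need, a real canonicalization library would replace `normalize` here; the hashing step stays the same.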

-
In my case, if the two XML documents have the same attributes but in a different order, they have to be considered different. The documents passed contain the user selections and are validated against an XML Schema. What exactly do you mean by canonicalizing the documents? And which kind of general-purpose function can I apply? Thank you – OuterSpace Oct 26 '14 at 19:06
-
There's a W3C definition of "canonical XML" which is one possible definition of whether two XML documents should be considered equivalent: see http://www.w3.org/standards/techs/xmlc14n#w3c_all. And there are libraries that will convert XML to canonical XML for you, e.g. https://www.aleksey.com/xmlsec/c14n.html. But if your definition of equality is a home-grown one, then these aren't going to help you. – Michael Kay Oct 27 '14 at 14:33
-
OK, thank you for the valuable suggestions. As a first step I will canonicalize the documents, then I will add more detailed checks. – OuterSpace Oct 27 '14 at 19:44
The current best (most secure) general-purpose hash functions to use are SHA-256 or SHA-512. Unless you want ultra-high security, SHA-256 will do just fine.
For password hashing, the current standard is bcrypt.
There are lots of broken hash functions around, so don't just pick one out of the air...
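For the XML string in the question, a minimal sketch with the JDK's `MessageDigest` could look like the following. The class name `XmlHash`, the method `sha256Hex`, and the sample document are illustrative assumptions; the string is hashed as UTF-8 bytes, which is a choice you should keep consistent wherever hashes are compared.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class for illustration.
final class XmlHash {

    // Returns the SHA-256 digest of the text as 64 lowercase hex characters.
    static String sha256Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        String xml = "<selection><item id=\"1\"/></selection>";
        System.out.println(sha256Hex(xml));
    }
}
```

The hex form is always 64 characters, so it should fit a fixed-width text column such as `CHAR(64)` in Postgres; storing the raw 32 bytes in a `BYTEA` column is another option.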

-
@christopher yes, I was really commenting on best general purpose hash function. OP didn't explicitly mention passwords, for which bcrypt is better, but I've added that in. Thanks. – chiastic-security Oct 25 '14 at 09:00
-
Understood. Slightly misleading when you said *"the current best (most secure)"*. – christopher Oct 25 '14 at 09:22