Hashes are a sensitive topic, and it is hard to recommend a specific hash based on your question alone. You might want to ask this question on https://security.stackexchange.com/ to get expert opinions on the suitability of hashes for particular use cases.
What I have understood so far is that most hashes are implemented incrementally at their core; their execution time, on the other hand, is not that easy to predict.
I present two Hasher implementations which rely on "an existent free implementation in Java". Both implementations are constructed so that you can split your Strings arbitrarily before calling add() and still get the same result, as long as you do not change the order of the characters in them:
import java.math.BigInteger;
import java.nio.charset.Charset;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/**
 * Created for https://stackoverflow.com/q/26928529/1266906.
 */
public class Hashs {
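    /** Incremental 32-bit hash using the same recurrence as String.hashCode(). */
    public static class JavaHasher {
        private int hashCode;

        public JavaHasher() {
            hashCode = 0;
        }

        public void add(String value) {
            // Fold in every character individually. Combining value.hashCode()
            // per part would make the result depend on where the input is split.
            for (int i = 0; i < value.length(); i++) {
                hashCode = 31 * hashCode + value.charAt(i);
            }
        }

        public int create() {
            return hashCode;
        }
    }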
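    /** Incremental SHA-256 hash backed by java.security.MessageDigest. */
    public static class ShaHasher {
        public static final Charset UTF_8 = Charset.forName("UTF-8");
        private final MessageDigest messageDigest;

        public ShaHasher() throws NoSuchAlgorithmException {
            messageDigest = MessageDigest.getInstance("SHA-256");
        }

        public void add(String value) {
            // Parts must not be cut inside a surrogate pair, or the UTF-8 bytes change.
            messageDigest.update(value.getBytes(UTF_8));
        }

        public byte[] create() {
            // digest() also resets the MessageDigest for reuse.
            return messageDigest.digest();
        }
    }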
    public static void main(String[] args) {
        javaHash();
        try {
            shaHash();
        } catch (NoSuchAlgorithmException e) {
            // Should not happen: every Java platform is required to support SHA-256.
            e.printStackTrace();
        }
    }

    private static void javaHash() {
        JavaHasher h = new JavaHasher();
        h.add("somestring");
        h.add("another part");
        h.add("even more");
        int hash = h.create();
        System.out.println(hash);
    }

    private static void shaHash() throws NoSuchAlgorithmException {
        ShaHasher h = new ShaHasher();
        h.add("somestring");
        h.add("another part");
        h.add("even more");
        byte[] hash = h.create();
        // Print the digest as a signed byte array and as a non-negative integer.
        System.out.println(Arrays.toString(hash));
        System.out.println(new BigInteger(1, hash));
    }
}
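A quick way to convince yourself of the split-invariance is to hash the same text cut into different parts and compare the results. The following is only a minimal sketch; the class name SplitCheck and its helper methods are hypothetical names chosen for illustration. It also shows why folding in character by character matters: combining whole-part String.hashCode() values gives a result that depends on where the input was cut.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class SplitCheck {
    // Feeds all parts into one SHA-256 digest, one update() call per part.
    static byte[] sha256(String... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String part : parts) {
                md.update(part.getBytes(StandardCharsets.UTF_8));
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Character-wise polynomial hash, the same recurrence String.hashCode() uses.
    static int rolling(String... parts) {
        int h = 0;
        for (String part : parts) {
            for (int i = 0; i < part.length(); i++) {
                h = 31 * h + part.charAt(i);
            }
        }
        return h;
    }

    // Combines one String.hashCode() per part; NOT independent of the split.
    static int perPart(String... parts) {
        int h = 0;
        for (String part : parts) {
            h = 31 * h + part.hashCode();
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(sha256("somestring"), sha256("some", "str", "ing"))); // true
        System.out.println(rolling("somestring") == rolling("s", "omestrin", "g"));            // true
        System.out.println(perPart("abc") == perPart("a", "bc"));                               // false
    }
}
```

With an initial value of 0, the character-wise variant also returns exactly "somestring".hashCode() for the unsplit input.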
In ShaHasher, "SHA-256" could obviously be replaced with other common hash algorithms; Java ships quite a few of them.
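To see which digest algorithms your own Java installation offers, you can ask the security providers. This is a small sketch using java.security.Security.getAlgorithms; the class name ListDigests is a hypothetical name for illustration, and the list you get depends on your JDK and the installed providers.

```java
import java.security.Security;
import java.util.Set;
import java.util.TreeSet;

class ListDigests {
    // Names of all MessageDigest algorithms known to the installed providers, sorted.
    static Set<String> available() {
        return new TreeSet<>(Security.getAlgorithms("MessageDigest"));
    }

    public static void main(String[] args) {
        for (String name : available()) {
            System.out.println(name);
        }
    }
}
```

Any of the printed names can be passed to MessageDigest.getInstance().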
Now, you asked for a Long as the return value, which would imply you are looking for a 64-bit hash. If this really was on purpose, have a look at the answers to "What is a good 64bit hash function in Java for textual strings?". The accepted answer is a slight variant of the JavaHasher, as String.hashCode() does basically the same calculation, but in 32-bit int arithmetic, so it overflows earlier:
public static class Java64Hasher {
    private long hashCode;

    public Java64Hasher() {
        hashCode = 1125899906842597L;
    }

    public void add(CharSequence value) {
        final int len = value.length();
        for (int i = 0; i < len; i++) {
            hashCode = 31 * hashCode + value.charAt(i);
        }
    }

    public long create() {
        return hashCode;
    }
}
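If you want a long but would rather rely on the distribution of a cryptographic digest, one option is to keep only the first 8 bytes of a SHA-256 digest. This is my own assumption about what might fit your use case, not something taken from the linked answer; the class TruncatedSha and the method hash64 are hypothetical names, and truncating to 64 bits naturally gives up most of SHA-256's collision resistance.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class TruncatedSha {
    // Hashes the parts incrementally and returns the first 8 digest bytes as a long.
    static long hash64(String... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String part : parts) {
                md.update(part.getBytes(StandardCharsets.UTF_8));
            }
            // ByteBuffer reads big-endian by default; the remaining 24 bytes are ignored.
            return ByteBuffer.wrap(md.digest()).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hash64("somestring", "another part"));
        System.out.println(hash64("some", "stringanother", " part")); // same value as above
    }
}
```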
On to your points:
fast
Although SHA-256 is slower than the other two, I would still call all three presented approaches fast.
can be used incremental without compromising the other properties or keeping the strings in memory during the complete process.
I cannot guarantee that property for the ShaHasher, as I understand it is block-based and I lack the source code. Still, I would assume that at most one block, the hash and some internal state are kept. The other two obviously only store the partial hash between calls to add().
Secure against collisions. If I compare two hash values from different strings 1 million times per day for the rest of my life, the risk that I get a collision should be neglectable.
Every hash has collisions. Given a good distribution, the bit size of the hash is the main factor in how often a collision happens. The calculation behind the JavaHasher is used in e.g. HashMaps (via String.hashCode()) and seems to be "collision-free" enough to distribute similar keys far apart from each other. As for any deeper analysis: do your own tests or ask your local security engineer, sorry.
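To put rough numbers on the bit-size argument, here is a back-of-the-envelope sketch. It assumes an ideal, uniformly distributed hash (real hashes on similar inputs can do worse) and 80 years of 1 million comparisons per day; both the 80 years and the class name CollisionEstimate are my assumptions. Under that model, each comparison of two different strings is a false match with probability 2^-bits, which gives roughly 7 expected false matches for 32 bits and something on the order of 1e-9 for 64 bits.

```java
class CollisionEstimate {
    // Expected number of false matches when comparing `comparisons` pairs of
    // different inputs with an ideal hash of `bits` bits.
    static double expectedCollisions(double comparisons, int bits) {
        return comparisons / Math.pow(2.0, bits);
    }

    public static void main(String[] args) {
        // A well-known collision of the 32-bit String.hashCode(): both values are 2112.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        // Assumption: 1 million comparisons per day for 80 years.
        double comparisons = 1_000_000.0 * 365 * 80;
        for (int bits : new int[] {32, 64, 256}) {
            System.out.printf("%3d bits: %.3g expected collisions%n",
                    bits, expectedCollisions(comparisons, bits));
        }
    }
}
```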
I hope this gives a good starting point; the details are probably mainly opinion-based.