From my understanding, you want to hash lists of strings ensuring that no two different lists give the same result. This can be solved without thinking of the hash function at all.
You need a function String f(List<String> l) where no two input values result in the same output (an injective function from List<String> to String). You can then pass the output to your hash function and be assured that there will be no collisions beyond those of the hash function itself (note that MD5 was broken years ago, so it may not be an appropriate choice). Here are 2 ways to implement f:
Transforming into a subset of the character set
The most straightforward way is to just map every input to a subset of the character set of String that does not include your separator character:
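    import java.nio.charset.StandardCharsets;

    // Hex-encode the UTF-8 bytes of s; the output only contains [0-9a-f].
    public static String hex(String s) {
        StringBuilder o = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8))
            o.append(String.format("%02x", b & 0xff));
        return o.toString();
    }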
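    // Join the hex-encoded elements with '#', which never occurs in hex output.
    // Caveat: f() and f("") both return "", so handle the empty list separately
    // if it must be distinguished from a list holding one empty string.
    public static String f(String... l) {
        if (l.length == 0) return "";
        String o = hex(l[0]);
        for (int i = 1; i < l.length; i++) o += "#" + hex(l[i]);
        return o;
    }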
f("a#","b") => 6123#62
f("a","#b") => 61#2362
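To see why the encoding step matters, here is a minimal sketch that compares a plain join on '#' with the hex-based f. The class name JoinVsHex and the helper naiveJoin are made up for illustration; hex and f restate the functions above with a StringBuilder.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only hex and f come from the answer above.
final class JoinVsHex {
    // Naive approach: join the raw strings with '#'. Not injective.
    static String naiveJoin(String... l) {
        return String.join("#", l);
    }

    // Hex-encode the UTF-8 bytes of s, so the result never contains '#'.
    static String hex(String s) {
        StringBuilder o = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            o.append(String.format("%02x", b & 0xff));
        }
        return o.toString();
    }

    // Join the hex-encoded elements with '#'.
    static String f(String... l) {
        StringBuilder o = new StringBuilder();
        for (int i = 0; i < l.length; i++) {
            if (i > 0) o.append('#');
            o.append(hex(l[i]));
        }
        return o.toString();
    }

    public static void main(String[] args) {
        // Both lists collapse to the same string under the naive join.
        System.out.println(naiveJoin("a#", "b")); // a##b
        System.out.println(naiveJoin("a", "#b")); // a##b
        // The encoded versions stay distinct.
        System.out.println(f("a#", "b")); // 6123#62
        System.out.println(f("a", "#b")); // 61#2362
    }
}
```

The naive join loses the boundary between elements as soon as an element contains the separator, which is exactly what the encoding prevents.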
Length prefixing
This is also pretty simple, but has the disadvantage that it can't be rewritten to work on a stream: each string's length must be known before any of its characters are written.
    // Prepend the length (in UTF-16 code units) so a reader knows where s ends.
    public static String prefix(String s) {
        return s.length() + "." + s;
    }

    // Join the length-prefixed elements with '#'.
    public static String f(String... l) {
        if (l.length == 0) return "";
        String o = prefix(l[0]);
        for (int i = 1; i < l.length; i++) o += "#" + prefix(l[i]);
        return o;
    }
f("a#","b") => 2.a##1.b
f("a","#b") => 1.a#2.#b
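Either version of f can then be handed to a hash function. The sketch below assumes the length-prefixing f and uses SHA-256 through java.security.MessageDigest; SHA-256 is only an example replacement for MD5, not something the question asked for, and the class name ListHash and the method hashList are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; prefix and f restate the functions above.
final class ListHash {
    // Prepend the length (in UTF-16 code units) of s.
    static String prefix(String s) {
        return s.length() + "." + s;
    }

    // Join the length-prefixed elements with '#'.
    static String f(String... l) {
        StringBuilder o = new StringBuilder();
        for (int i = 0; i < l.length; i++) {
            if (i > 0) o.append('#');
            o.append(prefix(l[i]));
        }
        return o.toString();
    }

    // Hash the injective encoding of the list and return the digest as hex.
    static String hashList(String... l) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(f(l).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two lists that a plain join could not tell apart.
        System.out.println(hashList("a#", "b"));
        System.out.println(hashList("a", "#b"));
    }
}
```

Because f is injective, the two lists can only end up with the same digest if SHA-256 itself collides on the two encoded strings.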