We're generating hashes to use as identifiers for documents stored in RavenDB. We're doing this because the DocumentID is limited to 127 characters (an ESent limitation) if you want to use BulkInsert like so:
    _documentStore.BulkInsert(options: new BulkInsertOptions { CheckForUpdates = true })
For the BulkInsert to work, the DocumentID needs to match the row being upserted, so we want a DocumentID that can be regenerated consistently from the same source string.
An MD5 hash gives us a fixed-length value with a low probability of collision. The code used to generate the hash is below:
    public static string GetMD5Hash(string inputString)
    {
        HashAlgorithm algorithm = MD5.Create();
        var hashBytes = algorithm.ComputeHash(Encoding.UTF8.GetBytes(inputString));
        return Encoding.UTF8.GetString(hashBytes);
    }
However, RavenDB does not support "\" in a DocumentID, so I want to replace it with "/". My fear is that in doing so we increase the likelihood of a hash collision.
This is the code I want to change to:
    public static string GetMD5Hash(string inputString)
    {
        HashAlgorithm algorithm = MD5.Create();
        var hashBytes = algorithm.ComputeHash(Encoding.UTF8.GetBytes(inputString));
        // Replace the unsupported '\' character with '/'.
        return Encoding.UTF8.GetString(hashBytes).Replace('\\', '/');
    }
Will this increase the likelihood of hash conflicts and remove our ability to depend on the DocumentID as "unique"?
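For comparison, here is a minimal sketch of an alternative encoding, assuming a hexadecimal string is acceptable as a DocumentID. The class name `DocumentIdHasher` and the method name `GetMD5HashHex` are hypothetical and not part of our code. Hex encoding maps every distinct 16-byte digest to a distinct 32-character string made only of `0-9` and `a-f`, so it cannot contain "\" and adds no collisions beyond those of MD5 itself. By contrast, decoding arbitrary hash bytes with `Encoding.UTF8.GetString` is not reversible, because byte sequences that are not valid UTF-8 are replaced with U+FFFD.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class DocumentIdHasher
{
    // Hypothetical helper (not from the original code): hex-encodes the MD5
    // digest instead of decoding the raw hash bytes as UTF-8 text.
    public static string GetMD5HashHex(string inputString)
    {
        using (MD5 algorithm = MD5.Create())
        {
            byte[] hashBytes = algorithm.ComputeHash(Encoding.UTF8.GetBytes(inputString));

            // Two lowercase hex digits per byte: 16 bytes -> 32 characters.
            var builder = new StringBuilder(hashBytes.Length * 2);
            foreach (byte b in hashBytes)
            {
                builder.Append(b.ToString("x2"));
            }
            return builder.ToString();
        }
    }
}
```

Calling `DocumentIdHasher.GetMD5HashHex(sourceString)` twice with the same source string returns the same 32-character value, which stays well under the 127-character limit.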