I have several billion strings in the format word0.word1.word2, and I want to assign each one to one of n buckets with a modulo n operation, so that I can hand each bucket to a separate database writer for storage. I know I can compute a form of modulo 10 on the first character of each string like this:
```python
# Bucket each string by the code point of its first character only.
for i in ["a.b", "c.d"]:
    print(ord(i[0]) % 10)
```
This won't divide my strings evenly, though. Within each string, word0, word1, and word2 are sorted alphabetically, so the first character is very often "a". I could use the last character of the string instead, but I am not sure whether the last characters are uniformly distributed.
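One way to check empirically whether the first or last character spreads the strings evenly is to count how many strings land in each bucket on a sample of the real data. The following is a minimal sketch; `bucket_counts`, `first_char`, `last_char`, and the sample strings are hypothetical names and made-up data used only for illustration:

```python
from collections import Counter
from typing import Callable, Iterable, List


def bucket_counts(strings: Iterable[str], key: Callable[[str], int], n: int) -> List[int]:
    """Count how many strings fall into each of n buckets under key(s) % n."""
    counts = Counter(key(s) % n for s in strings)
    return [counts.get(b, 0) for b in range(n)]


def first_char(s: str) -> int:
    # Code point of the first character (the scheme from the question).
    return ord(s[0])


def last_char(s: str) -> int:
    # Code point of the last character (the alternative being considered).
    return ord(s[-1])


# Hypothetical sample: words within each string are sorted alphabetically.
sample = [
    "apple.banana.cherry",
    "apple.kiwi.zebra",
    "apricot.fig.lime",
    "avocado.grape.melon",
]

print(bucket_counts(sample, first_char, 10))  # all four land in bucket 7 (ord("a") % 10)
print(bucket_counts(sample, last_char, 10))
```

Running this over a few million real strings, with n set to the real bucket count, shows directly how lopsided each key is.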
My question: is there a fast way to do something like `ord` on the entire string, turning the whole string into a single integer? I ultimately plan to take that integer modulo 48, and I want the results to be uniformly distributed across all 48 cores. I would be grateful for any help others can offer.
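A common way to get an integer from the entire string is to hash its bytes rather than read them as one big number. The sketch below is a swapped-in technique: it uses CRC-32 from the standard `zlib` module. It assumes UTF-8 encoding and 48 writers, and the names `writer_for`, `naive_whole_string_int`, and the sample data are hypothetical. It also shows why the literal "ord of the entire string" is a poor bucket key for modulo 48.

```python
import zlib

NUM_WRITERS = 48  # one database writer per core, as in the question


def writer_for(s: str, n: int = NUM_WRITERS) -> int:
    """Map a string to a writer index in [0, n) via CRC-32 of its UTF-8 bytes."""
    # zlib.crc32 is deterministic across processes and machines, unlike the
    # built-in hash(), which Python 3 salts per process for str objects.
    return zlib.crc32(s.encode("utf-8")) % n


def naive_whole_string_int(s: str) -> int:
    """Literal 'ord() of the entire string': read all UTF-8 bytes as one big integer."""
    return int.from_bytes(s.encode("utf-8"), "big")


# Hypothetical sample: every string ends in "a", mimicking a skewed last letter.
sample = [f"apple.kiwi{i}.zebra" for i in range(10000)]

naive_buckets = {naive_whole_string_int(s) % NUM_WRITERS for s in sample}
crc_buckets = {writer_for(s) for s in sample}

# Because 256 % 48 == 16, the big integer mod 48 equals
# (last_byte + 16 * sum(other bytes)) mod 48, so a fixed last byte
# limits the naive scheme to at most 3 of the 48 buckets.
print(sorted(naive_buckets))
print(len(crc_buckets))
```

CRC-32 is not a cryptographic hash, but it is computed in C and is usually adequate for spreading keys across buckets. If a sample still shows skew, `int.from_bytes(hashlib.md5(s.encode("utf-8")).digest(), "big") % 48` mixes more thoroughly at some extra cost.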