I am trying to derive a double-precision floating-point score from a UTF-8 encoded string object in Python. The idea is to take the first 8 bytes of the string and build a float from them, so that the strings, ordered by their score, are ordered lexicographically by their first 8 bytes (or possibly by their first 63 bits, after forcing every score to be positive to avoid sign errors).
For example:
get_score(u'aaaaaaa') < get_score(u'aaaaaaab') < get_score(u'zzzzzzzz')
I have tried computing the score as an integer using left bit shifts and XOR, but I am not sure how to translate that integer into a float value. I am also not sure whether there is a better way to do this.
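To make the idea concrete, here is a minimal sketch of one possible approach, not a definitive implementation. It assumes the prefix is zero-padded to 8 bytes, read as a big-endian unsigned integer, shifted right by one bit so the sign bit is zero, and then reinterpreted (not numerically converted) as an IEEE 754 double with the `struct` module. The body of `get_score` is hypothetical. Two caveats follow from these assumptions: the lowest bit of the eighth byte is dropped, so strings that differ only in that bit get the same score, and the result relies on the byte 0xFF never occurring in valid UTF-8, which keeps the exponent field from being all ones (infinity or NaN).

```python
import struct


def get_score(text):
    """Map the first 8 UTF-8 bytes of text to a non-negative float.

    Hypothetical sketch: the float's ordering follows the top 63 bits
    of the zero-padded 8-byte prefix.
    """
    # Accept either an already-encoded byte string or a unicode string.
    data = text if isinstance(text, bytes) else text.encode('utf-8')

    # Keep the first 8 bytes, padding short strings with zero bytes
    # so that a string sorts before any longer string it prefixes.
    prefix = data[:8].ljust(8, b'\x00')

    # Read the prefix as a big-endian unsigned 64-bit integer.
    (bits,) = struct.unpack('>Q', prefix)

    # Drop the lowest bit so the top (sign) bit becomes zero.
    bits >>= 1

    # Reinterpret the 64-bit pattern as a double. For non-negative
    # finite doubles, numeric order matches bit-pattern order.
    (score,) = struct.unpack('>d', struct.pack('>Q', bits))
    return score
```

The scores themselves are not meaningful numbers (most are extremely small in magnitude); only their relative order is intended to be useful. Converting the integer with `float(bits)` instead would round to 53 significant bits, so distinct prefixes could collapse to the same score more often.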
How should the score for a string be computed so that the ordering condition shown above is met?
Edit: The string object is UTF-8 encoded (as per @Bakuriu's comment).