
I have an ANSI string Ď–ór˙rXüď\ő‡íQl7 and I need to convert it to hexadecimal like this: 06cf96f30a7258fcef5cf587ed51156c37 (converted with XVI32).

The problem is that Python cannot encode all of these characters correctly (some of them are displayed incorrectly even here on Stack Overflow), so I have to work with a byte string instead.

As bytes, the above string is: b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'

And that's what I need to convert to hexadecimal.

So far I have tried binascii with no success. I've also tried this:

h = ""
for i in b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7':
    h += hex(i)
print(h)

It prints:

0x60xcf0x960xf30xa0x720x830xff0x720x580xfc0xef0x5c0xf50x870xed0x510x150x6c0x37

Okay. It looks like I'm getting somewhere... but what's up with the 0x thing?

When I remove 0x from the string like this:

h.replace("0x", "")

I get 6cf96f3a7283ff7258fcef5cf587ed51156c37 which looks like it's correct.

But sometimes the byte string has a 0 next to an x, and it gets removed from the string, resulting in an incorrect hexadecimal string (the string above is missing the 0 at the beginning).

Any ideas?

Ciprum

2 Answers


If you're running Python 3.5+, the bytes type has a new bytes.hex() method that returns a hexadecimal string representation of the bytes.

>>> h = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
>>> h.hex()
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
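A short sketch of the round trip on the question's data, assuming Python 3.8+ for the optional separator argument: every byte becomes exactly two hex digits, and bytes.fromhex() turns the string back into the original bytes.

```python
data = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'

hex_str = data.hex()
print(hex_str)  # 06cf96f30a7283ff7258fcef5cf587ed51156c37

# Every byte maps to exactly two hex digits, so the length doubles.
assert len(hex_str) == 2 * len(data)

# bytes.fromhex() reverses the conversion.
assert bytes.fromhex(hex_str) == data

# Python 3.8+ only: an optional separator between bytes.
print(data[:4].hex(' '))  # 06 cf 96 f3
```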

Otherwise (on older versions) you can use binascii.hexlify() to do the same thing. It returns bytes, so decode the result to get a str:

>>> import binascii
>>> binascii.hexlify(h).decode('utf8')
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
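Since binascii.hexlify() returns bytes rather than str, the decode() call is what turns it into text. A minimal sketch that makes this visible and checks the reverse direction with binascii.unhexlify():

```python
import binascii

data = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'

raw = binascii.hexlify(data)  # bytes, not str
print(type(raw))              # <class 'bytes'>

# Hex digits are plain ASCII, so decoding with 'ascii' (or 'utf8') is safe.
hex_str = raw.decode('ascii')
print(hex_str)  # 06cf96f30a7283ff7258fcef5cf587ed51156c37

# binascii.unhexlify() goes the other way; it also accepts an ASCII-only str.
assert binascii.unhexlify(hex_str) == data
```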
Kristaps Taube

As per the documentation, hex() converts “an integer number to a lowercase hexadecimal string prefixed with ‘0x’.” So when using hex() you always get a 0x prefix. You will always have to remove that if you want to concatenate multiple hex representations.

But sometimes the byte string has a 0 next to an x, and it gets removed from the string, resulting in an incorrect hexadecimal string (the string above is missing the 0 at the beginning).

That does not make any sense. x is not a valid hexadecimal character, so in your solution it can only be generated by the hex() call, and that, as said above, will always produce 0x. So the sequence 0x cannot appear any other way in your resulting string, and replacing 0x with an empty string should work just fine.
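To check this on the question's data, here is a small comparison (the variable names are only for illustration): stripping 0x from each hex() result individually gives exactly the same string as calling replace() on the concatenation, so the missing digits cannot come from replace().

```python
data = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'

# The question's approach: concatenate hex() results, then strip every "0x".
via_replace = ''.join(hex(b) for b in data).replace('0x', '')

# Strip the prefix from each value individually instead.
via_slice = ''.join(hex(b)[2:] for b in data)

# Both agree, so replace() is not what loses digits.
assert via_replace == via_slice
print(via_replace)  # 6cf96f3a7283ff7258fcef5cf587ed51156c37
```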

The actual problem in your solution is that hex() does not enforce a two-digit result, as this simple example shows:

>>> hex(10)
'0xa'
>>> hex(2)
'0x2'

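So in your case, since the string starts with \x06, which represents the number 6, hex(6) returns only 0x6. You get a single digit for that byte, which is the real cause of your problem. The same happens to \n (byte 0x0a), which is why your output also contains f3a72 instead of f30a72.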
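A sketch that lists the affected bytes and keeps the question's loop, swapping in str.zfill() to left-pad each value to two digits. This is an alternative fix of my own, not the format-string approach described next:

```python
data = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'

# Bytes below 0x10 are the ones hex() renders with a single digit.
short = [hex(b) for b in data if b < 0x10]
print(short)  # ['0x6', '0xa']

h = ''
for b in data:
    # hex(6) == '0x6': drop the '0x', then left-pad with zeros to width 2 -> '06'
    h += hex(b)[2:].zfill(2)

print(h)  # 06cf96f30a7283ff7258fcef5cf587ed51156c37
```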


What you can do is use format strings to perform the conversion to hexadecimal. That way you can both leave out the prefix and enforce a length of two digits. You can then use str.join to combine it all into a single hexadecimal string:

>>> value = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
>>> ''.join(['{:02x}'.format(x) for x in value])
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
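The same 02x format spec also works with the built-in format() function or inside an f-string (Python 3.6+); a quick sketch confirming they all agree with bytes.hex():

```python
data = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'

# format() with the '02x' spec: lowercase hex, zero-padded to at least 2 digits.
via_format = ''.join(format(x, '02x') for x in data)

# The same spec inside an f-string (Python 3.6+).
via_fstring = ''.join(f'{x:02x}' for x in data)

assert via_format == via_fstring == data.hex()
print(via_format)  # 06cf96f30a7283ff7258fcef5cf587ed51156c37
```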

This solution works not only with a bytes string but with any iterable of integers that can be formatted as hexadecimal (e.g. a list of integers):

>>> value = [1, 2, 3, 4]
>>> ''.join(['{:02x}'.format(x) for x in value])
'01020304'
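One caveat with integer lists: 02x sets a minimum width, not a fixed one, so values above 255 produce more than two digits and the result can no longer be split back into pairs. A small sketch, assuming plain non-negative Python ints:

```python
values = [1, 255, 256]

# 256 needs three hex digits, so the pieces are no longer all two characters wide.
result = ''.join('{:02x}'.format(x) for x in values)
print(result)  # 01ff100

# bytes() rejects such values outright, which is why bytes.hex()
# can always emit exactly two digits per element.
try:
    bytes(values)
except ValueError:
    print('256 does not fit in a byte')
```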
poke