
How can I convert a (big endian) variable-sized binary byte array to an (unsigned) integer/long? For example, '\x11\x34' represents 4404.

Right now, I'm using

def bytes_to_int(bytes):
  return int(bytes.encode('hex'), 16)

Which is small and somewhat readable, but probably not very efficient. Is there a better (more obvious) way?

dda
loopbackbee
  • What makes you think it's not very efficient? More to the point, what makes you think this will be a bottleneck in any code you'll ever write? – abarnert Aug 12 '14 at 08:51
  • Meanwhile, is this a fixed-length bytearray, or is it always 2 bytes? – abarnert Aug 12 '14 at 08:52
  • Also, what is `'\x1134'`? You mean `'\x11\x34'`? Or `'\\x1134'`? Because what you've written is a 3-character byte string with bytes 0x11, 0x33, 0x34, which I don't think is what you have or want. – abarnert Aug 12 '14 at 08:54
  • Have you tried using struct? https://docs.python.org/2/library/struct.html – ovi Aug 12 '14 at 08:57
  • I've added those details to the question. It's a variable-size bytearray, and my example was wrong. The code is probably not a bottleneck in any way (otherwise this should probably be done in C), but I wanted to know if there was a "canonical best way to do it" - especially since there should *preferably be only one obvious way to do it* – loopbackbee Aug 12 '14 at 09:09
  • @goncalopp: OK, that's reasonable. I've rewritten my answer to be as much about pythonicness as performance. If I can think of a clean way to write it in NumPy I'll add that too. – abarnert Aug 12 '14 at 09:17
  • Related, but not a dup, because it's about fixed-length byte strings: [convert a string of bytes into an int (python)](http://stackoverflow.com/questions/444591/convert-a-string-of-bytes-into-an-int-python) – abarnert Aug 12 '14 at 09:31
  • Also, on the off chance that you're using Jython or IronPython, one of the Related links on the right about Java or C# may be the best answer for you. – abarnert Aug 12 '14 at 09:32

2 Answers


Python doesn't traditionally have much use for "numbers in big-endian C layout" that are too big for C. (If you're dealing with 2-byte, 4-byte, or 8-byte numbers, then struct.unpack is the answer.)
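
As a minimal sketch of that fixed-size case, assuming the input is exactly 1, 2, 4, or 8 bytes long, the big-endian unsigned format codes cover it. The helper name fixed_bytes_to_int and the lookup table are made up for illustration:

```python
import struct

# Big-endian, unsigned struct format codes keyed by input length in bytes.
_FORMATS = {1: '>B', 2: '>H', 4: '>I', 8: '>Q'}

def fixed_bytes_to_int(b):
    """Hypothetical helper: only handles the sizes struct has codes for."""
    fmt = _FORMATS[len(b)]            # KeyError for any other length
    (value,) = struct.unpack(fmt, b)  # unpack always returns a tuple
    return value
```

A 3-byte or 9-byte input has no matching format code, which is why this doesn't generalize to variable-sized input.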

But enough people got sick of there not being one obvious way to do this that Python 3.2 added a method int.from_bytes that does exactly what you want:

int.from_bytes(b, byteorder='big', signed=False)
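
For instance, with the two-byte example from the question (a quick sketch, assuming Python 3.2 or later; the variable names are just for illustration):

```python
data = b'\x11\x34'

# signed defaults to False, so it can be omitted for unsigned values.
value = int.from_bytes(data, byteorder='big', signed=False)
print(value)  # 4404

# The input length is not limited to C integer sizes: 16 bytes here.
wide = int.from_bytes(b'\x01' + b'\x00' * 15, byteorder='big')
```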

Unfortunately, if you're using an older version of Python, you don't have this. So, what options do you have? (Besides the obvious one: update to 3.2, or, better, 3.4…)


First, there's your code. I think binascii.hexlify is a better way to spell it than .encode('hex'), because "encode" has always seemed a little weird for a method on byte strings (as opposed to Unicode strings), and it's in fact been banished in Python 3. But otherwise, it seems pretty readable and obvious to me. And it should be pretty fast—yes, it has to create an intermediate string, but it's doing all the looping and arithmetic in C (at least in CPython), which is generally an order of magnitude or two faster than in Python. Unless your bytearray is so big that allocating the string will itself be costly, I wouldn't worry about performance here.

Alternatively, you could do it in a loop. But that's going to be more verbose and, at least in CPython, a lot slower.

You could try to eliminate the explicit loop for an implicit one, but the obvious function to do that is reduce, which is considered un-Pythonic by part of the community—and of course it's going to require calling a function for each byte.

You could unroll the loop or reduce by breaking it into chunks of 8 bytes and looping over struct.unpack_from, or by just doing a big struct.unpack('>' + 'Q'*(len(b)//8) + 'B'*(len(b)%8), b) and looping over that, but that makes it a lot less readable and probably not that much faster.
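
A rough sketch of the chunked idea, assuming the leftover bytes are taken from the front so that every later read is a full 8-byte word (the function name chunked_bytes_to_int is hypothetical):

```python
import struct

def chunked_bytes_to_int(b):
    """Big-endian bytes -> unsigned int, consuming 8 bytes per step."""
    b = bytes(b)
    head = len(b) % 8  # leading bytes that do not fill an 8-byte word
    x = 0
    # Fold in the short leading part one byte at a time.
    for byte in bytearray(b[:head]):
        x = (x << 8) | byte
    # Then fold in the rest as unsigned big-endian 64-bit words.
    for offset in range(head, len(b), 8):
        (word,) = struct.unpack_from('>Q', b, offset)
        x = (x << 64) | word
    return x
```

It cuts the number of Python-level iterations by roughly a factor of eight, at the cost of being noticeably harder to read than the hex-based one-liner.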

You could use NumPy… but if you're going bigger than either 64 or maybe 128 bits, it's going to end up converting everything to Python objects anyway.

So, for Python versions before 3.2, I think your code is the best option.


Here are some timings comparing it to the manual alternatives described above:

import binascii
import functools
import numpy as np

def hexint(b):
    return int(binascii.hexlify(b), 16)

def loop1(b):
    def f(x, y): return (x<<8)|y
    return functools.reduce(f, b, 0)

def loop2(b):
    x = 0
    for c in b:
        x <<= 8
        x |= c
    return x

def numpily(b):
    n = np.array(list(b))
    p = 1 << (8 * np.arange(len(b)-1, -1, -1, dtype=object))  # each byte position is worth 8 bits
    return np.sum(n * p)

In [226]: b = bytearray(range(256))

In [227]: %timeit hexint(b)
1000000 loops, best of 3: 1.8 µs per loop

In [228]: %timeit loop1(b)
10000 loops, best of 3: 57.7 µs per loop

In [229]: %timeit loop2(b)
10000 loops, best of 3: 46.4 µs per loop

In [283]: %timeit numpily(b)
10000 loops, best of 3: 88.5 µs per loop

For comparison in Python 3.4:

In [17]: %timeit hexint(b)
1000000 loops, best of 3: 1.69 µs per loop

In [17]: %timeit int.from_bytes(b, byteorder='big', signed=False)
1000000 loops, best of 3: 1.42 µs per loop

So, your method is still pretty fast…

abarnert
  • python3 support is definitely a bonus. In this case it's efficient enough for my needs, I just wanted to make sure there wasn't a more obvious way – loopbackbee Aug 12 '14 at 09:14

Function struct.unpack(...) does what you need.

Curd