5

I want to convert an integer (int or long) to a big-endian byte string. The byte string has to be of variable length, so that only the minimum number of bytes is used (the total length of the preceding data is known, so the variable length can be inferred).

My current solution is

import bitstring

bitstring.BitString(hex=hex(456)).tobytes()

This obviously depends on the endianness of the machine and gives wrong results, because zero bits are appended rather than prepended.

Does anyone know a way to do this without making any assumption about the length or endianness of an int?

Scott Griffiths
  • Does this only need to work for an `int`, or does it need to work for a `long` as well? – jchl Aug 23 '10 at 13:09
  • For `long` as well, I forgot about this. I will edit the question. –  Aug 23 '10 at 14:03
  • This can be done simply in any version of Python without external dependencies -- in any case, you want a BYTEstring, not a BITstring. – John Machin Aug 23 '10 at 22:51

4 Answers

6

Something like this. Untested (until next edit). For Python 2.x. Assumes n > 0.

tmp = []
while n:
    n, d = divmod(n, 256)
    tmp.append(chr(d))
result = ''.join(tmp[::-1])

Edit: tested.

If you don't read manuals but like bitbashing, instead of the divmod caper, try this:

d = n & 0xFF; n >>= 8

Edit 2: If your numbers are relatively small, the following may be faster:

result = ''
while n:
    result = chr(n & 0xFF) + result
    n >>= 8

Edit 3: The second method doesn't assume that the int is already big-endian. Here's what happens in a notoriously little-endian environment:

Python 2.7 (r27:82525, Jul  4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> n = 65539
>>> result = ''
>>> while n:
...     result = chr(n & 0xFF) + result
...     n >>= 8
...
>>> result
'\x01\x00\x03'
>>> import sys; sys.byteorder
'little'
>>>
John Machin
  • This assumes that 1 byte equals 8 bits. I don't know if you can make this assumption with regard to the python semantics. The second method assumes that the integer is already in big-endian. –  Aug 23 '10 at 13:59
  • @ott: It's quite safe to say that 1 byte equals 8 bits, and Python integers themselves don't have endianness - it's only an issue in how they are stored or transmitted (i.e. it's only a problem if you've incorrectly unpacked `n` from somewhere before getting this far). Both methods look fine to me. – Scott Griffiths Aug 23 '10 at 15:52
  • Actually, it merely assumes that a byte is at *least* 8 bits, which is guaranteed by the C standard, and thus by the C PyBytes type. – dan04 Aug 23 '10 at 15:57
  • (1) Somebody please show me a machine that's got a non-8-bit byte and isn't in a museum (like Univac 110X (9-bit) or ICL 190X (6-bit)) and has a currently supported Python implementation (2) for any non-negative integer `x`, `x & 0xFF` and `x % 256` mean exactly the same thing in both C and Python irrespective of the endianness of the host machine. – John Machin Aug 23 '10 at 22:15
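For readers on Python 3, where `chr` returns text rather than a byte, the same shift-and-mask loop can be sketched with a `bytearray`. This is a hypothetical port, not John Machin's own code: the names `int_to_bytes` and `bytes_to_int` are made up here, and returning an empty result for 0 simply mirrors what the loop above does when `n` is 0.

```python
def int_to_bytes(n):
    """Big-endian bytes of a non-negative int, minimum length (b'' for 0)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    out = bytearray()
    while n:
        out.append(n & 0xFF)  # lowest 8 bits; same as n % 256 for n >= 0
        n >>= 8
    out.reverse()             # bytes were collected least significant first
    return bytes(out)


def bytes_to_int(data):
    """Inverse: fold big-endian bytes back into an int."""
    n = 0
    for b in data:            # iterating over bytes yields ints in Python 3
        n = (n << 8) | b
    return n
```

Nothing here consults `sys.byteorder`; the byte order of the result comes only from the order in which the bytes are emitted.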
1

A solution using struct and itertools:

>>> import itertools, struct
>>> "".join(itertools.dropwhile(lambda c: not(ord(c)), struct.pack(">i", 456))) or chr(0)
'\x01\xc8'

We can drop itertools by using a simple string strip:

>>> struct.pack(">i", 456).lstrip(chr(0)) or chr(0)
'\x01\xc8'

Or even drop struct using a recursive function:

def to_bytes(n): 
    return ([chr(n & 255)] + to_bytes(n >> 8) if n > 0 else [])

"".join(reversed(to_bytes(456))) or chr(0)
tokland
  • The `struct.pack` method doesn't work, because `struct.unpack` requires a fixed length. For the other methods you would also need a reverse function (trivial). –  Aug 23 '10 at 13:54
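The fixed-length requirement of `struct.unpack` mentioned in the comment can be worked around by padding the stripped bytes back to four before unpacking. A minimal sketch, assuming Python 3 and values that fit in 32 bits; `pack_min` and `unpack_min` are hypothetical names, and the unsigned format `">I"` is swapped in for the `">i"` used above so that the full 32-bit range round-trips.

```python
import struct


def pack_min(n):
    """Pack 0 <= n < 2**32 big-endian, then drop leading zero bytes."""
    return struct.pack(">I", n).lstrip(b"\x00") or b"\x00"


def unpack_min(data):
    """Left-pad back to 4 bytes so struct.unpack gets its fixed length."""
    if not 1 <= len(data) <= 4:
        raise ValueError("expected 1 to 4 bytes")
    return struct.unpack(">I", data.rjust(4, b"\x00"))[0]
```

This only helps when the reader already knows how many bytes belong to the integer, which the question says can be inferred from the total length.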
0

I reformulated John Machin's second method as a one-liner for use on my server:

def bytestring(n):
    return ''.join([chr((n>>(i*8))&0xFF) for i in range(n.bit_length()/8,-1,-1)])

I have found that the second method, using bit-shifting, was faster for both large and small numbers, and not just small numbers.

genixpro
  • I get an error using this with large integers. e.g. big = 2442323423424323434242335353 => TypeError: 'float' object cannot be interpreted as an integer – bjmc Aug 17 '17 at 11:27
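The `TypeError` in the comment is what Python 3 reports when `/` hands a float to `range`. A sketch of the same one-liner for Python 3, assuming a non-negative `n`: it uses floor division and builds a `bytes` object instead of joining `chr` results. The starting index `(n.bit_length() - 1) // 8` is a change made here, not part of the answer; it is meant to avoid an extra leading zero byte when the bit length is an exact multiple of 8 (for example 255).

```python
def bytestring(n):
    """Big-endian bytes of a non-negative int; 0 becomes a single zero byte."""
    # Index of the most significant byte; clamp to 0 so that n == 0 yields one byte.
    last = max((n.bit_length() - 1) // 8, 0)
    return bytes((n >> (i * 8)) & 0xFF for i in range(last, -1, -1))
```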
0

If you're using Python 2.7 or later then you can use the bit_length method to round the length up to the next byte:

>>> i = 456
>>> bitstring.BitString(uint=i, length=(i.bit_length()+7)/8*8).bytes
'\x01\xc8'

otherwise you can just test for whole-byteness and pad with a zero nibble at the start if needed:

>>> s = bitstring.BitString(hex=hex(i))
>>> ('0x0' + s if s.len%8 else s).bytes
'\x01\xc8'
Scott Griffiths
  • `bit_length` seems to be a clean solution (though I'm on Python 2.6 on Debian). `(i.bit_length()+7)/8*8` rounds up the length to a length that is divisible by 8, am I right? The endianness problem also still exists. –  Aug 24 '10 at 18:47
  • I found an [explanation for the rounding](http://stackoverflow.com/questions/2403631/how-do-i-find-the-next-multiple-of-10-of-any-integer). So only the endianness problem remains. –  Aug 24 '10 at 19:50
  • `uint` is an alias for `uintbe`, so the endianness problem is also solved. –  Aug 24 '10 at 19:55
  • This was a bit more difficult than it needed to be, so I've added a feature request (http://code.google.com/p/python-bitstring/issues/detail?id=99) so hopefully in the next version you could say just `BitString(uintbe=456).bytes`. :) – Scott Griffiths Aug 24 '10 at 20:09
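The rounding discussed in these comments can also be tried without bitstring. A small sketch, assuming Python 3.2 or later, where the built-in `int.to_bytes` is available; `min_bytes` and `to_big_endian` are hypothetical helper names, and treating 0 as one byte is a choice made here.

```python
def min_bytes(i):
    """Number of whole bytes needed for a non-negative int (at least 1)."""
    # Adding 7 before floor-dividing by 8 rounds the bit count up to full bytes.
    return max((i.bit_length() + 7) // 8, 1)


def to_big_endian(i):
    # The byte order is spelled out, so the host's sys.byteorder plays no part.
    return i.to_bytes(min_bytes(i), "big")
```

For 456, `bit_length()` is 9, so `(9 + 7) // 8` gives 2 bytes.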