How can I use struct to convert an arbitrary int (unsigned, signed, long long,... just int in Python) to a byte sequence (bytes)?

If I read this correctly, I would need to decide on a format string depending on the sign or length of the int, but in Python I don't really have this distinction. I'm trying to convert an arbitrary int to bytes and re-create the int again from the sequence of bytes (bytes).

Here are some attempts which failed:

# int is too big
>>> struct.unpack('>i', struct.pack('>i', -12343243543543534))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
>>> -12343243543543534
-12343243543543534
>>> struct.unpack('>q', struct.pack('>q', -12343243543543534))
(-12343243543543534,)

# again, integer value is too big, but can be represented as integer (below)
>>> struct.unpack('>q', struct.pack('>q', -1234324354354353432432424))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: int too large to convert
>>> -1234324354354353432432424
-1234324354354353432432424

Alternatively, I could use the largest "container" to convert to bytes and "up cast" when turning the bytes back into an integer, but then I would need to know which format string is safe (i.e. the largest) to use.

bytes approach

bytes(int) seems to have the same problem and requires knowing the sign:

>>> bytes(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: negative count
>>> i
-123432432432232

int.to_bytes / int.from_bytes

With a sufficiently large number of bytes, I can store "any" integer value, but it is still required to know about the sign.

>>> int(-1234324354354353432432424).to_bytes(64, 'little')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: can't convert negative int to unsigned
>>> int(-1234324354354353432432424).to_bytes(64, 'little', signed=True)
b'\xd9\xf0;\xd5l5\x86$\x9f\xfa\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
>>> int.from_bytes(int(-1234324354354353432432424).to_bytes(64, 'little', signed=True))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: from_bytes() missing required argument 'byteorder' (pos 2)
# no sign parameter
>>> int.from_bytes(int(-1234324354354353432432424 + 1).to_bytes(64, 'little', signed=True), 'little')
13407807929942597099574024998205846127479365820592393377723561443721764030073546976801874298166903427690031858186486050853753882810712245592079295573651673
# actual number
>>> int.from_bytes(int(-1234324354354353432432424 + 1).to_bytes(64, 'little', signed=True), 'little', signed=True)
-1234324354354353432432423
orange
  • Just to be clear: You want to store an integer? Ideally in a binary format? A bytearray is something different. – MisterMiyagi Aug 20 '20 at 07:51
  • Does this answer your question? [Converting int to bytes in Python 3](https://stackoverflow.com/questions/21017698/converting-int-to-bytes-in-python-3) – MisterMiyagi Aug 20 '20 at 07:53
  • @MisterMiyagi "list of byte" to be precise (I didn't mean `bytearray`)... – orange Aug 20 '20 at 07:55
  • A list of byte is for example ``[b'\xff', b'\xd4', b'%', b'\xe2', b'\xa9', b'6', b'\x01', b'\x12']``, is that really what you want? – MisterMiyagi Aug 20 '20 at 08:00
  • Either that or `b'\xff\xd4%\xe2\xa96\x01\x12'`. Either way is fine. I want to write it to a binary file and read it again. – orange Aug 20 '20 at 08:01
  • @MisterMiyagi: no the link doesn't answer the question. It's the same problem. You need to know the sign. – orange Aug 20 '20 at 08:10
  • Did you try using [`int.to_bytes`](https://docs.python.org/3/library/stdtypes.html#int.to_bytes) (as suggested in [this answer](https://stackoverflow.com/a/30375198) to the linked question)? – mkrieger1 Aug 20 '20 at 08:12
  • Yes, same problem (you'd need to know the sign). Let me update my post... – orange Aug 20 '20 at 08:14
  • What do you mean by "you need to know the sign"? What is the problem with that? You can specify whether the number is signed or unsigned in `to_bytes`. – mkrieger1 Aug 20 '20 at 08:14
  • Is the last example you have shown not exactly what you want?! – mkrieger1 Aug 20 '20 at 08:18
  • You can't just write this `int` into a file and read it from that file without also storing the sign (you need to specify a `signed` parameter in order to reassemble this integer correctly). I was hoping to "dump" the memory content of this integer into a file (as `bytes`) and read it again without worrying about size or sign. – orange Aug 20 '20 at 08:18
  • You don't need to know *which* sign the number has. You need to know *that* it has a sign. There is no way around that. – mkrieger1 Aug 20 '20 at 08:19
  • Bummer... I was hoping to get around that. (+/- sign... which... or when you assume only - is a sign, then yes "that") – orange Aug 20 '20 at 08:23
  • Take note that using ``int.to_bytes`` with ``signed=True`` is the closest you can get to dumping the memory content. Said memory content includes an explicit sign. – MisterMiyagi Aug 20 '20 at 09:35
  • Noted @MisterMiyagi - thanks. – orange Aug 21 '20 at 00:25

1 Answer

> bytes(int) seems to have the same problem and requires knowing the sign

No; this is not for the kind of conversion you have in mind at all. Observe:

>>> bytes(3) # 3 is the *length* of the result
b'\x00\x00\x00'

> With a sufficiently large number of bytes, I can store "any" integer value, but it is still required to know about the sign.

This is not a limitation you could possibly avoid even in theory. The stored byte data is just data; whether it represents a signed or unsigned value is an interpretation of that data that must be imposed upon it. There is no way in principle that you could take in b'\x80' and just know whether it should represent 128 or -128, just by looking at it; and it does not matter whether that value came from using the struct module, int.to_bytes or anything else. No tool you use can make the decision for you, because you are not giving it any information with which the decision could be made.
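To make that concrete, here is a minimal sketch (the variable names are only illustrative) that decodes the same single byte both ways. Nothing in the byte itself selects one result over the other; the `signed` argument is the interpretation being imposed.

```python
raw = b"\x80"

# Same byte, two interpretations; the caller has to pick one.
as_unsigned = int.from_bytes(raw, "big", signed=False)
as_signed = int.from_bytes(raw, "big", signed=True)

print(as_unsigned)  # 128
print(as_signed)    # -128
```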

(Note that your use of the struct module does encode assumptions about signedness. For example, q denotes a signed 64-bit type; for the corresponding unsigned type, you would use Q.)
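As a small illustration of that point, the sketch below unpacks the same eight bytes with `q` and with `Q`. The byte string is an arbitrary example chosen here, not something from the question.

```python
import struct

# Eight 0xff bytes: all 64 bits set.
raw = b"\xff" * 8

# 'q' reads them as a signed 64-bit integer, 'Q' as an unsigned one.
(signed_value,) = struct.unpack(">q", raw)
(unsigned_value,) = struct.unpack(">Q", raw)

print(signed_value)    # -1
print(unsigned_value)  # 18446744073709551615
```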

In short, what you are asking for does not make sense.

You can, of course, work around the limitation in one of two ways: a) adopt a convention (you know whether to interpret the value as signed or unsigned because of the context in which you are interpreting the bytes), or b) explicitly add that information (store a byte that encodes the signedness of the value, though this is hardly ever done in the real world). That is to say: you can give your decoding mechanism the information to make the decision, or you can make the decision yourself.
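Option a) could look something like the following sketch, which assumes the convention "every value is stored as signed, big-endian". The helper names `int_to_bytes` and `int_from_bytes` are made up for this example; only `int.bit_length`, `int.to_bytes` and `int.from_bytes` are real Python APIs.

```python
def int_to_bytes(value: int) -> bytes:
    """Encode any Python int as signed big-endian bytes (hypothetical helper)."""
    # bit_length() ignores the sign, so reserve one extra bit for it
    # and round up to whole bytes.
    length = (value.bit_length() + 8) // 8
    return value.to_bytes(length, "big", signed=True)


def int_from_bytes(data: bytes) -> int:
    """Decode bytes written by int_to_bytes; always interpreted as signed."""
    return int.from_bytes(data, "big", signed=True)


for original in (0, 255, -12343243543543534, -1234324354354353432432424):
    encoded = int_to_bytes(original)
    assert int_from_bytes(encoded) == original
```

Because both sides agree that the data is always signed, the decoder never has to guess. The cost is that some non-negative values take one byte more than they would as unsigned (255 needs two bytes here).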


> You can't just write this int into a file and read it from that file without also storing the sign (you need to specify a signed parameter in order to reassemble this integer correctly). I was hoping to "dump" the memory content of this integer into a file (as bytes) and read it again without worrying about size or sign.

You are, presumably, storing multiple values in the same file. That means that the foregoing is not a real limitation anyway - because you already have to make decisions about the size of the values you read from the file; where one ends and the next begins. Which is to say, in your example:

int.from_bytes(int(-1234324354354353432432424 + 1).to_bytes(64, 'little', signed=True), 'little', signed=True)

you tell yourself that you don't need to know the size for the .from_bytes call, but in a real application you still would - because you would be taking a slice of the file data, not the whole thing, and you would need to know the bounds of the slice.
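One way to handle the bounds, sketched under assumptions of my own (a 4-byte unsigned little-endian length prefix per value, and every value stored as signed), is shown below. The function names `write_int` and `read_int` are hypothetical, and `io.BytesIO` stands in for a real binary file.

```python
import io
import struct


def write_int(stream, value: int) -> None:
    """Write one int as a length prefix plus signed little-endian payload."""
    # One extra bit for the sign, rounded up to whole bytes.
    payload = value.to_bytes((value.bit_length() + 8) // 8, "little", signed=True)
    stream.write(struct.pack("<I", len(payload)))  # assumed: 4-byte length prefix
    stream.write(payload)


def read_int(stream) -> int:
    """Read one int written by write_int."""
    (length,) = struct.unpack("<I", stream.read(4))
    return int.from_bytes(stream.read(length), "little", signed=True)


values = [-1234324354354353432432424, 0, 255, -12343243543543534]

buffer = io.BytesIO()  # stands in for a file opened in binary mode
for v in values:
    write_int(buffer, v)

buffer.seek(0)
restored = [read_int(buffer) for _ in values]
assert restored == values
```

The length prefix is what tells the reader where each value ends, and the fixed "always signed" rule is what tells it how to interpret the bytes. Neither piece of information comes from the integer bytes themselves.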

Karl Knechtel
  • Thanks. "No tool you use can make the decision for you" - I was hoping that since the origin is the Python type `int` (which doesn't differentiate between unsigned, signed, size) that the conversion to `bytes` would contain the sign and perhaps a fixed size, but that doesn't seem to be the case (instead, the standard binary encoding is used). – orange Aug 20 '20 at 10:20
  • I settled for `numpy.save(f, np.asarray([value]))` and `numpy.load(f)[0]`. I realise that `struct` is a bit too low level for my needs. – orange Aug 21 '20 at 00:24
  • The converted value contains a sign, but it doesn't contain a *signedness* - rather: a positive integer can be represented fundamentally in two different ways. There is no fixed size because the integer type in Python is arbitrary precision. If you have a value of `255` then you will need two bytes to represent it as signed but only one to represent it as unsigned. – Karl Knechtel Aug 21 '20 at 14:46
  • that’s all understood. I wasn’t so much after a basic binary encoding compliant bit export (which struct seems to be doing), but an efficient (no parsing, etc.) bit stream serialisation. I was wrongly assuming that `struct` was the way to go. Thanks for your explanation and the comments, I realise that it isn’t. – orange Aug 21 '20 at 15:22
  • I mean, it is *if you can dictate the types* and go ahead and use them blindly when parsing. That's how real-world binary formats work, after all. The *documentation* tells you whether a value is signed or unsigned and how many bytes are used. – Karl Knechtel Aug 21 '20 at 15:31