Why does "bytes(n)" create a length n byte string instead of converting n to a binary representation?

Question

I was trying to build this bytes object in Python 3:

b'3\r\n'

so I tried the obvious (for me), and found a weird behaviour:

>>> bytes(3) + b'\r\n'
b'\x00\x00\x00\r\n'

Apparently:

>>> bytes(10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

Reading the documentation, I couldn't find any pointers on why the bytes conversion works this way. However, I did find some surprised comments in this Python issue about adding format to bytes (see also Python 3 bytes formatting):

http://bugs.python.org/issue3982

This interacts even more poorly with oddities like bytes(int) returning zeroes now

and:

It would be much more convenient for me if bytes(int) returned the ASCIIfication of that int; but honestly, even an error would be better than this behavior. (If I wanted this behavior - which I never have - I'd rather it be a classmethod, invoked like "bytes.zeroes(n)".)

Can someone explain to me where this behaviour comes from?

It is unclear from your question if you want the integer value 3, or the value of the ASCII character representing number three (integer value 51). The first is bytes([3]) == b'\x03'. The latter is bytes([ord('3')]) == b'3'. — florisla, Apr 05 '17 at 06:56

score 344 · Answer 1 · edited Oct 07 '22 at 10:44

From Python 3.2 onward, you can use int.to_bytes:

>>> (1024).to_bytes(2, byteorder='big')
b'\x04\x00'

def int_to_bytes(x: int) -> bytes:
    return x.to_bytes((x.bit_length() + 7) // 8, 'big')
    
def int_from_bytes(xbytes: bytes) -> int:
    return int.from_bytes(xbytes, 'big')

Accordingly, x == int_from_bytes(int_to_bytes(x)). Note that the above encoding works only for unsigned (non-negative) integers.

For signed integers, the bit length is a bit more tricky to calculate:

def int_to_bytes(number: int) -> bytes:
    return number.to_bytes(length=(8 + (number + (number < 0)).bit_length()) // 8, byteorder='big', signed=True)

def int_from_bytes(binary_data: bytes) -> int:
    return int.from_bytes(binary_data, byteorder='big', signed=True)
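As a quick check of the signed variant, here is a minimal sketch that round-trips values on either side of the one-byte signed range. The names signed_to_bytes and signed_from_bytes are hypothetical, chosen only so they don't clash with the unsigned pair above; the length formula is the one from this answer.

```python
def signed_to_bytes(number: int) -> bytes:
    # Same length formula as the signed int_to_bytes above.
    length = (8 + (number + (number < 0)).bit_length()) // 8
    return number.to_bytes(length, byteorder='big', signed=True)


def signed_from_bytes(data: bytes) -> int:
    return int.from_bytes(data, byteorder='big', signed=True)


# Values on either side of the one-byte signed range [-128, 127].
for value in (-129, -128, -1, 0, 1, 127, 128):
    encoded = signed_to_bytes(value)
    assert signed_from_bytes(encoded) == value
    print(value, encoded, len(encoded))
```

The values -128 through 127 should come out as a single byte, while -129 and 128 need two.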

edited Oct 07 '22 at 10:44

Neuron

answered May 21 '15 at 13:28

brunsgaard

4

While this answer is good, it works only for unsigned (non-negative) integers. I have adapted it to write an [answer](https://stackoverflow.com/a/54141411/832230) which also works for signed integers. – Asclepius Jan 11 '19 at 06:32
3

That doesn't help with getting `b"3"` from `3`, as the question asks. (It'll give `b"\x03"`.) – gsnedders May 22 '19 at 14:29
1

Might be worth pointing out that both ``to_bytes`` and ``from_bytes`` support a ``signed`` argument. This allows storing both positive and negative numbers, at the cost of an additional bit. – MisterMiyagi Aug 20 '20 at 08:25
(https://stackoverflow.com/a/64502258/5267751 explains what the `+7` is for.) – user202729 Feb 08 '21 at 14:25
Why are the parentheses needed and where can I find documentation on them? – young_souvlaki Apr 06 '21 at 19:01
Didn't know there was a X.to_bytes(2, byteorder='big'). And you can set the byte order! HUGE thank you! – boardkeystown Dec 27 '22 at 23:12

score 254 · Answer 2 · edited Dec 19 '22 at 02:06

That's the way it was designed - and it makes sense because usually, you would call bytes on an iterable instead of a single integer:

>>> bytes([3])
b'\x03'

The docs state this, as well as the docstring for bytes:

>>> help(bytes)
...
bytes(int) -> bytes object of size given by the parameter initialized with null bytes
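To make the docstring concrete, here is a small sketch of how the constructor treats three different argument types in Python 3. The variable names are only illustrative.

```python
# The bytes constructor dispatches on the type of its argument.
zero_filled = bytes(3)             # int -> that many null bytes
from_iterable = bytes([3])         # iterable of ints in 0..255 -> those byte values
from_text = bytes('3', 'ascii')    # str plus encoding -> the encoded text

print(zero_filled)    # b'\x00\x00\x00'
print(from_iterable)  # b'\x03'
print(from_text)      # b'3'
```

Only the last form yields the ASCII digit the question was after.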

edited Dec 19 '22 at 02:06

mkrieger1

answered Jan 09 '14 at 10:37

Tim Pietzcker

30

Beware that the above works only with python 3. In python 2 `bytes` is just an alias for `str`, which means `bytes([3])` gives you `'[3]'`. – botchniaque Aug 17 '16 at 13:48
1

Creating a sequence first is incredibly slow for a single integer. I wouldn't recommend it. – Justin Turner Arthur Dec 18 '16 at 01:29
19

In Python 3, note that `bytes([n])` only works for int n from 0 to 255. For anything else it raises `ValueError`. – Asclepius Dec 21 '16 at 06:29
11

@A-B-B: Not really surprising since a byte can only store values between 0 and 255. – Tim Pietzcker Dec 21 '16 at 07:15
8

It should also be noted that `bytes([3])` is still different from what the OP wanted – namely the byte value used to encode the digit "3" in ASCII, ie. `bytes([51])`, which is `b'3'`, not `b'\x03'`. – lenz Apr 01 '17 at 21:13
1

This is the wrong answer; it only works for integers 0 <= x < 256. – weberc2 Jun 18 '19 at 16:26
@weberc2: What do you mean? `bytes(500)` works perfectly well, and `bytes([500])` can't work since a byte can only have values between 0 and 255 (see comments above). – Tim Pietzcker Jun 19 '19 at 12:37
2

`bytes(500)` creates a bytestring w/ len == 500. It does not create a bytestring that encodes the integer 500. And I agree that `bytes([500])` can't work, which is why that's the wrong answer too. Probably the right answer is `int.to_bytes()` for versions >= 3.2. – weberc2 Jun 20 '19 at 21:57
For ASCII just do `bytes([n + 48])` for numbers between 0 and 9, or in the example with the number 3, `bytes([3 + 48])`, since the ASCII code for the digit 0 is `48`. – Kebman Sep 17 '20 at 13:19

score 51 · Answer 3 · answered Nov 14 '14 at 00:25

You can use struct.pack:

In [10]: import struct

In [11]: struct.pack(">I", 1)
Out[11]: '\x00\x00\x00\x01'

The ">" is the byte-order (big-endian) and the "I" is the format character. So you can be specific if you want to do something else:

In [12]: struct.pack("<H", 1)
Out[12]: '\x01\x00'

In [13]: struct.pack("B", 1)
Out[13]: '\x01'

This works the same on both Python 2 and Python 3, except that Python 3 displays the results with a b prefix (for example, b'\x01\x00').

Note: the inverse operation (bytes to int) can be done with unpack.
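A minimal sketch of that round trip, assuming Python 3 (where pack returns a bytes object). The variable names are only illustrative, and the last part shows what happens when a value doesn't fit the chosen format.

```python
import struct

packed = struct.pack('>I', 1)            # 4-byte big-endian unsigned int
(value,) = struct.unpack('>I', packed)   # unpack always returns a tuple
print(packed, value)

# Fixed-size formats reject values that do not fit.
overflowed = False
try:
    struct.pack('B', 256)                # 'B' holds 0..255 only
except struct.error as exc:
    overflowed = True
    print('struct.error:', exc)
```

Unlike int.to_bytes with a computed length, the output size here is fixed by the format character, so the range check is part of the contract.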

answered Nov 14 '14 at 00:25

Andy Hayden

2

@AndyHayden To clarify, since a struct has a standard size irrespective of the input, `I`, `H`, and `B` work till `2**k - 1` where k is 32, 16, and 8 respectively. For larger inputs they raise `struct.error`. – Asclepius Dec 21 '16 at 13:45
Presumably down-voted as it doesn't answer the question: the OP wants to know how to generate `b'3\r\n'`, i.e. a byte-string containing the ASCII character "3" not the ASCII character "\x03" – Dave Jones Mar 05 '17 at 21:17
2

@DaveJones What makes you think that is what the OP wants? The **accepted answer** returns `\x03`, and the solution if you just want `b'3'` is trivial. The reason cited by A-B-B is much more plausible... or at least understandable. – Andy Hayden Mar 05 '17 at 23:32
@DaveJones Also, the reason I added this answer was because Google takes you here when searching to do precisely this. So that's why it's here. – Andy Hayden Mar 05 '17 at 23:36
5

Not only does this work the same in 2 and 3, but it's faster than both the `bytes([x])` and `(x).to_bytes()` methods in Python 3.5. That was unexpected. – Mark Ransom Mar 07 '17 at 17:03

jfs · Accepted Answer · 2022-10-20T06:47:33.143

Python 3.5+ introduces %-interpolation (printf-style formatting) for bytes:

>>> b'%d\r\n' % 3
b'3\r\n'

See PEP 0461 -- Adding % formatting to bytes and bytearray.

On earlier versions, you could use str and .encode('ascii') the result:

>>> s = '%d\r\n' % 3
>>> s.encode('ascii')
b'3\r\n'

Note: It is different from what int.to_bytes produces:

>>> n = 3
>>> n.to_bytes((n.bit_length() + 7) // 8, 'big') or b'\0'
b'\x03'
>>> b'3' == b'\x33' != b'\x03'
True
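For code that has to run on versions without bytes %-formatting, the str-then-encode route can be wrapped in a small helper. This is only a sketch; ascii_line is a hypothetical name, not something from the standard library.

```python
def ascii_line(n: int) -> bytes:
    """Return the decimal digits of n as ASCII, followed by CRLF."""
    return str(n).encode('ascii') + b'\r\n'


print(ascii_line(3))      # b'3\r\n'
print(ascii_line(1024))   # b'1024\r\n'
print(ascii_line(-7))     # b'-7\r\n'
```

On Python 3.5+ this should give the same result as b'%d\r\n' % n.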

score 13 · Answer 5 · edited Dec 19 '22 at 02:09

The documentation says:

bytes(int) -> bytes object of size given by the parameter
              initialized with null bytes

The sequence

b'3\r\n'

consists of the character '3' (decimal 51), the character '\r' (13), and the character '\n' (10).

Therefore, one way is to build it from exactly those values, for example:

>>> bytes([51, 13, 10])
b'3\r\n'

>>> bytes('3', 'utf8') + b'\r\n'
b'3\r\n'

>>> n = 3
>>> bytes(str(n), 'ascii') + b'\r\n'
b'3\r\n'

Tested on IPython 1.1.0 & Python 3.2.3

edited Dec 19 '22 at 02:09

mkrieger1

answered Jan 09 '14 at 13:15

Schcriher

1

I ended up doing `bytes(str(n), 'ascii') + b'\r\n'` or `str(n).encode('ascii') + b'\r\n'`. Thanks! :) – astrojuanlu Jan 09 '14 at 14:32
1

@Juanlu001, also `"{}\r\n".format(n).encode()` I don't think there is any harm done by using the default utf8 encoding – John La Rooy Feb 12 '15 at 00:33

Bachsau · Answer 6 · 2017-04-02T11:43:49.777

8

The ASCIIfication of 3 is "\x33" not "\x03"!

That is what Python does for str(3), but it would be totally wrong for bytes, which should be treated as arrays of binary data rather than abused as strings.

The easiest way to achieve what you want is bytes((3,)), which is preferable to bytes([3]) because building a list is more expensive than building a tuple; prefer tuples when you don't need a list. You can convert bigger integers with int.to_bytes, for example (1024).to_bytes(2, "little").

Initializing bytes with a given length makes sense and is the most useful, as they are often used to create some type of buffer for which you need some memory of given size allocated. I often use this when initializing arrays or expanding some file by writing zeros to it.
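As a rough illustration of the buffer use case, here is a sketch with illustrative variable names. Since bytes objects are immutable, the in-place part uses bytearray, whose constructor accepts a length in the same way.

```python
buffer = bytearray(8)         # eight null bytes, mutable
buffer[0:2] = b'\x04\x00'     # overwrite the first two bytes in place
print(bytes(buffer))          # b'\x04\x00\x00\x00\x00\x00\x00\x00'

padding = bytes(4)            # immutable zero padding, usable as a data source
print(b'head' + padding)      # b'head\x00\x00\x00\x00'
```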

edited Apr 02 '17 at 11:43

answered Aug 01 '15 at 10:40

Bachsau

1

There are several problems with this answer: (a) The escape notation of `b'3'` is `b'\x33'`, not `b'\x32'`. (b) `(3)` is not a tuple – you have to add a comma. (c) The scenario of initialising a sequence with zeroes does not apply to `bytes` objects, as they are immutable (it makes sense for `bytearray`s, though). – lenz Apr 01 '17 at 22:26
Thanks for your comment. I fixed those two obvious mistakes. In case of `bytes` and `bytearray`, I think it's mostly a matter of consistency. But it is also useful if you want to push some zeros into a buffer or file, in which case it is only used as a data source. – Bachsau Apr 02 '17 at 11:50

score 5 · Answer 7 · edited Dec 19 '22 at 02:09

I was curious about performance of various methods for a single int in the range [0, 255], so I decided to do some timing tests.

Based on the timings below, and on the general trend I observed across many different values and configurations, struct.pack seems to be the fastest, followed by int.to_bytes, then bytes, with str.encode (unsurprisingly) the slowest. Note that the results vary more than these figures suggest, and int.to_bytes and bytes sometimes swapped places during testing, but struct.pack was consistently the fastest.

Results in CPython 3.7 on Windows:

Testing with 63:
bytes_: 100000 loops, best of 5: 3.3 usec per loop
to_bytes: 100000 loops, best of 5: 2.72 usec per loop
struct_pack: 100000 loops, best of 5: 2.32 usec per loop
chr_encode: 50000 loops, best of 5: 3.66 usec per loop

Test module (named int_to_byte.py):

"""Functions for converting a single int to a bytes object with that int's value."""

import random
import shlex
import struct
import timeit

def bytes_(i):
    """From Tim Pietzcker's answer:
    https://stackoverflow.com/a/21017834/8117067
    """
    return bytes([i])

def to_bytes(i):
    """From brunsgaard's answer:
    https://stackoverflow.com/a/30375198/8117067
    """
    return i.to_bytes(1, byteorder='big')

def struct_pack(i):
    """From Andy Hayden's answer:
    https://stackoverflow.com/a/26920966/8117067
    """
    return struct.pack('B', i)

# Originally, jfs's answer was considered for testing,
# but the result is not identical to the other methods
# https://stackoverflow.com/a/31761722/8117067

def chr_encode(i):
    """Another method, from Quuxplusone's answer here:
    https://codereview.stackexchange.com/a/210789/140921
    
    Similar to g10guang's answer:
    https://stackoverflow.com/a/51558790/8117067
    """
    return chr(i).encode('latin1')

converters = [bytes_, to_bytes, struct_pack, chr_encode]

def one_byte_equality_test():
    """Test that results are identical for ints in the range [0, 255]."""
    for i in range(256):
        results = [c(i) for c in converters]
        # Test that all results are equal
        start = results[0]
        if any(start != b for b in results):
            raise ValueError(results)

def timing_tests(value=None):
    """Test each of the functions with a random int."""
    if value is None:
        # random.randint takes more time than int to byte conversion
        # so it can't be a part of the timeit call
        value = random.randint(0, 255)
    print(f'Testing with {value}:')
    for c in converters:
        print(f'{c.__name__}: ', end='')
        # Uses technique borrowed from https://stackoverflow.com/q/19062202/8117067
        timeit.main(args=shlex.split(
            f"-s 'from int_to_byte import {c.__name__}; value = {value}' " +
            f"'{c.__name__}(value)'"
        ))

@A-B-B As mentioned in my first sentence, I'm only measuring this for a single int in the range `[0, 255]`. I assume by "wrong indicator" you mean my measurements weren't general enough to fit most situations? Or was my measuring methodology poor? If the latter, I would be interested to hear what you have to say, but if the former, I never claimed my measurements were generic to all use-cases. For my (perhaps niche) situation, I am only dealing with ints in the range `[0, 255]`, and that is the audience I intended to address with this answer. Was my answer unclear? I can edit it for clarity... — Graham, Jan 11 '19 at 12:19
What about the technique of just indexing a precomputed encoding for the range? The precomputation wouldn't be subject to timing, only the indexing would be. — Asclepius, Jan 11 '19 at 15:29
@A-B-B That's a good idea. That sounds like it will be faster than anything else. I'll do some timing and add it to this answer when I have some time. — Graham, Jan 11 '19 at 19:03
If you really want to time the bytes-from-iterable thing, you should use `bytes((i,))` instead of `bytes([i])`, because lists are more complex, use more memory, and take longer to initialize. In this case, for nothing. — Bachsau, Mar 28 '19 at 06:28
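The precomputed-lookup idea raised in the comments could look roughly like this. BYTE_TABLE and table_lookup are hypothetical names, and no timing claim is made here.

```python
# Build the 256 single-byte objects once, up front.
BYTE_TABLE = tuple(bytes((i,)) for i in range(256))


def table_lookup(i: int) -> bytes:
    """Return the one-byte bytes object for i in [0, 255]."""
    return BYTE_TABLE[i]


# The lookup agrees with the other single-byte converters.
assert table_lookup(63) == bytes([63])
assert all(table_lookup(i) == i.to_bytes(1, 'big') for i in range(256))
```

One caveat of this sketch: a negative index wraps around silently instead of raising, so callers would need to validate the range themselves.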

Asclepius · Answer 8 · 2022-10-12T21:00:10.760

5

Although the prior answer by brunsgaard is an efficient encoding, it works only for unsigned integers. This one builds upon it to work for both signed and unsigned integers.

def int_to_bytes(i: int, *, signed: bool = False) -> bytes:
    length = ((i + ((i * signed) < 0)).bit_length() + 7 + signed) // 8
    return i.to_bytes(length, byteorder='big', signed=signed)

def bytes_to_int(b: bytes, *, signed: bool = False) -> int:
    return int.from_bytes(b, byteorder='big', signed=signed)

# Test unsigned:
for i in range(1025):
    assert i == bytes_to_int(int_to_bytes(i))

# Test signed:
for i in range(-1024, 1025):
    assert i == bytes_to_int(int_to_bytes(i, signed=True), signed=True)

For the encoder, (i + ((i * signed) < 0)).bit_length() is used instead of just i.bit_length() because the latter leads to an inefficient encoding of -128, -32768, etc.

Credit: CervEd for fixing a minor inefficiency.
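The boundary case is easier to see with concrete numbers. A short sketch of why plain bit_length() overestimates for values like -128:

```python
# bit_length() ignores the sign, so -128 and 128 both report 8 bits.
print((-128).bit_length(), (128).bit_length())   # 8 8

# Yet -128 fits in one signed byte, while 128 needs two.
print((-128).to_bytes(1, 'big', signed=True))    # b'\x80'
print((128).to_bytes(2, 'big', signed=True))     # b'\x00\x80'

# Adding 1 to negative values before measuring fixes the boundary case.
print((-128 + 1).bit_length())                   # 7
```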

edited Oct 12 '22 at 21:00

answered Jan 11 '19 at 06:29

Asclepius

`int_to_bytes(-128, signed=True) == (-128).to_bytes(1, byteorder="big", signed=True)` is `False` – CervEd Jun 03 '19 at 14:57
You're not using length 2, you're calculating the bit length of the signed integer, adding 7, and then 1, if it's a signed integer. Finally you convert that into the length in bytes. This yields unexpected results for `-128`, `-32768` etc. – CervEd Jun 03 '19 at 15:15
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/194383/discussion-between-cerved-and-a-b-b). – CervEd Jun 03 '19 at 15:23
This is how you fix it `(i+(signed*i<0)).bit_length()` – CervEd Jun 04 '19 at 06:47

alko · Answer 9 · 2014-01-09T10:45:19.343

From bytes docs:

Accordingly, constructor arguments are interpreted as for bytearray().

Then, from bytearray docs:

The optional source parameter can be used to initialize the array in a few different ways:

If it is an integer, the array will have that size and will be initialized with null bytes.

Note that this differs from the 2.x (where x >= 6) behavior, where bytes is simply str:

>>> bytes is str
True

PEP 3112:

The 2.6 str differs from 3.0’s bytes type in various ways; most notably, the constructor is completely different.

score 4 · Answer 10 · answered Jan 09 '14 at 10:44

The behaviour comes from the fact that, before Python 3, bytes was just an alias for str. In Python 3.x, bytes is an immutable version of bytearray: a completely new type that is not backwards compatible.

answered Jan 09 '14 at 10:44

freakish

renskiy · Answer 11 · 2017-08-25T11:50:26.397

4

An int (including Python 2's long) can be converted to bytes using the following function:

import codecs

def int2bytes(i):
    hex_value = '{0:x}'.format(i)
    # make length of hex_value a multiple of two
    hex_value = '0' * (len(hex_value) % 2) + hex_value
    return codecs.decode(hex_value, 'hex_codec')

The reverse conversion can be done by another one:

import codecs
import six  # should be installed via 'pip install six'

long = six.integer_types[-1]

def bytes2int(b):
    return long(codecs.encode(b, 'hex_codec'), 16)

Both functions work on both Python2 and Python3.

edited Aug 25 '17 at 11:50

answered Aug 09 '17 at 08:57

renskiy

'hex_value = '%x' % i' will not work under Python 3.4. You get a TypeError, so you'd have to use hex() instead. – bjmc Aug 23 '17 at 19:42
@bjmc replaced with str.format. This should work on Python 2.6+. – renskiy Aug 25 '17 at 10:15
Thanks, @renskiy. You might want to use 'hex_codec' instead of 'hex' because it seems like 'hex' alias is not available on all Python 3 releases see https://stackoverflow.com/a/12917604/845210 – bjmc Aug 25 '17 at 10:31
@bjmc fixed. Thanks – renskiy Aug 25 '17 at 11:51
This fails on negative integers on python 3.6 – Berserker Nov 14 '18 at 09:57

score 3 · Answer 12 · edited Jun 21 '21 at 09:17

Since you want to deal with the binary representation, the best option is to use ctypes.

import ctypes
x = ctypes.c_int(1234)
bytes(x)

You must choose a specific integer representation (signed or unsigned, and the number of bits: c_uint8, c_int8, c_uint16, ...).
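One thing worth knowing about this approach is that bytes() on a ctypes value exposes the raw memory of the C object, so the result follows the platform's native byte order. A small sketch, using c_uint16 so the size is fixed at two bytes:

```python
import ctypes
import sys

x = ctypes.c_uint16(1234)
raw = bytes(x)   # raw memory of the C value, in native byte order

# Matches int.to_bytes when given the platform's byte order explicitly.
assert raw == (1234).to_bytes(2, sys.byteorder)
print(sys.byteorder, raw)
```

If a specific byte order is required regardless of platform, int.to_bytes or struct.pack with an explicit order is the more predictable choice.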

edited Jun 21 '21 at 09:17

kennysliding

answered Jun 19 '21 at 14:17

Moises

Max Malysh · Answer 13 · 2020-05-06T17:58:36.297

Some answers don't work with large numbers.

Convert integer to the hex representation, then convert it to bytes:

def int_to_bytes(number):
    hrepr = hex(number).replace('0x', '')
    if len(hrepr) % 2 == 1:
        hrepr = '0' + hrepr
    return bytes.fromhex(hrepr)

Result:

>>> int_to_bytes(2**256 - 1)
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
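As a sanity check, the hex-based function can be compared against int.from_bytes and the to_bytes approach. This sketch repeats the function from this answer so that it runs on its own; the variable names big and encoded are illustrative.

```python
def int_to_bytes(number):
    hrepr = hex(number).replace('0x', '')
    if len(hrepr) % 2 == 1:
        hrepr = '0' + hrepr
    return bytes.fromhex(hrepr)


big = 2**256 - 1
encoded = int_to_bytes(big)

# Decoding gives back the original integer.
assert int.from_bytes(encoded, 'big') == big
# Same bytes as to_bytes with a computed length.
assert encoded == big.to_bytes((big.bit_length() + 7) // 8, 'big')
```

Note that this covers non-negative integers only: hex() of a negative number includes a minus sign, which bytes.fromhex rejects with a ValueError.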

score -1 · Answer 14 · answered Jul 01 '21 at 08:59

I think you can convert the int to str first and then convert that to bytes. That should produce the format you want.

bytes(str(your_number),'UTF-8') + b'\r\n'

It works for me in Python 3.8.

answered Jul 01 '21 at 08:59

astroflyer

score -2 · Answer 15 · answered Apr 26 '20 at 17:23

If the question is how to convert an integer itself (not its string equivalent) into bytes, I think the robust answer is:

>>> i = 5
>>> i.to_bytes(2, 'big')
b'\x00\x05'
>>> int.from_bytes(i.to_bytes(2, 'big'), byteorder='big')
5

More information on these methods here:

answered Apr 26 '20 at 17:23

Nilashish C

2

How is this different from brunsgaard's answer, posted 5 years ago and currently the highest voted answer? – Arthur Tacca May 06 '20 at 10:30

score -3 · Answer 16 · answered Aug 03 '22 at 11:54

>>> chr(116).encode()
b't'
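A caveat with this one-liner: encode() defaults to UTF-8, so it yields a single byte only for code points below 128. A short sketch of the difference (using 'latin1' as other answers in this thread do):

```python
print(chr(116).encode())           # b't'
print(chr(200).encode())           # b'\xc3\x88'  (two bytes under UTF-8)
print(chr(200).encode('latin1'))   # b'\xc8'      (one byte)
```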

answered Aug 03 '22 at 11:54

ShaileshKumarMPatel
