I'm trying to get the first char of a byte string in Python 3.4, but when I index it, I get an int:

>>> my_bytes = b'just a byte string'
>>> my_bytes
b'just a byte string'
>>> my_bytes[0]
106
>>> type(my_bytes[0])
<class 'int'>

This seems unintuitive to me, as I was expecting to get b'j'.

I have discovered that I can get the value I expect, but it feels like a hack to me.

>>> my_bytes[0:1]
b'j'

Can someone please explain why this happens?

meshy
  • The hack of using a range like `my_bytes[0:1]` really helped me write Python2/Python3 compatible code. I'd love to see an answer that covers the best practice for compatible code addressing this issue. For example: `ord(my_bytes[0])` gives an int in Python2, yet `my_bytes[0]` gives an int in Python3. To work in both, I'm using `ord(my_bytes[0:1])` which seems really ugly for Python3. – proximous Nov 15 '16 at 22:23
  • Your answer helped me, I couldn't find the best approach to work with bytes and avoid the integer conversion when accessing an index, thanks. – Bersan May 25 '20 at 12:39
  • I noticed the same phenomenon with lists made from bytearray and bytestring. `type(list(b'abctest').pop(0))` gives `<class 'int'>`. `type(list(bytearray(b'abctest')).pop(0))` gives `<class 'int'>`. `type(bytearray(b'abctest').pop(0))` gives `<class 'int'>`. – Valentin Stoykov Oct 28 '20 at 17:09

1 Answer

The bytes type is a Binary Sequence type, and is explicitly documented as containing a sequence of integers in the range 0 to 255.

From the documentation:

Bytes objects are immutable sequences of single bytes.

[...]

While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256[.]

[...]

Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1).

Bold emphasis mine. Note that indexing a string is a bit of an exception among the sequence types; 'abc'[0] gives you a str object of length one. str is the only built-in sequence type whose elements are always of its own type.
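The difference between indexing and slicing described above can be checked side by side. This is only a minimal sketch for Python 3; the variable names are made up for illustration.

```python
# Indexing vs. slicing on bytes and str (Python 3).
data = b'abc'
text = 'abc'

byte_item = data[0]      # indexing bytes gives an int
byte_slice = data[0:1]   # slicing bytes gives a bytes object of length 1
text_item = text[0]      # indexing str gives a str of length 1
text_slice = text[0:1]   # slicing str also gives a str of length 1

print(type(byte_item).__name__, byte_item)     # int 97
print(type(byte_slice).__name__, byte_slice)   # bytes b'a'
print(type(text_item).__name__, text_item)     # str a
print(type(text_slice).__name__, text_slice)   # str a

# Iterating over bytes yields ints too, much like a tuple of ints would.
byte_values = list(data)
print(byte_values)                             # [97, 98, 99]
```

So the slice `data[0:1]` is not really a hack; it is simply the operation that returns the same type as the sequence itself.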

This echoes how other languages treat string data; in C the unsigned char type is also effectively an integer in the range 0-255. Whether an unqualified char is signed or unsigned is implementation-defined, and text is modelled as a char[] array.
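To illustrate the "sequence of small integers" view from the Python side, here is a rough sketch (Python 3 only, variable names are illustrative) that rebuilds a one-byte bytes object from the int and shows that values outside 0-255 are rejected.

```python
# bytes behaves like an immutable sequence of ints in range(256).
my_bytes = b'just a byte string'

first = my_bytes[0]             # 106, the ASCII code for 'j'
as_bytes = bytes([first])       # wrap the int in a list to get b'j' back
from_ints = bytes([106, 117, 115, 116])   # b'just'

# Pitfall: bytes(first) without the list creates 106 zero bytes instead.
zero_filled = bytes(first)

# Values outside 0..255 are not valid elements of a bytes object.
try:
    bytes([256])
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True

print(first, as_bytes, from_ints, len(zero_filled), out_of_range_rejected)
```

If a length-one bytes object is what you want in the first place, the slice `my_bytes[0:1]` from the question avoids the round trip through an int.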

Martijn Pieters
  • "while b[0:1] will be a bytes object of length 1 (This contrasts with text strings, where both indexing and slicing will produce a string of length 1)" Can you please explain this sentence? I didn't get it. – CY5 Jan 31 '15 at 08:28
  • @CY5: sorry, which part didn't you get? If you create a (Unicode) string, `'abc'[0]` produces another string object `'a'`. If you use the same slice as the example used for the `bytes` object, `'abc'[0:1]` also produces a string object of length one, `'a'`. – Martijn Pieters Jan 31 '15 at 08:30
  • @CY5: but for a `bytes` object, `b'abc'[0]` produces an integer (`97`), and slicing produces a `bytes` object of length one (`b'abc'[0:1]` produces `b'a'`). – Martijn Pieters Jan 31 '15 at 08:31