1356

Apparently, the following is valid syntax:

b'The string'

I would like to know:

  1. What does this b character in front of the string mean?
  2. What are the effects of using it?
  3. What are appropriate situations to use it?

I found a related question right here on SO, but that question is about PHP, and it states that the b is used to indicate the string is binary, as opposed to Unicode, which was needed for code to be compatible with versions of PHP < 6 when migrating to PHP 6. I don't think this applies to Python.

I did find this documentation on the Python site about using a u character in the same syntax to specify a string as Unicode. Unfortunately, it doesn't mention the b character anywhere in that document.

Also, just out of curiosity, are there more symbols than the b and u that do other things?

Mateen Ulhaq
Jesse Webb
  • For the curiosity part, since Python 3.6 there are f-strings, which are really useful. You can do `v = "world"; print(f"Hello {v}")`, getting "Hello world". Another example is `f"{2 * 5}"`, which gives you "10". It is the way forward when working with strings. – thanos.a Mar 23 '21 at 09:13
  • f-strings also have a handy debugging feature if you add an equals (=) sign after the variable but before the closing brace, so with v = 123, `f'{v=}'` would output "v=123" as the string, showing the name of whatever is being printed. Even for expressions, so `f'{2*5=}'` would print out "2*5=10" – diamondsea Apr 13 '22 at 17:22
  • @diamondsea that feature was introduced in version 3.8 – AcK Apr 16 '22 at 12:36
  • For the curiosity part: `stringprefix`::= "r" | "u" | "R" | "U" | "f" | "F" | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF" `bytesprefix`::= "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB" [Documentation: String and Bytes literals](https://docs.python.org/3/reference/lexical_analysis.html#literals) – AcK Apr 16 '22 at 12:42
  • @thanos.a this is the way… – Eric Nelson May 06 '22 at 05:53

12 Answers

1164

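Python 3.x makes a clear distinction between the types:

  • str = '...' literals = a sequence of Unicode characters
  • bytes = b'...' literals = a sequence of octets (integers between 0 and 255)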

If you're familiar with:

  • Java or C#, think of str as String and bytes as byte[];
  • SQL, think of str as NVARCHAR and bytes as BINARY or BLOB;
  • Windows registry, think of str as REG_SZ and bytes as REG_BINARY.

If you're familiar with C(++), then forget everything you've learned about char and strings, because a character is not a byte. That idea is long obsolete.

You use str when you want to represent text.

print('שלום עולם')

You use bytes when you want to represent low-level binary data like structs.

import struct

NaN = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0]
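As a quick sanity check of the struct example, here is a small sketch that unpacks the same eight bytes and then packs an integer back into bytes (the '>I' format and the value 258 are arbitrary illustrative choices, not part of the answer):

```python
import math
import struct

# The eight raw bytes from the NaN example: a big-endian IEEE 754 bit pattern.
raw = b'\xff\xf8\x00\x00\x00\x00\x00\x00'

# '>d' = big-endian double; unpack() always returns a tuple, hence [0].
value = struct.unpack('>d', raw)[0]
print(math.isnan(value))  # True

# The reverse direction: pack() turns a Python number into a bytes object.
packed = struct.pack('>I', 258)  # big-endian unsigned 32-bit integer
print(packed)                    # b'\x00\x00\x01\x02'
print(type(packed))              # <class 'bytes'>
```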

You can encode a str to a bytes object.

>>> '\uFEFF'.encode('UTF-8')
b'\xef\xbb\xbf'

And you can decode a bytes into a str.

>>> b'\xE2\x82\xAC'.decode('UTF-8')
'€'

But you can't freely mix the two types.

>>> b'\xEF\xBB\xBF' + 'Text with a UTF-8 BOM'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str

The b'...' notation is somewhat confusing in that it allows the bytes 0x01-0x7F to be specified with ASCII characters instead of hex numbers.

>>> b'A' == b'\x41'
True

But I must emphasize, a character is not a byte.

>>> 'A' == b'A'
False
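A related way to see that bytes are not characters: indexing a bytes object gives back an integer, not a one-character string. A minimal sketch (the variable names are only for illustration):

```python
text = 'A'
data = b'A'

# Indexing a str gives a one-character str ...
print(repr(text[0]))  # 'A'

# ... but indexing bytes gives an int: the byte's numeric value.
print(data[0])        # 65

# Slicing bytes keeps the bytes type.
print(data[0:1])      # b'A'

# Comparing the two types is simply False, not an error.
print(text == data)   # False
```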

In Python 2.x

Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:

  • unicode = u'...' literals = sequence of Unicode characters = 3.x str
  • str = '...' literals = sequences of confounded bytes/characters
    • Usually text, encoded in some unspecified encoding.
    • But also used to represent binary data like struct.pack output.

To ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6 to allow distinguishing binary strings (which should be bytes in 3.x) from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but it tells the 2to3 script not to convert the literal to a Unicode string in 3.x.

So yes, b'...' literals in Python have the same purpose that they do in PHP.

Also, just out of curiosity, are there more symbols than the b and u that do other things?

The r prefix creates a raw string (e.g., r'\t' is a backslash + t instead of a tab), and triple quotes '''...''' or """...""" allow multi-line string literals.
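A small sketch of the r prefix and of triple quotes, which also shows that prefixes can be combined (rb'...' is a raw bytes literal); the example strings are arbitrary:

```python
# A normal literal turns backslash-t into a single tab character.
print(len('\t'))        # 1

# A raw literal keeps the backslash and the t: two characters.
print(len(r'\t'))       # 2
print(r'\t' == '\\t')   # True

# Prefixes combine: rb'...' is a raw bytes literal.
print(rb'\t')           # b'\\t'

# Triple quotes let a literal span several lines.
two_lines = """first line
second line"""
print(two_lines.count('\n'))  # 1
```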

Boris Verkhovskiy
dan04
  • Thanks! I understood it after reading these sentences: "In order to ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6, in order to allow distinguishing binary strings (which should be bytes in 3.x) from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but tells the 2to3 script not to convert it to a Unicode string in 3.x." – tommy.carstensen Sep 08 '13 at 03:46
  • The `'A' == b'A' --> False` check *really* makes it clear. The rest of it is excellent, but up to that point I hadn't properly understood that a byte string is *not really text.* – Wildcard Sep 19 '16 at 21:30
  • `'שלום עולם' == 'hello world'` – Eli Aug 21 '17 at 11:22
  • +1 for the `.decode('UTF-8')`. Was searching for how to change my b' string received over server POST request back to unicode. – Nikhil VJ Mar 15 '18 at 01:33
  • `A CHARACTER IS NOT A BYTE` is a wrong logical deduction from the C++ draft. C++ never had that kind of "idea". C++ defines a byte as an `addressable unit of data storage large enough to hold any member of the basic character set of the execution environment`. That's like saying a glass can hold water. Every water is a glass. – clickMe Aug 27 '18 at 13:53
  • `b"some string".decode('UTF-8')`, I believe that's the line many are looking for – Marvin Thobejane Sep 16 '18 at 12:17
  • In addition to `u`, `b`, `r`, Python 3.6 introduced f-strings for string formatting. Example: `f'The temperature is {tmp_value} Celsius'` – Conchylicultor Jan 04 '19 at 13:21
  • the decode missed parentheses for me. `(b'\xE2\x82\xAC').decode('UTF-8')` worked. – shampoo Apr 23 '19 at 16:36
  • Can I suggest an edit? `But I must emphasize, a character is not a byte.`. Can you add immediately after that what IS a character? Because a precise definition of what is a character would help so much to understand. – Rafael Eyng Oct 18 '19 at 02:09
  • @clickMe True - a byte is not an octet of bits. So, technically, a character can be a byte. – Ate Somebits Jan 23 '22 at 12:11
540

To quote the Python 2.x documentation:

A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix.

The Python 3 documentation states:

Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
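To illustrate the "ASCII only" rule from the Python 3 quote: non-ASCII bytes have to be written as escapes, and a literal non-ASCII character inside b'...' is rejected at compile time. The sketch below calls compile() on a source string so the error can be caught instead of stopping the script; the exact wording of the error message may vary between Python versions:

```python
# Escapes are fine: these three bytes are the UTF-8 encoding of the euro sign.
euro = b'\xe2\x82\xac'
print(euro.decode('utf-8') == '\u20ac')  # True

# A literal non-ASCII character inside b'...' is a SyntaxError.
# compile() is used here so the error can be caught at run time.
rejected = False
try:
    compile("b'€'", '<example>', 'eval')
except SyntaxError as exc:
    rejected = True
    print('SyntaxError:', exc.msg)
print(rejected)  # True
```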

anthony sottile
NPE
  • So it sounds like Python < v3 will just ignore this extra character. What would be a case in v3 where you would need to use a b string as opposed to just a regular string? – Jesse Webb Jun 07 '11 at 19:05
  • @Gweebz - if you're actually typing out a string in a particular encoding instead of with unicode escapes (eg. b'\xff\xfe\xe12' instead of '\u32e1'). – detly Jun 08 '11 at 02:44
  • Actually, if you've imported `unicode_literals` from `__future__`, this will "reverse" the behavior for this particular string (in Python 2.x) – Romuald Brunet Mar 14 '13 at 16:27
  • A little more plain language narrative around the quoted documentation would make this a better answer IMHO – Hack-R Jun 03 '17 at 16:24
  • *"**b is for bytes(/ASCII), as opposed to Unicode.** In Python 3.x, strings are now Unicode by default."* do we agree that suggested doc change is better? Also, that 3.x doc quote assumes you already know strings are now Unicode by default, without actually saying that. Also, 2.x is now ancient history, I'd move the 3.x quote above it (and mentions of 2to3 are pretty ancient too). – smci Apr 01 '21 at 19:31
42

The b denotes a byte string.

Bytes are the actual data. Strings are an abstraction.

If you had a multi-character string object and you took a single character, it would be a string, and it might be more than 1 byte in size depending on the encoding.

If you took 1 byte from a byte string, you'd get a single 8-bit value from 0-255, and it might not represent a complete character if, due to the encoding, characters take up more than 1 byte.

TBH I'd use strings unless I had some specific low level reason to use bytes.
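To make the multi-byte point concrete, here is a sketch using 'é' as an arbitrary example of a character that UTF-8 encodes as two bytes:

```python
s = 'é'                 # one character
b = s.encode('utf-8')   # its UTF-8 encoding: b'\xc3\xa9'
print(len(s), len(b))   # 1 2

# Taking a single byte gives an int in 0-255, not a character.
print(b[0])             # 195

# A one-byte slice is valid bytes, but only half of the encoded character,
# so decoding it as UTF-8 fails.
try:
    b[:1].decode('utf-8')
    complete = True
except UnicodeDecodeError:
    complete = False
print(complete)         # False
```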

30

On the server side, any response we send is sent as bytes, so it will appear on the client as b'Response from server'.

To get rid of the b'...', simply use the code below:

Server file:

stri = "Response from server"
c.send(stri.encode())  # c: the connected socket; a str must be encoded to bytes before sending

Client file:

print(s.recv(1024).decode())  # s: the client socket; recv() returns bytes, decode() gives a str

then it will print Response from server
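For a self-contained version of the same idea that runs without separate server and client scripts, the sketch below uses socket.socketpair() to get two already-connected sockets. The names server and client are only illustrative, and sendall() is used instead of send() so the whole message is handed to the socket:

```python
import socket

# socketpair() returns two sockets that are already connected to each other,
# standing in for a real server and client.
server, client = socket.socketpair()

with server, client:
    # "Server" side: sockets carry bytes, so the str is encoded first.
    server.sendall('Response from server'.encode())

    # "Client" side: recv() returns bytes; decode() turns them back into a str.
    raw = client.recv(1024)
    message = raw.decode()

print(raw)      # b'Response from server'
print(message)  # Response from server
```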

Eliahu Aaron
Nani Chintha
  • It doesn't explain the question that Jesse Webb has asked! – Chandra Kanth Aug 29 '18 at 09:14
  • I was saying that without using the encode and decode methods, the string output will be prefixed with b' ' as Python takes it as a byte type instead of a string type. If you don't want to get an output like b'...', use the above, that's it. What didn't you understand? – Nani Chintha Sep 04 '18 at 05:51
  • Actually this is exactly the answer to *the title* of the question that was asked: Q: "What does b'x' do?" A: "It does 'x'.encode()" That is literally what it does. The rest of the question wanted to know much more than this, but the title is answered. – Michael Erickson May 16 '20 at 01:25
  • @MichaelErickson no, `b'x'` **does not** "do `'x'.encode()`". It simply creates a value of the same type. If you don't believe me, try evaluating `b'\u1000' == '\u1000'.encode()`. – Karl Knechtel Jan 31 '22 at 15:24
26

The answer to the question is that it does:

data.encode()

and in order to decode it (remove the b, because sometimes you don't need it), use:

data.decode()
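A minimal round-trip sketch, assuming the default UTF-8 codec. Note that a b'...' literal is not literally an encode() call; the literal and encode() only produce equal values in simple cases such as plain ASCII text:

```python
data = 'hello'

encoded = data.encode()     # str -> bytes (UTF-8 by default)
decoded = encoded.decode()  # bytes -> str (UTF-8 by default)
print(encoded)              # b'hello'
print(decoded)              # hello

# For plain ASCII text, the bytes literal and encode() give equal values ...
print(b'hello' == data.encode())  # True

# ... but non-ASCII text encodes to byte values, displayed as hex escapes.
print('é'.encode())         # b'\xc3\xa9'
```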
Marcello DeSales
Billy
  • *This is incorrect*. `bytes` literals are interpreted *at compile time* by a different mechanism; they are **not** syntactic sugar for a `data.encode()` call, a `str` is **not** created in the process, and the interpretation of text within the `""` is **not the same**. In particular, e.g. `b"\u1000"` **does not** create a `bytes` object representing Unicode character `0x1000` in *any meaningful encoding*; it creates a `bytes` object storing numeric values `[92, 117, 49, 48, 48, 48]` - corresponding to a backslash, lowercase u, digit 1, and three digit 0s. – Karl Knechtel Jan 31 '22 at 15:17
14

Here's an example where the absence of b would throw a TypeError exception in Python 3.x:

>>> f=open("new", "wb")
>>> f.write("Hello Python!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface

Adding a b prefix would fix the problem.
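A runnable sketch of the same situation. It writes into a temporary directory rather than a file called "new" in the current directory (an assumption made so the example leaves nothing behind), and the exact TypeError message differs between Python 3 versions:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'new')

    # 'wb' opens the file in binary mode, so write() expects bytes.
    with open(path, 'wb') as f:
        f.write(b'Hello Python!')

    # Writing a str to a binary-mode file raises TypeError instead.
    try:
        with open(path, 'ab') as f:
            f.write('Hello Python!')
        str_accepted = True
    except TypeError:
        str_accepted = False

    # Reading in 'rb' mode gives bytes back.
    with open(path, 'rb') as f:
        content = f.read()

print(content)       # b'Hello Python!'
print(str_accepted)  # False
```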

Eliahu Aaron
user3053230
13

It turns it into a bytes literal (or str in 2.x), and is valid for 2.6+.

The r prefix causes backslashes to be "uninterpreted" (not ignored, and the difference does matter).
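A short sketch of what "uninterpreted, not ignored" means in practice: the backslash stays in the value, but it still stops a following quote from ending the literal (which is also why a raw literal cannot end in a single backslash):

```python
# In a raw literal the backslash is kept in the value ...
newline_escape = r'\n'
print(len(newline_escape))          # 2
print(newline_escape == '\\n')      # True

# ... yet it still stops the following quote from ending the literal,
# and both the backslash and the quote end up in the string.
quote_escape = r'\''
print(len(quote_escape))            # 2
print(quote_escape == "\\'")        # True
```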

Ignacio Vazquez-Abrams
  • This sounds wrong according to the documentation quoted in aix's answer; the b will be ignored in Python version other than 3. – Jesse Webb Jun 07 '11 at 19:06
  • It will be a `str` in 2.x either way, so it could be said that it is ignored. The distinction matters when you import `unicode_literals` from the `__future__` module. – Ignacio Vazquez-Abrams Jun 07 '11 at 19:16
  • "the b will be ignored in Python version other than 3." It will *have no effect* in 2.x *because in 2.x, `str` names the same type that `bytes` does*. – Karl Knechtel Jan 31 '22 at 15:22
10

In addition to what others have said, note that a single Unicode character can be encoded as multiple bytes.

The way UTF-8 (the most common Unicode encoding) works is that it took the old ASCII format (7-bit code that looks like 0xxx xxxx) and added multi-byte sequences where all bytes start with 1 (1xxx xxxx) to represent characters beyond ASCII, so that UTF-8 would be backwards-compatible with ASCII.

>>> len('Öl')  # German word for 'oil' with 2 characters
2
>>> 'Öl'.encode('UTF-8')  # convert str to bytes 
b'\xc3\x96l'
>>> len('Öl'.encode('UTF-8'))  # 3 bytes encode 2 characters !
3
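To see these bit patterns, the sketch below prints each byte of 'Öl'.encode('UTF-8') in binary. In UTF-8 the lead byte of a multi-byte sequence starts with 11 (110 for a two-byte sequence), each continuation byte starts with 10, and ASCII bytes start with 0:

```python
encoded = 'Öl'.encode('UTF-8')

# Show each byte as 8 binary digits.
bits = [f'{byte:08b}' for byte in encoded]
for pattern in bits:
    print(pattern)
# 11000011  lead byte of the 2-byte sequence for 'Ö' (starts with 110)
# 10010110  continuation byte (starts with 10)
# 01101100  'l', a plain ASCII byte (starts with 0)
```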
xjcl
  • This is useful supplementary information, but it does not address the question at all. It should be written as a comment to another answer instead. – Karl Knechtel Jan 31 '22 at 15:21
  • A single character in Unicode does not consist of bytes in the first place. A Unicode character *in a specific encoding* (like UTF-8, UTF-16, UTF-32, or oddball ones like UTF-7) can consist of multiple bytes (for some of those, they're always multiple bytes), but Unicode characters are platonic ideals; they have no inherent byte representation. – ShadowRanger Jul 07 '22 at 14:09
8

b"hello" is not a string (even though it looks like one), but a byte sequence. It is a sequence of 5 numbers which, if you mapped them to a character table, would look like h e l l o. However, the value itself is not a string; Python just has a convenient syntax for defining byte sequences using text characters rather than the numbers themselves. This saves you some typing, and byte sequences are also often meant to be interpreted as characters. However, this is not always the case: for example, reading a JPG file will produce a sequence of nonsense letters inside b"..." because JPGs have a non-text structure.

.encode() and .decode() convert between strings and bytes.
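A minimal sketch of the "sequence of numbers" view, together with the .encode()/.decode() conversion (ASCII is used as the codec here purely for illustration):

```python
data = b"hello"

# A bytes object is a sequence of integers in the range 0-255.
print(list(data))                                 # [104, 101, 108, 108, 111]

# The same value can be built from the numbers directly.
print(bytes([104, 101, 108, 108, 111]) == data)   # True

# .decode() turns bytes into str, .encode() goes back.
text = data.decode('ascii')
print(text, type(text))                           # hello <class 'str'>
print(text.encode('ascii') == data)               # True
```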

Haterind
5

You can use JSON to convert it to a dictionary:

import json
data = b'{"key":"value"}'
print(json.loads(data))

{'key': 'value'}


Flask:

This is an example from Flask. Run this in a Python shell:

import requests
requests.post(url='http://localhost(example)/', json={'key': 'value'})

In flask/routes.py:

import json
from flask import request

# app is assumed to be the Flask application object created elsewhere in the package
@app.route('/', methods=['POST'])
def api_script_add():
    print(request.data)  # --> b'{"key": "value"}'
    print(json.loads(request.data))
    return json.loads(request.data)

{'key': 'value'}

Karam Qusai
  • This works well (I do the same for JSON data), but will fail for other type of data. If you have a generic `str` data, might be an XML for example, you can assign the variable and decode it. Something like `data = request.data` and then `data = data.decode()` – Andrea Jun 18 '21 at 10:28
  • This does not answer the question. The question is about what the `b` means, not about what can be done with the object. Also, this can only be done with a very small subset of `bytes` literals, the ones that are formatted to the JSON specification. – Karl Knechtel Jan 31 '22 at 15:20
  • dear @KarlKnechtel It doesn't answer this question directly that is true, but it is good for SEO for Stackoverflow if someone having this issue but isn't able to form the right question but only mentions like b' Flask/Django then this answer will be more relevant for the search engine to put it in front. – Karam Qusai Jan 30 '23 at 15:05
1

bytes(somestring.encode()) is the solution that worked for me in Python 3.

def compare_types():
    output = b'sometext'
    print(output)
    print(type(output))


    somestring = 'sometext'
    encoded_string = somestring.encode()
    output = bytes(encoded_string)
    print(output)
    print(type(output))


compare_types()
Severin Spörri
0

Answering questions 1 and 2: the b prefix means you want a bytes object instead of the ordinary str type. For example:

>>> type(b'')
<class 'bytes'>
>>> type('')
<class 'str'> 

Answering question 3: it can be used when we want to check the byte stream (a sequence of bytes) from some file/object, e.g. when we want to compute the SHA-1 message digest of a file:

import hashlib

def hash_file(filename):
    """This function returns the SHA-1 hash of the file passed into it"""

    # make a hash object
    h = hashlib.sha1()

    # open file for reading in binary mode, so read() returns bytes
    with open(filename, 'rb') as file:

        # loop till the end of the file: read() returns b'' (empty bytes) at EOF
        chunk = None
        while chunk != b'':
            # read only 1024 bytes at a time
            chunk = file.read(1024)
            h.update(chunk)

    # return the hex representation of digest
    return h.hexdigest()

message = hash_file("somefile.pdf")
print(message)
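hashlib is a good example of an API that insists on bytes. A small sketch: hashing a bytes literal works directly, while a str has to be encoded first (the input 'abc' is arbitrary; its SHA-1 digest is the well-known test vector shown in the comment):

```python
import hashlib

# hashlib works on bytes, so a bytes literal can be hashed directly.
digest = hashlib.sha1(b'abc').hexdigest()
print(digest)  # a9993e364706816aba3e25717850c26c9cd0d89d

# Passing a str raises TypeError ...
try:
    hashlib.sha1('abc')
    str_accepted = True
except TypeError:
    str_accepted = False
print(str_accepted)  # False

# ... so text has to be encoded to bytes first.
print(hashlib.sha1('abc'.encode()).hexdigest() == digest)  # True
```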
hepidad