
I need a way to get the binary representation of a string in Python, e.g.

st = "hello world"
toBinary(st)

Is there a module or some neat way of doing this?

Mazdak
user1090614
  • What do you expect the output to be, specifically? – NPE Sep 15 '13 at 18:20
  • By "binary", do you mean 0101010 type or the `ord`inal number of each character in (e.g. hex)? – cdarke Sep 15 '13 at 18:23
  • Assuming that you actually mean binary (zeros and ones), do you want a binary representation of each character (8 bits per character) one after another? e.g. h is ascii value 104 would be 01101000 in binary – ChrisProsser Sep 15 '13 at 18:30
  • This question has been answered many times on stackoverflow: http://stackoverflow.com/questions/11599226/how-to-convert-binary-string-to-ascii-string-in-python http://stackoverflow.com/questions/8553310/python-2-5-convert-string-to-binary – 0xcaff Sep 15 '13 at 18:32
  • possible duplicate of [Convert Binary to ASCII and vice versa (Python)](http://stackoverflow.com/questions/7396849/convert-binary-to-ascii-and-vice-versa-python) – jfs Mar 12 '14 at 10:59

9 Answers


Something like this?

>>> st = "hello world"
>>> ' '.join(format(ord(x), 'b') for x in st)
'1101000 1100101 1101100 1101100 1101111 100000 1110111 1101111 1110010 1101100 1100100'

#using `bytearray`
>>> ' '.join(format(x, 'b') for x in bytearray(st, 'utf-8'))
'1101000 1100101 1101100 1101100 1101111 100000 1110111 1101111 1110010 1101100 1100100'
Akshay Pratap Singh
Ashwini Chaudhary
  • Or if you want each binary number to be 1 byte: ' '.join(format(ord(i),'b').zfill(8) for i in st) – ChrisProsser Sep 15 '13 at 18:39
  • For full bytes you can also use `' '.join('{0:08b}'.format(ord(x)) for x in st)`, which is about 35% faster than the `zfill(8)` solution (at least on my machine). – max Jun 11 '15 at 11:12
  • What about converting multi-byte characters like `β`, which seem to me to be represented by `11001110 10110010` internally? – Sergey Bushmanov Mar 25 '17 at 20:18
  • I know this was posted a long time ago, but what about non-ASCII characters? – Mia Apr 10 '17 at 15:09
  • **Format Specification Mini-Language**: `' '.join('{:08b}'.format(d) for d in bytearray('ß', 'utf-8'))`, output: `'11000011 10011111'`, try other encoding `utf-16`, `utf-32` for **non-ASCII**. – Kuo Aug 26 '20 at 17:50
  • Is there a way to reconstruct the original string from the bytearray output, e.g. `1101000 1100101 1101100`? – E. Erfan Nov 21 '20 at 00:03
  • Which is the reverse operation? – Simone Nov 16 '22 at 17:22
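
The comments above ask about multi-byte characters such as `β` and about the reverse operation. A minimal sketch, assuming UTF-8 and space-separated groups; `to_bits` and `from_bits` are hypothetical helper names, not part of any library:

```python
def to_bits(text: str, encoding: str = "utf-8") -> str:
    """Encode text, then render each byte as an 8-bit, zero-padded group."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


def from_bits(bits: str, encoding: str = "utf-8") -> str:
    """Turn whitespace-separated binary groups back into text.

    int(group, 2) accepts both padded ("01101000") and unpadded
    ("1101000") groups, so this also reverses the unpadded
    format(ord(x), 'b') output for ASCII input.
    """
    return bytes(int(group, 2) for group in bits.split()).decode(encoding)


print(to_bits("β"))                          # 11001110 10110010
print(from_bits("1101000 1100101 1101100"))  # hel
print(from_bits(to_bits("hello world")))     # hello world
```

Encoding first (rather than calling `ord()`) is what makes each group exactly one byte, which in turn makes the round trip well defined.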

If by binary you mean the bytes type, you can just use the string's encode method, which encodes the string as a bytes object using the encoding you pass. Just make sure you pass an encoding that can represent every character in the string.

In [9]: "hello world".encode('ascii')
Out[9]: b'hello world'

In [10]: byte_obj = "hello world".encode('ascii')

In [11]: byte_obj
Out[11]: b'hello world'

In [12]: byte_obj[0]
Out[12]: 104
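
Since indexing or iterating over a bytes object yields integers (as `byte_obj[0]` shows), the encoded bytes can go straight into `format()` to get zeros and ones. A short sketch, assuming ASCII input:

```python
byte_obj = "hello world".encode("ascii")

# Iterating over a bytes object yields ints (0-255), so no ord() is needed.
bits = " ".join(format(b, "08b") for b in byte_obj)
print(bits)
# 01101000 01100101 01101100 01101100 01101111 00100000 01110111 01101111 01110010 01101100 01100100
```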

Otherwise, if you want the zeros-and-ones binary representation, a more Pythonic way is to first convert your string to a bytearray and then apply the bin function with map. In Python 2:

>>> st = "hello world"
>>> map(bin,bytearray(st))
['0b1101000', '0b1100101', '0b1101100', '0b1101100', '0b1101111', '0b100000', '0b1110111', '0b1101111', '0b1110010', '0b1101100', '0b1100100']

Or you can join it:

>>> ' '.join(map(bin,bytearray(st)))
'0b1101000 0b1100101 0b1101100 0b1101100 0b1101111 0b100000 0b1110111 0b1101111 0b1110010 0b1101100 0b1100100'

Note that in Python 3 you need to specify an encoding for the bytearray function:

>>> ' '.join(map(bin,bytearray(st,'utf8')))
'0b1101000 0b1100101 0b1101100 0b1101100 0b1101111 0b100000 0b1110111 0b1101111 0b1110010 0b1101100 0b1100100'

In Python 2 you can also use the binascii module:

>>> import binascii
>>> bin(int(binascii.hexlify(st),16))
'0b110100001100101011011000110110001101111001000000111011101101111011100100110110001100100'

hexlify returns the hexadecimal representation of the binary data; int(..., 16) then parses it as a base-16 integer, and bin converts that integer to binary. Note that bin drops leading zeros, so the first byte ('h' = 01101000) shows up as 1101000.
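
In Python 3, `binascii.hexlify()` requires a bytes-like object rather than a str, and the hex round trip can be skipped with `int.from_bytes()`. A sketch that also keeps the leading zeros by formatting to an explicit width of 8 bits per byte (assuming UTF-8):

```python
st = "hello world"
data = st.encode("utf-8")

# Interpret the whole byte string as one big-endian integer.
as_int = int.from_bytes(data, "big")

# Format to exactly 8 bits per byte so the leading zeros survive.
bits = format(as_int, f"0{8 * len(data)}b")
print(bits)
```

Unlike `bin()`, the explicit width gives one continuous bit string whose length is always a multiple of 8.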

Mazdak
  • Not only this is more pythonic, but this is "more" correct for multi-byte non-ASCII strings. – Sergey Bushmanov Mar 25 '17 at 20:23
  • Just to note that (at least for the current version `3.7.4`): (1) `bytearray` expects an encoding (not just a string) and (2) `map(bin, ...)` will return the `map` object. For the first point, I use for instance `'bob'.encode('ascii')` as suggested by @Tao. For the second point, using the `join` method, as in the other examples of @Kasramvd, will display the desired result. – Antoine Sep 16 '19 at 12:56
  • the "hello world".encode('ascii') is perfect – F.Tamy Jan 25 '21 at 08:18
  • This is odd. In python3, I can do `>>> bin(bytearray("g", 'utf8')[0]) # '0b1100111'`. But, I cannot do `>>> bin("g".encode("utf8"))` – Slackware Apr 19 '22 at 01:56

We just need to encode it.

'string'.encode('ascii')
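
The choice of encoding matters as soon as the text contains non-ASCII characters. A small sketch using `ß` (U+00DF) as an example: the ASCII codec cannot encode it, while UTF-8 produces two bytes:

```python
text = "ß"  # U+00DF, outside the 7-bit ASCII range

try:
    ascii_bytes = text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_bytes = None
    print("ascii failed:", exc.reason)  # ordinal not in range(128)

utf8_bytes = text.encode("utf-8")
bits = " ".join(f"{b:08b}" for b in utf8_bytes)
print(bits)  # 11000011 10011111
```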
Tao
  • For me (`v3.7.4`), this returns a `bytes` object (with the ascii representations of each byte, if available), and in order to display its binary representation, I need `bin`, e.g. with `' '.join(item[2:] for item in map(bin, 'bob'.encode('ascii')))` (note that `0b` needs to be removed at the beginning of the binary representation of each character). – Antoine Sep 16 '19 at 12:59

You can access the code values for the characters in your string using the ord() built-in function. If you then need to format them in binary, the built-in format() function will do the job.

a = "test"
print(' '.join(format(ord(x), 'b') for x in a))

(Thanks to Ashwini Chaudhary for posting that code snippet.)

The above code works in Python 3, but note that ord() returns each character's Unicode code point, which matches the character's encoded byte only for ASCII characters, so things get more complicated once a specific encoding matters. In Python 2, strings are byte sequences, and ASCII encoding is assumed by default. In Python 3, strings are Unicode, and there's a separate bytes type that acts more like a Python 2 string. If you want the bits of a particular encoding, you'll need to specify that encoding.

In Python 3, then, you can do something like this:

a = "test"
a_bytes = bytes(a, "ascii")
print(' '.join(["{0:b}".format(x) for x in a_bytes]))

The differences between UTF-8 and ASCII encoding won't be obvious for simple alphanumeric strings, but they become important if you're processing text that includes characters outside the ASCII character set.
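
To make the difference concrete, here is a quick comparison for `é` (U+00E9), chosen only as an example: the code point, its UTF-8 encoding and its Latin-1 encoding give different bit patterns:

```python
ch = "é"  # U+00E9

# Bits of the code point itself (what ord() gives you).
code_point_bits = format(ord(ch), "b")
# Bits of each byte when encoded with a specific codec.
utf8_bits = " ".join(format(b, "08b") for b in ch.encode("utf-8"))
latin1_bits = " ".join(format(b, "08b") for b in ch.encode("latin-1"))

print(code_point_bits)  # 11101001
print(utf8_bits)        # 11000011 10101001
print(latin1_bits)      # 11101001
```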

Mark R. Wilkins

In Python 3.6 and above, you can use an f-string to format the result.

s = "hello world"
print(" ".join(f"{ord(i):08b}" for i in s))

01101000 01100101 01101100 01101100 01101111 00100000 01110111 01101111 01110010 01101100 01100100
  • The left side of the colon, ord(i), is the actual object whose value will be formatted and inserted into the output. ord() gives you the integer Unicode code point of a single str character.

  • The right-hand side of the colon is the format specifier: 08 means a minimum width of 8, padded with zeros, and b outputs the number in base 2 (binary).
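
One caveat: the `08` is a minimum width, not a maximum. A character above U+00FF produces more than 8 bits, so the output is no longer one fixed-width group per byte. A quick check with `β` (U+03B2, code point 946):

```python
s = "aβ"

code_point_bits = " ".join(f"{ord(c):08b}" for c in s)
print(code_point_bits)  # 01100001 1110110010

# Encoding first yields fixed 8-bit groups, one per UTF-8 byte.
byte_bits = " ".join(f"{b:08b}" for b in s.encode("utf-8"))
print(byte_bits)  # 01100001 11001110 10110010
```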

Markus Dutschke
Vlad Bezden
def method_a(sample_string):
    # format() each code point inside a generator expression
    return ' '.join(format(ord(x), 'b') for x in sample_string)

def method_b(sample_string):
    # bin() each byte of the UTF-8 encoding via map()
    return ' '.join(map(bin, bytearray(sample_string, encoding='utf-8')))


if __name__ == '__main__':

    from timeit import timeit

    sample_string = 'Convert this ascii string to binary.'

    print(
        timeit(f'method_a("{sample_string}")', setup='from __main__ import method_a'),
        timeit(f'method_b("{sample_string}")', setup='from __main__ import method_b')
    )

# 9.564299999998184 2.943955828988692

method_b was substantially faster in this measurement, likely because map(bin, ...) applies the built-in bin() directly to each byte, whereas method_a runs a Python-level generator expression that calls ord() and then format() for every character.
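
`bin()` prefixes each group with `0b` and neither function pads to 8 bits, so a like-for-like comparison may want both approaches to produce the same string. A sketch under that assumption; `via_ord` and `via_bytes` are illustrative names, and no timing results are claimed here since they depend on the machine and Python version:

```python
from timeit import timeit


def via_ord(s: str) -> str:
    # One ord() call per character; matches UTF-8 bytes only for ASCII text.
    return " ".join(format(ord(c), "08b") for c in s)


def via_bytes(s: str) -> str:
    # Encode once, then format each byte (already an int) directly.
    return " ".join(format(b, "08b") for b in s.encode("utf-8"))


sample = "Convert this ascii string to binary."
assert via_ord(sample) == via_bytes(sample)  # identical output for ASCII input

for fn in (via_ord, via_bytes):
    print(fn.__name__, timeit(lambda: fn(sample), number=10_000))
```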

Ben

This is an update for the existing answers that call bytearray() on a string without an encoding, which no longer works in Python 3:

>>> st = "hello world"
>>> map(bin, bytearray(st))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding

This is because, as the bytearray() documentation explains, if the source is a string you must also give the encoding:

>>> map(bin, bytearray(st, encoding='utf-8'))
<map object at 0x7f14dfb1ff28>
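
In Python 3, `map()` returns a lazy iterator, so wrap it in `list()` (or consume it with `' '.join(...)`) to see the values:

```python
st = "hi"
bits = list(map(bin, bytearray(st, encoding="utf-8")))
print(bits)  # ['0b1101000', '0b1101001']
```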
Billal Begueradj
''.join(format(i, 'b') for i in bytearray(st, encoding='utf-8'))

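This produces the bits without padding each byte with zeros to 8 bits. Be aware, though, that because the groups are unpadded and joined with an empty string, the boundaries between bytes are lost, so converting back to the original string is ambiguous. Padding each byte to 8 bits (format(i, '08b')) or joining with a separator keeps the result reversible, and int(group, 2) handles leading zeros without any extra work.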
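
To illustrate why padding matters for reversibility: with unpadded groups joined by an empty string, different inputs can give identical output, whereas 8-bit groups can be sliced back into bytes. A sketch; `bits_unpadded`, `bits_padded` and `decode_padded` are hypothetical helper names:

```python
def bits_unpadded(s: str) -> str:
    return "".join(format(i, "b") for i in bytearray(s, encoding="utf-8"))


def bits_padded(s: str) -> str:
    return "".join(format(i, "08b") for i in bytearray(s, encoding="utf-8"))


def decode_padded(bits: str) -> str:
    # Every byte occupies exactly 8 characters, so slice in steps of 8.
    chunks = (bits[i:i + 8] for i in range(0, len(bits), 8))
    return bytes(int(c, 2) for c in chunks).decode("utf-8")


# "\x01\x01" and "\x03" collide without padding: both give "11".
print(bits_unpadded("\x01\x01"), bits_unpadded("\x03"))  # 11 11
print(decode_padded(bits_padded("hello world")))         # hello world
```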

def fun(a):
    # Zero-pad each character's binary code point to at least 8 bits.
    return ' '.join(format(ord(i), '08b') for i in a)

a = input("Enter a string\t: ")
print(fun(a))
Tomerikoo
  • Would you like to augment this unreadable code-only answer with some explanation? That would help fighting the misconception that StackOverflow is a free code writing service. In case you want to improve readability, try the info provided here: https://stackoverflow.com/editing-help – Yunnosch Jul 30 '19 at 19:55