
I have searched online many times, but I have not been able to find a way to convert my binary string variable, X

X = "1000100100010110001101000001101010110011001010100"

into a UTF-8 string value.

I have found that some people are using methods such as

b'message'.decode('utf-8')

However, this method has not worked for me: I am told that 'b' does not exist, and I am not sure how to replace 'message' with a variable. I also do not understand how this method works. Is there a better alternative?

So how could I convert a binary string into a text string?

EDIT: I also do not mind ASCII decoding.

CLARIFICATION: Here is specifically what I would like to happen.

def binaryToText(z):
    # Some code to convert binary to text
    return ...  # something here

X = "0110100001101001"
print(binaryToText(X))

This would then yield the string...

hi
Dan
  • Since ASCII is effectively a subset of UTF-8 you'll find that your string `X` is already a UTF-8 string. What is your expected output? – mhawke Nov 11 '16 at 22:43
  • @mhawke I am looking for a returned value of a UTF-8 string. The binary is initially a string, and I want to be able to convert that binary into a UTF-8 string. Please ask me if you need more clarification! – Dan Nov 11 '16 at 22:46
  • Are you using Python 2 or 3? Why did you tag BOTH? In Python 3, strings are Unicode by default. – juanpa.arrivillaga Nov 11 '16 at 22:48
  • @juanpa.arrivillaga I have the flexibility to use either, depending on which option is best for me. I can accept solutions for both versions. – Dan Nov 11 '16 at 22:50
  • Well, if you use Python 3, all strings are unicode, so that seems to be the most straightforward solution... – juanpa.arrivillaga Nov 11 '16 at 22:57
  • @Dan: Again, what is your expected output? Could you write down _exactly_ what you expect to see and add it to your question? – mhawke Nov 12 '16 at 01:58
  • Sure! Done. @mhawke – Dan Nov 12 '16 at 02:08

5 Answers


It looks like you are trying to decode ASCII characters from a binary string representation (bit string) of each character.

You can take each block of eight characters (a byte), convert that to an integer, and then convert that to a character with chr():

>>> X = "0110100001101001"
>>> print(chr(int(X[:8], 2)))
h
>>> print(chr(int(X[8:], 2)))
i

Assuming that the values encoded in the string are ASCII, this will give you the characters. You can generalise it like this:

def decode_binary_string(s):
    # Take each 8-character chunk, parse it as base 2, and map it to a character
    return ''.join(chr(int(s[i*8:i*8+8],2)) for i in range(len(s)//8))

>>> decode_binary_string(X)
'hi'
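The generalised function above iterates over `range(len(s)//8)`, so any trailing digits that do not fill a whole byte are silently dropped; the 49-digit `X` from the original question would lose its last bit. A minimal sketch of a stricter variant, assuming ASCII input as above and assuming you would rather get an error for such input (the name `decode_binary_string_strict` is hypothetical):

```python
def decode_binary_string_strict(s):
    """Decode a '0'/'1' string as ASCII, 8 bits per character."""
    # Hypothetical policy: reject a partial trailing byte instead of dropping it.
    if len(s) % 8 != 0:
        raise ValueError("length %d is not a multiple of 8" % len(s))
    # Step through the string 8 digits at a time, parse each chunk as base 2,
    # and map the resulting integer to a character with chr().
    return ''.join(chr(int(s[i:i + 8], 2)) for i in range(0, len(s), 8))


print(decode_binary_string_strict("0110100001101001"))  # hi
```

Stepping `range()` by 8 avoids the `i*8` arithmetic; the behaviour for valid input is the same as the original function.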

If you want to keep it in the original encoding, you don't need to decode any further. Usually, though, you would convert the incoming string into a Python unicode string, which can be done like this (Python 2 only):

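def decode_binary_string(s, encoding='UTF-8'):
    # Build a byte string (a Python 2 str) from each 8-bit chunk...
    byte_string = ''.join(chr(int(s[i*8:i*8+8],2)) for i in range(len(s)//8))
    # ...then decode it; Python 3 str objects have no .decode() method
    return byte_string.decode(encoding)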
mhawke
  • Could you also add the reverse code? For converting string to binary. That would be great :) – Dan Nov 12 '16 at 07:17
  • @Dan: `''.join([bin(ord(c))[2:].rjust(8,'0') for c in 'hi'])` – mhawke Nov 12 '16 at 10:53
  • I'm way, way late to this solution, but I'm curious. When I run the last of the code snippets above, I get `'str' object has no attribute 'decode'`. I bring this up because this solution appears perfect for what I need, but the encoding (or rather decoding) part doesn't seem to work. – Jeff Nyman Oct 13 '19 at 10:38
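Regarding the error reported in the comment above: in Python 3, `str` has no `.decode()` method; decoding belongs to `bytes`. A minimal Python 3 sketch of both directions, assuming 8 bits per byte and UTF-8 as the default encoding (the function names are hypothetical). Building a `bytes` object first and decoding it as a whole means multi-byte UTF-8 characters are reassembled correctly, whereas the `bin(ord(c))` one-liner in the comments works on code points and matches UTF-8 only for ASCII text.

```python
def binary_to_text(bits, encoding='utf-8'):
    """Python 3: decode a '0'/'1' string into text, 8 bits per byte."""
    # Turn each 8-digit chunk into one byte value, then decode all bytes at once
    # so multi-byte characters (e.g. UTF-8 'é' = 2 bytes) come out whole.
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode(encoding)


def text_to_binary(text, encoding='utf-8'):
    """Python 3: the reverse direction, one zero-padded 8-digit group per byte."""
    return ''.join(format(b, '08b') for b in text.encode(encoding))


print(binary_to_text("0110100001101001"))  # hi
print(text_to_binary("hi"))                # 0110100001101001
```

`format(b, '08b')` zero-pads each byte to eight digits, the same job `bin(...)[2:].rjust(8, '0')` does in the comment.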

To convert bits given as a "01"-string (binary digits) into the corresponding text in Python 3:

>>> bits = "0110100001101001"
>>> n = int(bits, 2)
>>> n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()
'hi'

For a Python 2/3 solution, see Convert binary to ASCII and vice versa.

jfs
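One caveat with sizing the buffer via `n.bit_length()`: leading zero bits carry no weight in `n`, so a leading all-zero byte (for example a NUL character) disappears from the result. A minimal sketch that sizes the buffer from the length of the digit string instead, assuming the input is meant to be a whole number of bytes (`bits_to_text` is a hypothetical name):

```python
def bits_to_text(bits, encoding='utf-8'):
    """Decode a '0'/'1' string via a single big integer (Python 3)."""
    n = int(bits, 2)
    # Derive the byte count from the number of digits, not from n.bit_length(),
    # so leading all-zero bytes are kept.
    num_bytes = (len(bits) + 7) // 8
    return n.to_bytes(num_bytes, 'big').decode(encoding)


bits = "0000000001101000"  # a NUL byte followed by 'h'
print(repr(bits_to_text(bits)))  # '\x00h'

n = int(bits, 2)
print(repr(n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()))  # 'h'
```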

In Python 2, an ASCII-encoded (byte) string is also a UTF-8-encoded (byte) string. In Python 3, a (unicode) string must be encoded to UTF-8-encoded bytes. The decoding example was going the wrong way.

>>> X = "1000100100010110001101000001101010110011001010100"
>>> X.encode()
b'1000100100010110001101000001101010110011001010100'

Strings containing only the digits '0' and '1' are just one particular case of this, and the same rules apply.

Terry Jan Reedy
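To make the distinction concrete: `.encode()` converts each character of the string into a byte, so the digits '0' and '1' become the bytes 0x30 and 0x31; it does not read the digits as bits. A short Python 3 comparison using the `X` from the question's clarification:

```python
X = "0110100001101001"

# .encode() maps each character to a byte: '0' -> 0x30, '1' -> 0x31.
as_chars = X.encode()
print(len(as_chars), as_chars[:2])  # 16 b'01'

# Reading the digits as bits instead packs every 8 digits into one byte.
as_bits = bytes(int(X[i:i + 8], 2) for i in range(0, len(X), 8))
print(len(as_bits), as_bits)  # 2 b'hi'
```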

Pass the optional base argument to int() to parse the string as a binary number:

>>> x = "1000100100010110001101000001101010110011001010100"
>>> int(x, 2)
301456912901716
souldeux

Working code for Python 3 (for space-separated 8-bit groups):

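Binstr = '00011001 00001000'
s = []
# Split on spaces so each token is one 8-bit group
for group in Binstr.split(' '):
    # Parse the group as base 2, then convert the code point to a character
    s.append(chr(int(group, 2)))
print(''.join(s))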
LeopardShark