I know that Java and Python handle bytes differently, so I am a little confused about how to convert a `byte[]` to a Python string. I have this `byte[]` in Java:

{ 118, -86, -46, -63, 100, -69, -30, -102, -82, -44, -40, 92, 0, 98, 36, -94 }

I want to convert it to a Python string. Here is how I did it:

b=[118, -86, -46, -63, 100, -69, -30, -102, -82, -44, -40, 92, 0, 98, 36, -94]
str=""
for i in b:
    str=str+chr(abs(i))

But I am not really sure if this is the correct way to do it.

Minato
  • `byte` is a datatype in Java that does not correspond to Python bytestrings. While you can get a result from this, it is likely meaningless. – Eli Sadoff Dec 03 '16 at 18:44
  • If those bytes are stored in two's complement, then you are destroying information by using abs() – fafl Dec 03 '16 at 18:46
  • See "[Converting integer to string in Python?](https://stackoverflow.com/questions/961632)" for the minimal case of converting a single Python `int` to a `str`. – Kevin J. Chase Dec 03 '16 at 19:28
  • You were done converting it to Python when you typed `b=[118, ...]`. Your real question is probably more like "How do I convert a Python list to a string?", or maybe "How do I convert many Python integers to a string?". The first is probably either `str(b)` or `repr(b)`; the second is something like `' '.join(str(x) for x in b)`. See also the docs for [`str.join`](https://docs.python.org/3/library/stdtypes.html#str.join). – Kevin J. Chase Dec 03 '16 at 19:28

3 Answers

The Java `byte` type is a signed integer; its value ranges from -128 to 127. Python's `chr` rejects negative values (in Python 2 it only accepts 0 to 255), so each value first has to be mapped into the 0 to 255 range. From the Primitive Data Types section of the Java tutorial:

byte: The byte data type is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive).
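To make the information loss from `abs()` concrete, here is a small check on a single value. `% 256` and `& 0xFF` (the mask Java code typically uses for the same purpose) both recover the original bit pattern; `to_signed` is a hypothetical helper name for the inverse mapping, not part of the answer.

```python
# Java's byte -86 has the bit pattern 0xAA, i.e. unsigned 170.
signed = -86

print(abs(signed))    # 86  -> 0x56, a different byte ('V')
print(signed % 256)   # 170 -> 0xAA, the original bit pattern
print(signed & 0xFF)  # 170, same result via a bit mask

# abs() is not reversible: -86 and 86 collide.
print(abs(-86) == abs(86))  # True

def to_signed(u):
    # Hypothetical inverse helper: map 0..255 back to -128..127.
    return u - 256 if u > 127 else u

# The modulo mapping round-trips over the whole signed byte range.
print(all(to_signed(s % 256) == s for s in range(-128, 128)))  # True
```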

You need to convert from two's complement to an unsigned integer:

def twoscomplement_to_unsigned(i):
    return i % 256

result = ''.join([chr(twoscomplement_to_unsigned(i)) for i in b])
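Every character produced this way has a code point from 0 to 255, so in Python 3 the result should be exactly the raw bytes decoded as Latin-1, which maps byte value n to code point n. A quick check, assuming Python 3 and the list from the question:

```python
b = [118, -86, -46, -63, 100, -69, -30, -102, -82, -44, -40, 92, 0, 98, 36, -94]

# Character by character: one code point in 0..255 per byte.
as_chars = ''.join([chr(i % 256) for i in b])

# Same result by building bytes first and decoding as Latin-1.
as_latin1 = bytes(i % 256 for i in b).decode('latin-1')

print(as_chars == as_latin1)  # True
```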

However, if this is Python 3, you really want to use the bytes type:

result = bytes(map(twoscomplement_to_unsigned, b))
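The standard library can also do the signed-to-unsigned step for you: the `struct` format code `b` and the `array` typecode `'b'` both mean signed char, the same -128 to 127 range as Java's `byte`. A sketch, assuming Python 3:

```python
import array
import struct

b = [118, -86, -46, -63, 100, -69, -30, -102, -82, -44, -40, 92, 0, 98, 36, -94]

# Pack the signed values directly; f'{len(b)}b' is '16b' (sixteen signed chars).
packed = struct.pack(f'{len(b)}b', *b)

# array('b', ...) stores signed chars; tobytes() returns the raw buffer.
via_array = array.array('b', b).tobytes()

# Both match the modulo approach from the answer.
via_modulo = bytes(i % 256 for i in b)
print(packed == via_array == via_modulo)  # True

# Going back: unpack as signed chars to recover the Java values.
print(list(struct.unpack(f'{len(packed)}b', packed)) == b)  # True
```

One side benefit: both of these raise an error for a value outside -128 to 127, whereas `% 256` silently wraps any integer into range.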
Martijn Pieters
  • can also implement as `i % 256` (in python) – anthony sottile Dec 03 '16 at 18:51
  • Martijn I noticed that you use a list inside the parenthesis when joining. So it's really faster than directly passing the gencomp to `join` right? – Jean-François Fabre Dec 03 '16 at 18:57
  • @Jean-FrançoisFabre: see [List comprehension without \[ \] in Python](//stackoverflow.com/a/9061024); for *`str.join()`* it is faster to pass in a list. – Martijn Pieters Dec 03 '16 at 19:00
  • I already stumbled on it, hence my question. So much for "less signs and more letters". A lot of people would advise removing them, but `join` creates the list anyway, only slower. BTW, it must be the same performance issue with `"".join(map(chr,items))` then? This `map` function is really, really useless nowadays. – Jean-François Fabre Dec 03 '16 at 19:01
  • @Jean-FrançoisFabre: yup, `map()` has the same issue here. – Martijn Pieters Dec 03 '16 at 20:02

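Repeated string concatenation in a loop is inefficient: each `+` can build a brand-new string, so the total work may grow quadratically with the length of the result.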

I'd recommend doing it with a list comprehension passed to `str.join` with an empty separator:

s = "".join([chr(abs(x)) for x in b])

Edit: the `abs` bit is odd. It does what was asked, but the result is not useful because `byte` is signed: `abs` maps -86 and 86 to the same character. You need the two's complement conversion from Martijn's answer, which also fixes the OP's next problem: data validity :)

This approach would be fine for a plain list of ASCII values. Dropping `abs` also lets us use `map`; it is so rarely the right tool that we shouldn't pass up the chance :)

items = [65, 66, 67, 68]
print("".join(map(chr,items)))

Output:

ABCD
Jean-François Fabre

Assuming you're using Python 3, bytes can already be initialized from a list. You'll need to convert the signed integers to unsigned bytes first.

items = [118, -86, -46, -63, 100, -69, -30, -102, -82, -44, -40, 92, 0, 98, 36, -94]
data = bytes(b % 256 for b in items)
print(data)  # b'v\xaa\xd2\xc1d\xbb\xe2\x9a\xae\xd4\xd8\\\x00b$\xa2'

If the bytes represent text, decode them afterwards. In your example they are not valid UTF-8, so this would raise a `UnicodeDecodeError`:

data = data.decode('utf8')
print(data)
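If the bytes are not text at all (sixteen arbitrary bytes could, for instance, be a digest or a key, though that is only a guess about this data), a printable encoding such as hex or Base64 is usually a better "string" than a decode. A sketch that tries UTF-8 first and falls back to hex; `to_display_string` is a hypothetical helper name:

```python
import base64

def to_display_string(raw: bytes) -> str:
    # Hypothetical helper: decode as UTF-8 if possible, else return hex.
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.hex()

items = [118, -86, -46, -63, 100, -69, -30, -102, -82, -44, -40, 92, 0, 98, 36, -94]
data = bytes(b % 256 for b in items)

print(to_display_string(data))                 # 76aad2c164bbe29aaed4d85c006224a2
print(base64.b64encode(data).decode('ascii'))  # Base64 alternative
print(to_display_string(b'hello'))             # hello
```

Hex and Base64 are both reversible (`bytes.fromhex`, `base64.b64decode`), which a lossy decode with `errors='replace'` would not be.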
davidism