How to convert java byte[] to python string?

Question

I know that Java and Python handle bytes differently, so I am a little confused about how to convert a byte[] to a Python string. I have this byte[] in Java:

{ 118, -86, -46, -63, 100, -69, -30, -102, -82, -44, -40, 92, 0, 98, 36, -94 }

I want to convert it to a Python string. Here is how I did it:

b=[118, -86, -46, -63, 100, -69, -30, -102, -82, -44, -40, 92, 0, 98, 36, -94]
str=""
for i in b:
    str=str+chr(abs(i))

But I am not really sure if this is the correct way to do it.

`byte` is a datatype in java that does not correspond to python bytestrings. While you can get a result from this, it is likely meaningless. — Eli Sadoff, Dec 03 '16 at 18:44
If those bytes are stored in 2-complement, then you are destroying information by using abs() — fafl, Dec 03 '16 at 18:46
See "[Converting integer to string in Python?](https://stackoverflow.com/questions/961632)" for the minimal case of converting a single Python `int` to a `str`. — Kevin J. Chase, Dec 03 '16 at 19:28
You were done converting it to Python when you typed `b=[118, ...]`. Your real question is probably more like "How do I convert a Python list to a string?", or maybe "How do I convert many Python integers to a string?". The first is probably either `str(b)` or `repr(b)`; the second is something like `' '.join(str(x) for x in b)`. See also the docs for [`str.join`](https://docs.python.org/3/library/stdtypes.html#str.join). — Kevin J. Chase, Dec 03 '16 at 19:28

Martijn Pieters · Accepted Answer · 2016-12-03T18:52:40.283

4

The Java byte type is a signed integer; its values range from -128 to 127. Python's chr does not accept negative values: it expects 0 to 255 in Python 2, or a Unicode code point (0 to 0x10FFFF) in Python 3. From the Primitive Data Types section of the Java tutorial:

byte: The byte data type is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive).

You need to convert from two's complement to an unsigned integer:

def twoscomplement_to_unsigned(i):
    return i % 256

result = ''.join([chr(twoscomplement_to_unsigned(i)) for i in b])

However, if this is Python 3, you really want to use the bytes type:

result = bytes(map(twoscomplement_to_unsigned, b))
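To illustrate why the modulo conversion is preferable to abs(), here is a minimal sketch using the byte values from the question. The helper names `to_unsigned` and `to_signed` are made up for this example; the round trip back to signed values is an addition to show that no information is lost.

```python
# Byte values copied from the Java array in the question.
java_bytes = [118, -86, -46, -63, 100, -69, -30, -102, -82, -44, -40, 92, 0, 98, 36, -94]

def to_unsigned(i):
    # Map a signed 8-bit value (-128..127) to its unsigned form (0..255).
    return i % 256

def to_signed(u):
    # Inverse mapping: values above 127 wrap back to negative numbers.
    return u - 256 if u > 127 else u

unsigned = [to_unsigned(i) for i in java_bytes]
data = bytes(unsigned)  # Python 3 bytes object

# The conversion is reversible, so the original Java values can be recovered.
round_tripped = [to_signed(u) for u in data]

# abs() is not reversible: -86 and 86 would both become 86.
lossy = [abs(i) for i in java_bytes]

print(unsigned[1], lossy[1])  # 170 86
```

The second element shows the difference: -86 is the byte 0xAA (170), whereas abs() turns it into 86, which is a different byte.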

edited Dec 03 '16 at 18:52

answered Dec 03 '16 at 18:47

Martijn Pieters

1

can also implement as `i % 256` (in python) – anthony sottile Dec 03 '16 at 18:51
Martijn I noticed that you use a list inside the parenthesis when joining. So it's really faster than directly passing the gencomp to `join` right? – Jean-François Fabre Dec 03 '16 at 18:57
@Jean-FrançoisFabre: see [List comprehension without \[ \] in Python](//stackoverflow.com/a/9061024); for *`str.join()`* it is faster to pass in a list. – Martijn Pieters Dec 03 '16 at 19:00
I already stumbled on it, hence my question. So long for "less signs and more letters". A lot of people would advise to remove them, but `join` creates the list anyway, only slower. BTW it must be the same performance issue with `"".join(map(chr,items))` then? this `map` function is really really useless nowadays. – Jean-François Fabre Dec 03 '16 at 19:01
@Jean-FrançoisFabre: yup, `map()` has the same issue here. – Martijn Pieters Dec 03 '16 at 20:02
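The performance claims in these comments can be checked with a small timeit sketch like the one below. The input size, repeat count, and function names are arbitrary choices for illustration, and the timings depend on the Python version and machine, so it is worth measuring rather than assuming.

```python
import timeit

# Arbitrary test input: every byte value, repeated a few times.
values = list(range(256)) * 20

def join_list():
    # List comprehension passed to str.join.
    return "".join([chr(v) for v in values])

def join_generator():
    # Generator expression passed to str.join.
    return "".join(chr(v) for v in values)

def join_map():
    # map object passed to str.join.
    return "".join(map(chr, values))

for fn in (join_list, join_generator, join_map):
    seconds = timeit.timeit(fn, number=100)
    print(f"{fn.__name__}: {seconds:.4f}s")
```

All three variants build the same string; only the time taken differs.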

Jean-François Fabre · Answer 2 · 2016-12-03T19:02:08.630

2

String concatenation is highly inefficient.

I'd recommend building the string with a list comprehension passed to str.join, using an empty separator:

s = "".join([chr(abs(x)) for x in b])

Edit: the abs call is odd. It does what the question asks, but the result is not useful because a Java byte is signed. You need the two's complement conversion from Martijn's answer, which also fixes the OP's next problem: data validity :)

The simple approach would be fine if you had a list of ASCII values instead. Dropping abs also lets us use map, which is rarely possible, so let's not pass up the chance :)

items = [65, 66, 67, 68]
print("".join(map(chr,items)))

result:

ABCD
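On Python 3, an alternative for a list of ASCII values is to go through the bytes type and decode. This is a sketch of that option, not part of the original answer:

```python
items = [65, 66, 67, 68]

# bytes() accepts an iterable of integers in the range 0..255.
raw = bytes(items)

# Decoding as ASCII raises UnicodeDecodeError if any value is above 127,
# which makes invalid input visible instead of silently converting it.
text = raw.decode("ascii")

print(text)  # ABCD
```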

edited Dec 03 '16 at 19:02

answered Dec 03 '16 at 18:44

Jean-François Fabre

Would it be possibly more efficient to do `map(lambda x: chr(abs(x)), b)`? – Eli Sadoff Dec 03 '16 at 18:45
I think it's equivalent. But as you say, passing absolute value of a negative byte is not recommended. – Jean-François Fabre Dec 03 '16 at 18:46
@MartijnPieters yes, it does nothing useful. – Jean-François Fabre Dec 03 '16 at 18:49
I got the joke! sorry I just copied the OP flawed code. fixed – Jean-François Fabre Dec 03 '16 at 18:54
@Jean-FrançoisFabre Did you think I was being serious? Sorry, I was just kidding :) I was just trying to remind you. – Christian Dean Dec 03 '16 at 18:55
no need to be sorry, I like jokes and I got that one! but the facts are here: the code was flawed so thanks. Don't want to leave upvoted flawed stuff on that site (recently deleted 2 answers with a +3 score. It hurts but that's life). – Jean-François Fabre Dec 03 '16 at 18:58

davidism · Answer 3 · 2016-12-03T18:53:07.493

Assuming you're using Python 3, bytes can already be initialized from a list. You'll need to convert the signed integers to unsigned bytes first.

items = [118, -86, -46, -63, 100, -69, -30, -102, -82, -44, -40, 92, 0, 98, 36, -94]
data = bytes(b % 256 for b in items)
print(data)  # b'v\xaa\xd2\xc1d\xbb\xe2\x9a\xae\xd4\xd8\\\x00b$\xa2'

If the bytes represent text, decode them afterwards. In your example, they are not valid UTF-8, so this call would raise a UnicodeDecodeError.

data = data.decode('utf8')
print(data)
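If the bytes are binary data rather than text (an assumption, since the question does not say what they represent), two common ways to get a str anyway are a latin-1 decode, which maps each byte to the code point with the same number, and a hex dump. A minimal sketch:

```python
items = [118, -86, -46, -63, 100, -69, -30, -102, -82, -44, -40, 92, 0, 98, 36, -94]
data = bytes(b % 256 for b in items)

try:
    text = data.decode("utf8")
except UnicodeDecodeError:
    # 0xaa is a continuation byte and cannot start a UTF-8 sequence.
    text = None

# latin-1 never fails: every byte value 0..255 maps to one code point.
as_latin1 = data.decode("latin-1")

# A hexadecimal string is often the more readable form for binary data.
as_hex = data.hex()

print(text)    # None
print(as_hex)  # 76aad2c164bbe29aaed4d85c006224a2
```

The latin-1 result is the same string that `''.join(chr(b % 256) for b in items)` produces, and it can be turned back into the original bytes with `as_latin1.encode('latin-1')`.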
