python string decode displayed as byte array in list context

Question

Why does this string display like a byte array when it is inside a list, yet print as expected in a print statement, even though its type is str and not bytearray?

stringList = []
# Comparing a string, and a string decoded from a byte array
theString = "Hello World1"
value = byteArray.decode("utf-8") # byteArray is set externally, but prints correctly below

# Types are the same
print("theString type: " + str(type(theString)))
print("value type: " + str(type(value)))

# Values are displayed the same
print("theString: " + theString)
print("value: " + value)

# Add each to list
stringList.append(theString)
stringList.append(value)

# the value string prints as a byte array
print(stringList)

Output:

theString type: <class 'str'>
value type: <class 'str'>
theString: Hello World1
value: Hello World0
['Hello World1', 'H\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00\x00\x00\x00W\x00\x00\x00o\x00\x00\x00r\x00\x00\x00l\x00\x00\x00d\x00\x00\x000\x00\x00\x00']

`u.decodeString()` is not a builtin function - so, more details about what it is would help answer better. — AbdealiLoKo, Jun 11 '23 at 11:14
The variable `value` looks like a string to me, not a bytearray. — quamrana, Jun 11 '23 at 11:16
Thanks - I changed the reference from u.decodeString() to its UTF-8 decoding. The variable 'value' *is* a string - that's my point. It's really odd that it is rendering as a bytearray. — Joe, Jun 11 '23 at 11:45
the decode() function is found here: https://docs.python.org/3/library/stdtypes.html#bytes.decode — Joe, Jun 11 '23 at 11:46
This is not an MRE (`byteArray` is undefined), and the output you pasted did not result from running this code in any part (the order in which the lines are printed is different, which means that your output came from running some different version of this code). — Samwise, Jun 11 '23 at 17:11

AbdealiLoKo · Accepted Answer · 2023-06-11T12:10:29.243

3

Python displays the string very differently when you do:

print("value: " + value)

and when you do:

print(["Hello", value])

The first call writes the characters of the string to the console, where a NUL character (`\x00`) is usually invisible. The second does not: printing a list shows the `repr` of each element, and the `repr` escapes non-printable characters.

For example:

>>> value = 'H\x00e\x00llo'

>>> print(repr(value))
'H\x00e\x00llo'

>>> print(str(value))             # hidden characters in a string are well ... hidden
Hello

>>> print("value: " + value)      # Adding 2 strings gives a string
value: Hello

>>> print("value:", value)        # value is a string - so hidden characters are not printed
value: Hello

>>> print("value:", repr(value))  # the repr still shows us the hidden characters
value: 'H\x00e\x00llo'

>>> print(['Hello', value])       # list uses repr
['Hello', 'H\x00e\x00llo']
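
A quick way to confirm that two strings which look alike on the console really differ is to compare them and check their lengths. This is only a minimal sketch that reuses the sample `value` from above; the names `clean` and `items` are introduced here purely for illustration.

```python
value = 'H\x00e\x00llo'   # same sample string as above, with two NUL characters
clean = 'Hello'           # what the console appears to show (illustrative name)

# The strings look alike when printed, but they are not equal
print(value == clean)           # False
print(len(value), len(clean))   # 7 5

# f-strings use str() by default; the !r conversion asks for the repr instead
print(f"{value!r}")             # 'H\x00e\x00llo'

# Joining the elements prints their str() form rather than the list's repr-based display
items = ['Hello', clean]
print(", ".join(items))         # Hello, Hello
```

If the lengths differ while the printed text looks identical, the `repr` is the place to look for the extra characters.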

edited Jun 11 '23 at 12:10

answered Jun 11 '23 at 11:13

AbdealiLoKo


In the example I gave, value is a string. It's being represented as bytearray. This is quite different than what you are showing here. But thanks for fixing my formatting! – Joe Jun 11 '23 at 11:48

I have modified it to be more similar to your issue. The underlying concept is the same - that printing a list shows a `repr` of every item in the list. But printing directly or adding with string uses the `str` representation. As a str, `\x00` is a hidden character and hence cannot be seen. But as a repr you can see it. – AbdealiLoKo Jun 11 '23 at 12:06

@Joe Is the binary data actually UTF-32 encoded? That might explain the extra 3 zero bytes after every char in the output in the question. – slothrop Jun 11 '23 at 12:19
Yes - good catch. it goes from u1 to u4. I understand AbdealiLoKo's comment better now. – Joe Jun 11 '23 at 12:31
Converting the u4 to u1 works correctly - but I suspect this leading zero problem should be revealed in the conversion to the string. – Joe Jun 11 '23 at 12:52
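
The UTF-32 point raised in the comments above can be reproduced with a minimal sketch. It assumes the external data is little-endian UTF-32 without a byte order mark; `byte_array` is a stand-in built locally, because the real `byteArray` is not shown in the question.

```python
# Stand-in for the externally supplied data: "Hello" encoded as UTF-32 LE (assumption)
byte_array = "Hello".encode("utf-32-le")

# Decoding with the wrong codec still succeeds, because NUL bytes are valid UTF-8,
# but every character ends up followed by three NUL characters
wrong = byte_array.decode("utf-8")
print(repr(wrong))   # 'H\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00'

# Decoding with the matching codec gives the expected string
right = byte_array.decode("utf-32-le")
print([right])       # ['Hello']
```

Since `decode("utf-8")` raises no error for this input, the mismatch only becomes visible in the `repr`, which is what the list display in the question shows.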
