Replacing a character in a python string - odd output with Print

Question

I'm trying to write some code that replaces a character in a string at a position defined by an integer. I have the following code:

# -*- coding: utf-8 -*-
a = 47
text = 'xxxxxxxxxx xxxxx ╟───┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼───╢ xx'
new = list(text)
new[a] = "x"
print ''.join(new)

But when I run it, it prints out

xxxxxxxxxx xxxxx ╟───┼────┼x▒▒───┼────┼────┼────┼────┼────┼────┼────┼────┼───╢ xx

Instead of

xxxxxxxxxx xxxxx ╟───┼────┼x───┼────┼────┼────┼────┼────┼────┼────┼────┼───╢ xx

In other words, the printed string contains an extra "▒▒". Extra characters appear regardless of which character in the list I replace. What am I doing wrong?

I'm running it on a Raspberry Pi, connected via SSH using PuTTY.

This is a Unicode problem with Python 2. Can you run this with Python 3? See e.g. https://docs.python.org/2/howto/unicode.html — JohanC, Nov 08 '19 at 20:58
These "frame" characters are multi-byte ones in UTF-8, and you have Python 2, where `list()` takes the string apart byte-wise (it works in Python 3). Then you overwrite one byte, but the others survive, forming two broken characters. See if https://stackoverflow.com/questions/8346608/how-to-handle-multibyte-string-in-python helps; at the moment I have no Python 2 to experiment with — tevemadar, Nov 08 '19 at 21:06
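To see the mechanism described above without a Python 2 interpreter, here is a minimal Python 3 sketch that mimics it: it encodes the text to UTF-8, edits the resulting bytes (one element per byte, which is effectively what `list()` did on a Python 2 byte string) and decodes the result. The helper name `replace_byte` is hypothetical, and decoding with `errors='replace'` is an assumption made only for display: Python 3 shows each undecodable byte as U+FFFD, whereas the asker's PuTTY session drew them as ▒.

```python
# Python 3 sketch: reproduce Python 2's byte-wise list() on a UTF-8 string.
TEXT = 'xxxxxxxxxx xxxxx ╟───┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼───╢ xx'


def replace_byte(text, index, char='x'):
    """Hypothetical helper: overwrite one *byte* of the UTF-8 encoding of text."""
    data = bytearray(text.encode('utf-8'))  # one element per byte, like list() on a Py2 str
    data[index] = ord(char)                 # byte 47 is the first of the 3 bytes of a '─'
    # The two leftover continuation bytes no longer form a valid character;
    # errors='replace' turns each one into U+FFFD instead of raising.
    return data.decode('utf-8', errors='replace')


broken = replace_byte(TEXT, 47)
print(broken.count('\ufffd'))  # 2
```

The two replacement characters correspond to the two orphaned bytes of the overwritten `─`, which is why the question's output shows exactly two ▒.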
Yep, running with Python 3 works perfectly (once I updated the print command for Python 3). Many thanks! — JamesH, Nov 08 '19 at 21:06
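For reference, a Python 3 version of the question's code along the lines JamesH describes might look like this. It is a sketch: the only changes assumed are dropping the coding line (Python 3 source files default to UTF-8) and calling `print()` as a function. Because a Python 3 `str` is a sequence of code points, `list()` yields one element per character, so the replacement leaves the box-drawing characters intact.

```python
# Python 3: a str is a sequence of Unicode code points, so list() splits per character.
a = 47
text = 'xxxxxxxxxx xxxxx ╟───┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼───╢ xx'
new = list(text)       # one list element per character, not per byte
new[a] = 'x'           # replaces the whole character at code-point index a
result = ''.join(new)
print(result)          # print is a function in Python 3
```

Note that index 47 now counts characters rather than bytes, so the `x` lands further right (just after the sixth `┼`) than where the Python 2 byte index put it.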
@JohanC That is not a "Unicode problem" with Python 2. It is a matter of not using Unicode strings when handling text. Also, Python 2 is perfectly capable of running that. — JBernardo, Nov 08 '19 at 21:13

JBernardo · Answer 1 · 2019-11-08T21:10:14.253


Since your text is UTF-8 encoded and contains non-ASCII characters, on Python 2 you need to decode it to a Unicode string before converting it to a list, so that each character is not split into multiple bytes**.
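A minimal sketch of that decode-first step, written in Python 3 so it can be run today: the UTF-8 bytes stand in for a Python 2 byte string, and `bytes.decode('utf-8')` plays the role of Python 2's `str.decode('utf-8')`. The function name `replace_char` is hypothetical.

```python
TEXT = 'xxxxxxxxxx xxxxx ╟───┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼───╢ xx'
raw = TEXT.encode('utf-8')  # stand-in for a Python 2 byte str from a UTF-8 source file


def replace_char(raw, index, char='x'):
    """Hypothetical helper: decode first, then edit one character."""
    chars = list(raw.decode('utf-8'))  # Python 2 equivalent: list(raw.decode('utf-8'))
    chars[index] = char                # index now counts characters, not bytes
    return ''.join(chars)              # Python 2: u''.join(chars)


# Byte count vs. character count: each box-drawing character is 3 bytes in UTF-8.
print(len(list(raw)), len(list(raw.decode('utf-8'))))
fixed = replace_char(raw, 47)
```

Splitting the bytes gives two extra list elements per box-drawing character; splitting the decoded text gives exactly one element per character.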

Better yet, you can define the text directly as a Unicode string:

text = u'xxxxxxxxxx xxxxx ╟───┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼───╢ xx'

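** There are cases where even Unicode strings split a character into more than one element (on narrow UCS-2 builds of Python 2, characters outside the Basic Multilingual Plane take two elements), but this won't affect you since all your characters are within the BMP. To learn more, read about UCS-2 vs. UCS-4 representations.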
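To make the footnote concrete: since Python 3.3 (PEP 393), `len()` always counts code points, so this case no longer arises there. A narrow (UCS-2) Python 2 build instead stores a character outside the BMP as two UTF-16 code units, so `len()` reports 2. The sketch below approximates that narrow-build count by encoding to UTF-16; the helper `utf16_units` and the emoji test character are illustrative choices, not taken from the answer.

```python
TEXT = 'xxxxxxxxxx xxxxx ╟───┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼───╢ xx'


def utf16_units(s):
    """Hypothetical helper: 16-bit units a narrow (UCS-2/UTF-16) build would use for s."""
    return len(s.encode('utf-16-le')) // 2  # '-le' variant: no byte-order mark is added


box = '\u2500'        # '─', inside the BMP
emoji = '\U0001F600'  # above U+FFFF, outside the BMP
print(len(box), utf16_units(box))      # 1 1
print(len(emoji), utf16_units(emoji))  # 1 2
# Every character in the question's text is in the BMP, so nothing is split there:
print(all(ord(c) <= 0xFFFF for c in TEXT))  # True
```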

edited Nov 08 '19 at 21:10

answered Nov 08 '19 at 21:03

JBernardo


Care to explain the downvote? This is the bare minimum for that code to work, plus I explain why not using Unicode for non-ASCII strings is bad – JBernardo Nov 08 '19 at 21:11
It wasn't me, unless I clicked accidentally? – JamesH Nov 08 '19 at 21:14
