I'm trying to write code that replaces a character in a string, where the position is given by an integer. I convert the string to a list first. I have the following code:

# -*- coding: utf-8 -*-
a = 47
text = 'xxxxxxxxxx xxxxx ╟───┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼───╢ xx'
new = list(text)
new[a] = "x"
print ''.join(new)

But when I run it, it prints out

xxxxxxxxxx xxxxx ╟───┼────┼x▒▒───┼────┼────┼────┼────┼────┼────┼────┼────┼───╢ xx

Instead of

xxxxxxxxxx xxxxx ╟───┼────┼x──┼────┼────┼────┼────┼────┼────┼────┼────┼───╢ xx

In other words it includes the "▒▒" in the printed string. It adds extra characters regardless of which character is replaced in the list. What am I doing wrong?

I'm running it on a Raspberry Pi, connected via SSH using PuTTY.

JamesH
  • This is a unicode problem with python2. Do you have the possibility to run this with python3? See e.g. https://docs.python.org/2/howto/unicode.html – JohanC Nov 08 '19 at 20:58
  • These "frame" characters are multi-byte in UTF-8, and you have Python 2, where `list()` takes the string apart byte-wise (it works in Python 3). You then overwrite one byte, but the other two survive and show up as two broken characters. See whether https://stackoverflow.com/questions/8346608/how-to-handle-multibyte-string-in-python helps; at the moment I have no Python 2 to experiment with – tevemadar Nov 08 '19 at 21:06
  • Yep, running with python3 works perfectly (once I updated the print command for python3). Many thanks! – JamesH Nov 08 '19 at 21:06
  • @JohanC That is not a "unicode problem" with python 2. That is lack of unicode when handling text. Also python2 is perfectly capable of running that – JBernardo Nov 08 '19 at 21:13

1 Answer

Since the text is UTF-8 encoded and contains non-ASCII characters, on Python 2 you need to convert it to a Unicode string before turning it into a list, so that each character is not split into multiple bytes**.
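To make the byte-versus-character difference concrete, here is a minimal sketch. It is written for Python 3, where a `bytes` object stands in for a Python 2 byte string; the variable names and the shortened copy of the question's text are assumptions made for illustration.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch; `raw` (bytes) stands in for a Python 2 byte string.
# The text is a shortened copy of the string from the question.
raw = u'xxxxxxxxxx xxxxx ╟───┼────┼────┼ xx'.encode('utf-8')

# Byte-wise edit, which is what indexing a Python 2 byte string does:
# only the first of the three bytes of one box character is overwritten.
as_bytes = bytearray(raw)
as_bytes[47:48] = b'x'
broken_text = as_bytes.decode('utf-8', errors='replace')

# Character-wise edit: decode first, then index by character.
chars = list(raw.decode('utf-8'))
chars[27] = u'x'   # byte 47 falls on character 27 in this string
fixed = u''.join(chars)

print(len(raw), len(chars))   # byte count vs. character count
print(broken_text)            # two replacement characters follow the x
print(fixed)                  # a clean single-character replacement
```

The two orphaned continuation bytes left behind by the byte-wise edit are what the terminal renders as "▒▒".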

Better yet, you can define the text directly as a Unicode string:

text = u'xxxxxxxxxx xxxxx ╟───┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼───╢ xx'
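One consequence worth noting: once the text is a Unicode string, the index counts characters rather than bytes, so `a = 47` no longer points at the same cell as in the question's output. The sketch below shows one way to translate a byte offset into a character index. It assumes the offset falls on a character boundary, and `byte_offset`, `char_index` and the shortened text are names and values chosen for illustration.

```python
# -*- coding: utf-8 -*-
# Runs on Python 3 (and should also run on Python 2 thanks to the u'' prefix).
text = u'xxxxxxxxxx xxxxx ╟───┼────┼────┼ xx'   # shortened copy of the question's text

byte_offset = 47   # the position used in the question, counted in UTF-8 bytes

# The number of characters before the byte offset is the character index.
# Assumption: byte_offset sits on a character boundary, otherwise decode() raises.
char_index = len(text.encode('utf-8')[:byte_offset].decode('utf-8'))

new = list(text)
new[char_index] = u'x'
result = u''.join(new)

print(char_index)
print(result)
```

If the position was always meant as a character count, this translation is unnecessary and the index can be used directly.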

** On some Python 2 builds, even Unicode strings can split a character into more than one element, but this won't affect you because all of your characters are within the BMP (Basic Multilingual Plane). If you want to know more, read about UCS-2 vs UCS-4 representations.
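As a rough illustration of the footnote, the sketch below counts UTF-16 code units, which is the unit a narrow (UCS-2) Python 2 build indexes by. The helper `utf16_units` and the sample emoji are assumptions introduced for this example; Python 3.3 and later always index by code point, so the split is not visible there through `len()` alone.

```python
# -*- coding: utf-8 -*-
bmp_char = u'\u2500'         # box-drawing horizontal line, inside the BMP
astral_char = u'\U0001F600'  # an emoji, outside the BMP

def utf16_units(s):
    """Return how many 16-bit code units the string needs in UTF-16."""
    return len(s.encode('utf-16-le')) // 2

print(utf16_units(bmp_char))     # 1
print(utf16_units(astral_char))  # 2
```

Every box-drawing character in the question needs a single unit, which is why the footnote's caveat does not apply here.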

JBernardo