I want to read and write Unicode characters in a PyQt5 QPlainTextEdit.
I've run into a very strange issue that only came to light after some experimenting:
If I enter the string:
yóuxiāngdìzhǐ
into the QPlainTextEdit and then call (by clicking a button):
userInput = self.rightTextEdit.toPlainText()
I get back the string:
yóuxingdìzhÐ
This is obviously garbled. However, if I change only the first ó into a plain o, the problem disappears:
input: youxiāngdìzhǐ
after method call: youxiāngdìzhǐ
So my guess is that Qt5 does some magic behind the scenes and fails to guess the encoding (why does it guess at all? Wouldn't it be better to require the developer to choose an encoding?). Maybe it only reads some of the characters correctly, or maybe it considers ó such an unusual character that it switches to a completely different encoding.
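One observation that may help narrow this down: the garbled output is exactly what you get if every character is cut down to its lowest 8 bits. ā (U+0101) becomes U+0001, an invisible control character, which would explain why it seems to vanish, and ǐ (U+01D0) becomes Ð (U+00D0), while ó and ì already fit in one byte and survive unchanged. The sketch below checks this with a hypothetical helper, `truncate_to_low_byte` (not a Qt API). It only shows that the symptom is consistent with a one-byte-per-character step somewhere between typing and `toPlainText()`; it does not explain why replacing ó with o avoids the problem.

```python
# Assumption being tested: the corruption keeps only the lowest byte of each
# code point. truncate_to_low_byte is a hypothetical helper, not a Qt API.


def truncate_to_low_byte(text: str) -> str:
    """Return text with every code point reduced to its lowest 8 bits."""
    return "".join(chr(ord(ch) & 0xFF) for ch in text)


original = "yóuxiāngdìzhǐ"
truncated = truncate_to_low_byte(original)
print(repr(truncated))  # 'yóuxi\x01ngdìzhÐ'

# U+0001 is not printable, so a console shows nothing for it. Removing
# non-printable characters yields the string reported above.
visible = "".join(ch for ch in truncated if ch.isprintable())
print(visible == "yóuxingdìzhÐ")  # True
```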
Since PyQt5 no longer exposes QString and its methods (it uses Python str throughout), how am I supposed to tell a QPlainTextEdit that I want its whole contents interpreted as a Unicode string?
I read the question "Set Qt default encoding to UTF-8", but the accepted answer only works for Qt4; Qt5 no longer has the methods it relies on.
Here are the important parts of my source code:
from PyQt5.QtCore import *
from PyQt5.QtWidgets import *
...

class PinyinTransformerMainWindow(QMainWindow):
    def createControls(self):
        ...
        self.rightTextEdit = QPlainTextEdit('', self)
        self.rightTransformButton = QPushButton('Transform (numbers)')
        ...

    def addControlsEventHandlers(self):
        self.leftTransformButton.clicked.connect(self.transformToPinyinWithTones)
        self.rightTransformButton.clicked.connect(self.transformToPinyinWithNumbers)

    def transformToPinyinWithNumbers(self):
        userInput = self.rightTextEdit.toPlainText()
        print("User input right:", userInput)
        ...
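Because `print()` output also passes through the terminal's encoding and font, the console does not necessarily show what `toPlainText()` really returned. A small, hypothetical helper such as `describe_code_points` below lists each character's code point and Unicode name, which separates what the widget returned from how the console renders it. It uses only the standard `unicodedata` module and sticks to `str.format`, so it should also run on Python 3.4.

```python
import unicodedata


def describe_code_points(text: str) -> str:
    """Return one line per character: its code point and Unicode name.

    Hypothetical debugging helper. Characters without a name (such as
    control characters) are shown as <unnamed>.
    """
    lines = []
    for ch in text:
        name = unicodedata.name(ch, "<unnamed>")
        lines.append("U+{:04X} {}".format(ord(ch), name))
    return "\n".join(lines)


# Inside transformToPinyinWithNumbers one could print, for example:
#     print(describe_code_points(userInput))
print(describe_code_points("yóuǐ"))
```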
EDIT #1:
I've written tests like this:
tonedText = "yóuxiāngdìzhǐ"
numberedText = "you2xiang1di4zhi3"
self.assertEqual(self.pinyin_tones_2_numbers_transformer.transform(tonedText), numberedText)
This test uses the same transform method that the button's slot calls in the PyQt5 GUI, and it passes. So the error must be in the GUI, where I get the string from the QPlainTextEdit.
When I enter the following in a Python console:
>>> a = "yóuxiāngdìzhǐ".encode(encoding="utf-8")
>>> a
b'y\xc3\xb3uxi\xc4\x81ngd\xc3\xaczh\xc7\x90'
>>> a.decode()
'yóuxiāngdìzhǐ'
>>> a.decode(encoding="utf-8")
'yóuxiāngdìzhǐ'
So it's not a Python 3 problem. However, if I do this in the code:
self.leftTextEdit.toPlainText().encode('utf-8').decode('utf-8')
I still get the wrong string:
yóuxingdìzhÐ
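Since `toPlainText()` already returns a Python str, `.encode('utf-8').decode('utf-8')` hands back exactly the same string (for any text without lone surrogates), so it cannot undo corruption that happened before that call; if the result is wrong, `toPlainText()` presumably already returned the wrong code points. A quick check with both the expected and the garbled strings:

```python
# Encoding a str to UTF-8 and decoding it again is lossless for any str
# without lone surrogates, so the result always equals the input.
samples = ["yóuxiāngdìzhǐ", "yóuxingdìzhÐ"]
for s in samples:
    round_tripped = s.encode("utf-8").decode("utf-8")
    print(repr(s), round_tripped == s)  # prints True for every sample
```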
EDIT #2:
I've now added another print() like this:
print("Condition:", self.leftTextEdit.toPlainText().encode('utf-8').decode('utf-8') == "yóuxiāngdìzhǐ")
and then entered
yóuxiāngdìzhǐ
in the QPlainTextEdit. This results in:
False
(!) So there really does seem to be a problem with how Qt5 interprets the string in the QPlainTextEdit. What can I do about it?
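When a comparison like the one above prints False, it can help to see exactly where the strings diverge. The sketch below uses a hypothetical helper, `first_mismatch`, that reports the first differing index together with both characters' code points. The garbled string as it appeared in the console stands in for `self.leftTextEdit.toPlainText()`; the real widget text may contain invisible characters that the console did not show, in which case the reported character would differ.

```python
def first_mismatch(actual, expected):
    """Return (index, actual_char, expected_char) at the first differing
    position, or None if the strings are equal. A position past the end of
    the shorter string is reported as ''."""
    length = max(len(actual), len(expected))
    for i in range(length):
        a = actual[i] if i < len(actual) else ""
        e = expected[i] if i < len(expected) else ""
        if a != e:
            return i, a, e
    return None


def code_point(ch):
    """Format a single character as U+XXXX, or <none> for ''."""
    return "U+{:04X}".format(ord(ch)) if ch else "<none>"


expected = "yóuxiāngdìzhǐ"
actual = "yóuxingdìzhÐ"  # stand-in for self.leftTextEdit.toPlainText()
result = first_mismatch(actual, expected)
if result is None:
    print("strings are equal")
else:
    i, a, e = result
    print("index {}: got {!r} ({}), expected {!r} ({})".format(
        i, a, code_point(a), e, code_point(e)))
# index 5: got 'n' (U+006E), expected 'ā' (U+0101)
```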
EDIT #3:
Python version: 3.4
PyQt5 version: 5.2.1
Locale: ('en_US', 'UTF-8')