
I have a script I'm writing where I need to print the character sequence "Qä" to the terminal. My terminal is using UTF-8 encoding. My file has # -*- coding: utf-8 -*- at the top of it, which I think is not actually necessary for Python 3, but I put it there in case it made any difference. In the code, I have something like

print("...Qä...")

This does not produce Qä. Instead it produces Q▒.

I then tried

qa = "Qä".encode('utf-8')
print(f"...{qa}...")

This also does not produce Qä. It produces b'Q\xc3\xa4' (the repr of the bytes object).

I also tried

qa = u"Qä"
print(f"...{qa}...")

This also produces Q▒.

However, I know that Python 3 can open files that contain UTF-8 and use the contents properly, so I created a file called qa.txt, pasted Qä into it, and then used

with open("qa.txt") as qa_file:
    qa = qa_file.read().strip()
print(f"...{qa}...")

This works. However, it's beyond dumb that I have to create this file in order to print this string. How can I put this text into my code as a string literal?

This question is NOT a duplicate of a question asking about Python 2.7; I am not using Python 2.7.
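All three symptoms fit a mismatch between the encoding Python uses when writing text and the encoding the terminal uses when reading it. The sketch below reproduces each symptom with plain encode/decode calls and never touches the terminal. It assumes cp1252 for both the standard streams and the default encoding of open(). The comments report cp1252 for sys.stdout.encoding, but cp1252 for open() is an assumption.

```python
# Reproduces the question's three symptoms at the byte level, ASSUMING the
# standard streams and open() both use cp1252 while the terminal expects UTF-8.
text = "Qä"

# 1. print("Qä"): Python encodes with cp1252, producing a lone 0xE4 byte that
#    is not valid UTF-8. Python's own UTF-8 decoder substitutes U+FFFD for it;
#    the terminal in the question showed a placeholder (▒) instead.
written = text.encode("cp1252")
assert written == b"Q\xe4"
assert written.decode("utf-8", errors="replace") == "Q\ufffd"

# 2. Formatting a bytes object into an f-string inserts its repr, not its text.
qa = text.encode("utf-8")
assert f"...{qa}..." == "...b'Q\\xc3\\xa4'..."

# 3. Reading a UTF-8 file as cp1252 misreads the two bytes of "ä" as two
#    characters ("Ã¤"). Printing re-encodes them with cp1252, which recreates
#    the original UTF-8 bytes, so the terminal happens to display "Qä".
misread = qa.decode("cp1252")
assert misread == "Q\u00c3\u00a4"
assert misread.encode("cp1252") == qa
```

This also suggests why the file workaround only appears to work: two wrong conversions cancel each other out.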

Asked by faiuwle, edited by tdelaney
  • @Barmar: That dupe target was specifically about Python 2. This is a Python 3 question. – user2357112 Jul 27 '23 at 03:26
  • I suspect this is actually a terminal emulator issue. Your first code works for me in a Mac Terminal window. – Barmar Jul 27 '23 at 03:28
  • My terminal can print this character perfectly well, as shown by the fact that reading it out of the file works. – faiuwle Jul 27 '23 at 03:30
  • Are `sys.stdout.encoding` and `sys.getdefaultencoding()` both "utf-8"? – tdelaney Jul 27 '23 at 03:31
  • Oh, that's interesting, `sys.getdefaultencoding()` is utf-8, but `sys.stdout.encoding` is cp1252 for some reason. – faiuwle Jul 27 '23 at 03:34
  • Is this a .py file saved to disk? I wonder if it's something funky with your editor. If you had `test.py` with the single line `print("ä")`, does it fail? And if you read it as `open('test.py', 'rb').read()`, does that ä encode properly? – tdelaney Jul 27 '23 at 03:34
  • Yeah, what version of Windows? I'm not sure how you get to a utf-8 console. I don't use Windows often, but I think it switched at some point. – tdelaney Jul 27 '23 at 03:34
  • Windows 10. I'm using Git Bash for the console, it has an options menu where you can set the encoding, and I've confirmed that it is set to UTF-8. A new file with just `print("ä")` also doesn't work. – faiuwle Jul 27 '23 at 03:36
  • `sys.stdout.reconfigure(encoding='utf-8')` might help. https://docs.python.org/3/library/io.html#io.TextIOWrapper.reconfigure I'm not sure what's causing this situation in the first place, though. – user2357112 Jul 27 '23 at 03:43
  • @user2357112, that did it, thanks! If you post it as an answer I can mark it as the solution. – faiuwle Jul 27 '23 at 03:46
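The checks tdelaney suggests can be collected into a small diagnostic. The sketch below is one way to do that; `report_encodings` is a hypothetical helper name, not part of any library. When sys.stdout.encoding differs from what the terminal expects (cp1252 versus UTF-8 here), the problem lies in the output stream rather than in the source file or the string literal.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings that matter when printing non-ASCII text."""
    return {
        # Python's internal default for str/bytes conversions; 'utf-8' in Python 3
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        # The encoding the stdout text wrapper actually applies to printed text
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        # The locale encoding (the ANSI code page on Windows, e.g. cp1252)
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    }


for name, value in report_encodings().items():
    print(f"{name}: {value}")
```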

1 Answer


You're using Git Bash on Windows. On Windows, unless the standard streams are connected to a standard Windows console (which I don't think Git Bash counts as), Python uses the locale encoding for them, which on your system is 'cp1252'. Your terminal is set to expect UTF-8, not CP1252, so it cannot decode the bytes Python writes for ä. You can reconfigure the standard output stream to UTF-8 with

import sys
sys.stdout.reconfigure(encoding='utf-8')

and similarly for stdin and stderr, or you can set the PYTHONIOENCODING environment variable to utf-8 before running Python to change the default stdin/stdout/stderr encodings.
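Here is a minimal sketch of both approaches. It assumes Python 3.7+, which added TextIOWrapper.reconfigure() and subprocess.run(capture_output=True). The hasattr guard is an extra precaution for streams that are None or have been replaced by objects without reconfigure(). PYTHONIOENCODING only takes effect when the interpreter starts, so the second part demonstrates it on a child process. In practice you would set it in the shell before running the script; in Git Bash, that could be `export PYTHONIOENCODING=utf-8`.

```python
import os
import subprocess
import sys

# Option 1: switch this process's output streams to UTF-8 at startup.
# sys.stdin can be reconfigured the same way, as long as nothing has been
# read from it yet.
for stream in (sys.stdout, sys.stderr):
    # Skip streams that are None or are not TextIOWrapper-like objects
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")

print("...Qä...")

# Option 2: PYTHONIOENCODING must be set before the interpreter starts, so
# launch a child interpreter with it and ask for that child's stdout encoding.
env = dict(os.environ, PYTHONIOENCODING="utf-8")
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
print("child stdout encoding:", result.stdout.strip())
```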

Answered by user2357112