I've already read this:
Setting the correct encoding when piping stdout in Python
And I'm trying to stick with the rule of thumb: "Always use Unicode internally. Decode what you receive, and encode what you send."
So here's my main file:
```python
# coding: utf-8
import os
import sys

from myplugin import MyPlugin

if __name__ == '__main__':
    c = MyPlugin()
    # read the UTF-8 file and decode it to a unicode string
    a = unicode(open('myfile.txt').read().decode('utf8'))
    # re-encode to UTF-8 for output
    print(c.generate(a).encode('utf8'))
```
What is getting on my nerves is that:
- I read in a UTF-8 file, so I decode it.
- Then I force-convert it to `unicode`, which gives `unicode(open('myfile.txt').read().decode('utf8'))` (though, as the snippet after this list suggests, the outer `unicode()` call looks redundant, since `.decode('utf8')` already returns `unicode`).
- Then I try to output it to a terminal.
- On my Linux shell I need to re-encode it to UTF-8, and I guess this is normal: I've been working on a unicode string the whole time, so to output it I have to encode it back to UTF-8 (correct me if I'm wrong here).
- When I run it with PyCharm under Windows, it gets UTF-8 encoded twice, which gives me things like `agréable, déjÃ`. So if I remove `.encode('utf8')` (which changes the last line to `print(c.generate(a))`), then it works with PyCharm but no longer works on Linux, where I get `'ascii' codec can't encode character u'\xe9' in position` blabla, you know the problem.
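
As an aside, a quick check in a Python 2 interpreter suggests the outer `unicode()` call is a no-op, since `str.decode('utf8')` already returns a `unicode` object (using the same `myfile.txt` as above):

```python
# Python 2 interactive check: .decode() already yields unicode,
# so wrapping the result in unicode() again changes nothing.
data = open('myfile.txt').read()   # str (bytes) in Python 2
text = data.decode('utf8')         # unicode
print(type(data))                  # <type 'str'>
print(type(text))                  # <type 'unicode'>
print(unicode(text) == text)       # True: the extra unicode() call is a no-op
```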
If I check `import sys; sys.stdout.encoding`:

- Linux shell over ssh: I get `'UTF-8'`.
- Linux shell, from inside my code: I get `None`. WTF??
- Windows/PyCharm: I get `'windows-1252'`.
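
If I understand it correctly (this is an assumption on my part), Python 2 only sets `sys.stdout.encoding` when stdout is an actual terminal; when it is piped or redirected it stays `None` and `print` falls back to ASCII. A small script (hypothetical name `check_encoding.py`) makes this visible:

```python
# check_encoding.py -- show what Python thinks stdout is
import sys

# isatty() reports whether stdout is a terminal or a pipe/file;
# in Python 2, sys.stdout.encoding is typically None for pipes.
sys.stderr.write('isatty: %r, encoding: %r\n'
                 % (sys.stdout.isatty(), sys.stdout.encoding))
```

Running `python check_encoding.py` directly versus `python check_encoding.py | cat` should show the encoding flip from `'UTF-8'` to `None`.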
What is the best way to write this so it works in both environments?
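
For what it's worth, the workaround I'm experimenting with (just a sketch, and exactly the part I'd like confirmed or corrected) wraps `sys.stdout` with an encoding writer only when Python could not detect one itself, so I can print unicode strings unmodified everywhere:

```python
# coding: utf-8
import codecs
import locale
import sys

# If Python couldn't detect the output encoding (e.g. stdout is piped),
# fall back to the locale's preferred encoding, then to UTF-8.
if sys.stdout.encoding is None:
    encoding = locale.getpreferredencoding() or 'utf-8'
    # codecs.getwriter() gives a stream wrapper that encodes unicode on write
    sys.stdout = codecs.getwriter(encoding)(sys.stdout)

# From here on, printing unicode should work in both environments,
# without a manual .encode('utf8') on every print.
print(u'agréable, déjà')
```

On Linux with a pipe the wrapper does the encoding; under PyCharm, `sys.stdout.encoding` is already `'windows-1252'`, so `print` encodes the unicode string itself and the wrapper never kicks in. But I'm not sure this is the idiomatic approach, hence the question.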