
When converting UTF-8 encoded text to HTML using markdown2.py, the result is not UTF-8 encoded (my editor reports it as ASCII). How can I tell markdown2 to output UTF-8?

Sample input file called ff.md:

Hallo, Bjørn Nößflögl

Transformed using:

C:\Python37\python.exe C:\Python37\Scripts\markdown2.py ff.md

When I open the result in an editor (Notepad++), it detects the file as ASCII. Likewise, when I render it with Flask's `render_template`, it crashes on the diacritics: `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 12: invalid start byte`.

If I manually convert the file to UTF-8 with Notepad++, everything works.

Glancing over the markdown2 code, I can see that the input is assumed to be UTF-8 when it is read. So I don't understand why the output isn't written as UTF-8 too.
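The byte in the error message points at the cause. A minimal sketch, assuming markdown2 emits `<p>Hallo, Bjørn Nößflögl</p>` for ff.md and that this was written in cp1252 (the Windows ANSI code page): `ø` becomes the single byte 0xf8, which is not a valid UTF-8 start byte, and it sits at position 12 once the three-character `<p>` prefix is counted. Genuine ASCII output could not contain that byte at all.

```python
# Assumption: markdown2 turns ff.md into roughly this HTML.
html = "<p>Hallo, Bjørn Nößflögl</p>\n"

# Simulate the HTML having been written in the Windows ANSI code page
# (cp1252) instead of UTF-8.
cp1252_bytes = html.encode("cp1252")
print(hex(cp1252_bytes[12]))  # 0xf8, the single cp1252 byte for 'ø'

# Decoding those bytes as UTF-8, as a loader expecting UTF-8 would,
# fails on exactly that byte.
error = None
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
print(error)  # 'utf-8' codec can't decode byte 0xf8 in position 12: invalid start byte
```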

RolfBly
  • @snakecharmerb Thanks for the suggestion; however, running `chcp 65001` in the console first and then converting the file still produces non-UTF-8 output. The ö becomes `\xf6` (its single-byte Windows-1252/Latin-1 encoding), not `\xc3\xb6` (UTF-8). I guess I'll have to dive into markdown2.py if I get tired of the workaround. – RolfBly Dec 28 '18 at 11:41

1 Answer


TL;DR: Set the environment variable PYTHONIOENCODING to utf-8.

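Explanation: To do that in Windows 10, press the Windows key, type "environment" and launch "Edit environment variables for your account". Click New, set the variable name to PYTHONIOENCODING and the value to utf-8 (no quotes), and save. Setting it this way makes Windows retain the variable for future sessions. Console windows that were already open will not see it, so open a new one before running markdown2.py again.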
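If you would rather not change the setting permanently, the variable can also be set for a single run. A minimal sketch, assuming you drive the conversion from another Python script; `run_with_utf8_stdout` is a hypothetical helper built only on the standard library's `subprocess.run`:

```python
import os
import subprocess
import sys


def run_with_utf8_stdout(args):
    """Run `args` with PYTHONIOENCODING=utf-8 set for the child process only.

    Returns the child's raw stdout bytes, so the caller can write them to a
    file unchanged. The permanent Windows environment is not touched.
    """
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    result = subprocess.run(args, env=env, stdout=subprocess.PIPE, check=True)
    return result.stdout


# Demonstration with a child Python that writes 'ø' (spelled as an escape so
# the command line itself stays ASCII).
out = run_with_utf8_stdout(
    [sys.executable, "-c", "import sys; sys.stdout.write('\\u00f8')"]
)
print(out)  # b'\xc3\xb8'
```

For the conversion in the question, the call would look something like `run_with_utf8_stdout([r"C:\Python37\python.exe", r"C:\Python37\Scripts\markdown2.py", "ff.md"])`, followed by writing the returned bytes to an .html file opened in binary mode. The helper is illustrative only and not part of markdown2.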

In the markdown2 code, these lines write the output in my case (Python 3):

if py3:
    sys.stdout.write(html)

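`sys.stdout.write` encodes the text with `sys.stdout.encoding`. On Windows, when stdout is redirected to a file, that is the ANSI code page (for example cp1252) rather than UTF-8, regardless of how markdown2 read the input; PYTHONIOENCODING overrides it. Googling 'python sys.stdout.write utf8' led me to a question on Stack Overflow, where one of the answers led me to the solution.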
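To see in isolation why the output stream's encoding, not the input's, decides the bytes, here is a minimal sketch. The helper `bytes_written` is hypothetical; it uses an in-memory `io.TextIOWrapper` as a stand-in for `sys.stdout`, which is also a text stream with its own encoding:

```python
import io


def bytes_written(text, encoding):
    """Return the bytes a text stream with the given encoding produces for `text`."""
    buffer = io.BytesIO()
    # newline="" disables newline translation so only the encoding matters.
    stream = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


html = "<p>Bjørn</p>"
print(bytes_written(html, "cp1252"))  # b'<p>Bj\xf8rn</p>'
print(bytes_written(html, "utf-8"))   # b'<p>Bj\xc3\xb8rn</p>'
```

If you call markdown2 from your own script instead of the command line, `sys.stdout.reconfigure(encoding="utf-8")` (available since Python 3.7) changes the stdout encoding for that process, with the same effect as PYTHONIOENCODING.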

RolfBly