Character showing up as diamond question mark only at end of line (Python>Text)

Question

I'm working on a Python script that reads a UTF-8 text file containing Japanese characters, takes some of the text, and writes it to a new UTF-8 text file.

The problem I'm coming across is that for some reason whenever the Japanese character だ appears at the end of a line in the original input file, it comes out as a diamond question mark in the output file.

Instances of だ elsewhere in a line come out fine, and the original input file displays it correctly even when it's at the end of a line.

using python 2.7 or 3.x? python 3.x has much better unicode support — Aaron, Jan 23 '17 at 17:35
here the explanation is for Java but it's relevant here as well http://stackoverflow.com/a/24009294/1530987 — Chandan Rai, Jan 23 '17 at 17:36

score 5 · Answer 1 · answered Jan 23 '17 at 17:48

Since you haven't shared any code, here is a generic way to read and write UTF-8 files using the codecs module:

import codecs

# Read a UTF-8 encoded file; data is decoded text, not raw bytes
with codecs.open("in.txt", "r", encoding="utf-8") as input_data:
    data = input_data.read()

# Write the text back out as a UTF-8 encoded file
with codecs.open("out.txt", "w", encoding="utf-8") as output_data:
    output_data.write(data)

BTW, I tested it on the given character だ and it works fine.
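
If you process the file line by line, a variant of the same idea is to decode first and only then strip or slice. The sketch below uses io.open, which is available on both Python 2.7 and 3.x; copy_lines, DA_BYTES and lossy are hypothetical names of mine, not something from your code. It also illustrates one possible cause that I can't confirm without seeing your script: だ is the three bytes E3 81 A0 in UTF-8, and if the final 0xA0 byte is lost while the line is still a byte string (for example, a byte-level strip() in Python 2 can treat 0xA0 as whitespace under some locales), the two remaining bytes no longer form a valid character.

```python
import io

# だ (U+3060) encodes to the three bytes E3 81 A0 in UTF-8.
DA_BYTES = u"\u3060".encode("utf-8")

# Assumed failure mode (not confirmed by the question): the trailing
# 0xA0 byte is dropped before decoding. Decoding what is left with
# errors="replace" produces U+FFFD, the "diamond question mark".
lossy = DA_BYTES[:-1].decode("utf-8", "replace")


def copy_lines(src_path, dst_path):
    """Copy a UTF-8 text file line by line, working on decoded text only."""
    with io.open(src_path, "r", encoding="utf-8") as src, \
            io.open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # line is already unicode text here, so removing the line
            # ending cannot cut a multi-byte character in half.
            dst.write(line.rstrip(u"\r\n") + u"\n")
```

Calling copy_lines("in.txt", "out.txt") should then keep だ intact wherever it appears in a line. The point of the design is simply that all stripping happens after decoding, never on raw bytes.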
