
The problem arises when I write a Unicode character to a file and then try to decode it after reading it back. I write the character `chr(13)` to a file, but when I read it back in text mode it comes back as `chr(10)`. Reading the same file in binary mode still gives 13. The code and the output are shown below. Other Unicode characters don't seem to have this problem.

```python
# Write the character chr(13) into a file in text mode
a = chr(13)
with open("test13.txt", "w", encoding="utf-8") as ftest:
    ftest.write(a)

# Read the character from the file in binary mode
print("file in binary:")
with open("test13.txt", "rb") as f1:
    print(ord(f1.read(1)))

# Read the character from the file in text mode
print("file in text:")
with open("test13.txt", "r", encoding="utf-8") as f1:
    print(ord(f1.read(1)))

# Convert the character directly, without a file
print("directly convert without write to file:")
b = ord(a)
print(b)
```

Result:

```
file in binary:
13
file in text:
10
directly convert without write to file:
13
```
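A minimal sketch that isolates which step changes the character, assuming a temporary file in place of `test13.txt`. The raw bytes show that `'\r'` reached the disk unchanged; only the default text-mode read (`newline=None`, universal newlines) turns it into `'\n'`, while reading with `newline=""` returns it untranslated.

```python
import os
import tempfile

# Write chr(13) in text mode, as in the question (temporary file is an assumption).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    f.write(chr(13))

# Raw bytes on disk: the carriage return was written unchanged.
with open(path, "rb") as f:
    raw = f.read()

# Default text-mode read (newline=None): universal newlines maps '\r' to '\n'.
with open(path, "r", encoding="utf-8") as f:
    translated = f.read()

# newline="": line endings are returned to the caller untranslated.
with open(path, "r", encoding="utf-8", newline="") as f:
    untranslated = f.read()

os.remove(path)

print(raw)                               # b'\r'
print([ord(c) for c in translated])      # [10]
print([ord(c) for c in untranslated])    # [13]
```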
  • This has nothing to do with Unicode. Python normalizes line endings when you write a file in text mode. – tripleee Mar 23 '21 at 05:09
  • So how should I correctly write chr(13) to file or read chr(13) from file? – supersheep666 Mar 23 '21 at 05:28
  • If you absolutely want control over the bytes in the file, open it in `wb` mode. But then you need to write `bytes` objects, not strings. – tripleee Mar 23 '21 at 05:33
  • I added a new answer to the duplicate with hopefully enough information to help you see how this works. – tripleee Mar 23 '21 at 05:45
  • If you don't want Python to translate newlines, but still want to work in text mode, open the file with `open(..., newline="")`. This is something you are encouraged to do when working with CSV files, for example, where leaving newlines untouched can be essential. See [here](https://docs.python.org/3/library/functions.html#open) (you need to scroll a bit). – lenz Mar 23 '21 at 18:10
  • Thanks for all the help. I was able to solve this problem by using `open(..., newline='')`. I think the problem is that `chr(13)` is `'\r'`. When Python uses universal newlines mode, `'\r'` is converted to `'\n'`, which is `chr(10)`. – supersheep666 Mar 24 '21 at 23:49
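A sketch of the two approaches mentioned in the comments, assuming a temporary file; the helper names `roundtrip_text` and `roundtrip_bytes` are hypothetical. Text mode with `newline=""` disables translation on both write and read, while binary mode (`wb`/`rb`) bypasses text handling entirely, so you encode and decode the `bytes` yourself.

```python
import os
import tempfile


def roundtrip_text(path: str, text: str) -> str:
    """Write and read back in text mode with newline translation disabled."""
    # newline="" on write: '\n' is not converted to os.linesep.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    # newline="" on read: '\r' and '\r\n' come back as-is.
    with open(path, "r", encoding="utf-8", newline="") as f:
        return f.read()


def roundtrip_bytes(path: str, text: str) -> str:
    """Encode to bytes explicitly and use binary mode, so no translation applies."""
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
    with open(path, "rb") as f:
        return f.read().decode("utf-8")


fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
sample = "a\rb\r\nc\n"  # mixes '\r', '\r\n' and '\n'
via_text = roundtrip_text(path, sample)
via_bytes = roundtrip_bytes(path, sample)
os.remove(path)

print(via_text == sample, via_bytes == sample)  # True True
```

Binary mode gives full control over the bytes, whereas `newline=""` keeps the convenience of `str` I/O; the latter is the approach that solved the problem here.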
