I was trying to read the contents of a text file and write them to a JSON file, but I noticed that Python automatically turned the Kurdish (Sorani) text into UTF-8 escape sequences. Can someone explain why Python does this and how I can prevent the conversion?
You can test it with the code below:
def readText():
    # test.txt contains Kurdish (Sorani) characters (an article)
    # Sorani example: ڕۆژتان باش بەڕێزان. من ناوم ڕەنجە.
    with open('test.txt', 'r') as context:
        data = context.readlines()
    return data

print(readText())
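A minimal sketch of what may be happening, assuming test.txt is UTF-8 encoded: in Python 2, `open(..., 'r')` returns byte strings, and printing a list calls `repr()` on each element, which displays non-ASCII bytes as `\x..` escapes. The `sample` string below is a hypothetical stand-in for one line of test.txt.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Hypothetical sample standing in for one line of test.txt.
sample = u'ڕۆژتان باش'

# Encoding to UTF-8 gives the same kind of byte string that Python 2's
# open(..., 'r').readlines() returns for a UTF-8 file.
raw = sample.encode('utf-8')

# Printing a list shows repr() of each element, so the non-ASCII
# bytes appear as \x.. escape sequences.
print([raw])

# The bytes are not altered: decoding them gives back the original text.
text = raw.decode('utf-8')
print(text == sample)  # True
```

Printing each line on its own (for example `for line in readText(): print(line)`) should show the Sorani text on a UTF-8 terminal, because `print` then writes the string itself rather than its `repr()`.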
I'm running Python 2.x on Ubuntu 14.x. Python 2.x does this; Python 3.x does not convert the text and works fine.
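For the JSON part, one possible approach, sketched under the assumption that test.txt is UTF-8: decode the file with `io.open` (available in both Python 2.6+ and Python 3) and pass `ensure_ascii=False` to `json.dumps`, since by default the `json` module escapes every non-ASCII character as `\uXXXX`. The function names `read_text` and `write_json`, the `lines` key, and the file paths are hypothetical.

```python
import io
import json


def read_text(path):
    # io.open with an explicit encoding decodes the UTF-8 bytes into
    # unicode text instead of returning raw byte strings.
    with io.open(path, 'r', encoding='utf-8') as f:
        return [line.rstrip(u'\n') for line in f]


def write_json(lines, path):
    # ensure_ascii=False keeps non-ASCII characters as-is in the output
    # instead of escaping them as \uXXXX sequences.
    text = json.dumps({u'lines': lines}, ensure_ascii=False, indent=2)
    with io.open(path, 'w', encoding='utf-8') as f:
        # In Python 2, json.dumps can return str when there is no non-ASCII
        # data; u'' + text makes sure io.open's writer gets a unicode object
        # (a no-op in Python 3).
        f.write(u'' + text)
```

Usage would look like `write_json(read_text('test.txt'), 'output.json')`. The default `\uXXXX` escapes are still valid JSON and load back to the same text, so `ensure_ascii=False` matters mainly when the file should be human-readable.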