I have a dataset in a .json file. I open it in Python, extract data from it, do some replacements to strip out strings I don't want (such as HTML formatting), and then write the resulting sentences out to a file.
The problem is that I get a Unicode error.
I worked around it by calling `mystring = mystring.encode('utf-8')` before printing or writing to the file, but when I open that file in another Python script, I now get similar encoding errors when comparing strings.
How can I solve this in a reasonably elegant way?
I have found similar questions, but nothing JSON-related. From what I have read, the json library should already return unicode strings.
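A quick way to check that claim is a sketch like the one below. The JSON literal is a made-up stand-in for one entry of the real file; the point is only that `json.loads` turns every JSON string, including `\uXXXX` escapes, into a text object (`unicode` on Python 2, `str` on Python 3).

```python
import json

# Hypothetical sample record standing in for one entry of JEOPARDY_QUESTIONS1.json
raw = u'{"question": "Caf\\u00e9 trivia", "value": "$200"}'
data = json.loads(raw)

# The text type is `unicode` on Python 2 and `str` on Python 3
text_type = type(u"")

# json decodes every JSON key and string value into that text type
all_text = all(isinstance(k, text_type) and isinstance(v, text_type)
               for k, v in data.items())
print(all_text)  # True
```

If this prints `True`, the strings coming out of the JSON are already text, and the trouble starts only when they are turned into bytes.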
Here is the segment where I open the JSON file:

```python
import json

with open("JEOPARDY_QUESTIONS1.json") as json_file:
    json_data = json.load(json_file)

for item in json_data:
    pass  # processing of each item omitted
```
Edit: writing to the file:
```python
result = result.encode('ascii', 'ignore')
# removing stuff from string
result += '\n'
if i < 40000:
    i += 1
    if i % 1000 == 0:
        print "adding question #:" + str(i)
    # this write throws the unicode error when the encoding line is not there
    f1.write(result)  # this used to be print >> f1, result (and '\n' wasn't added)
```
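For comparison, here is a minimal sketch of the same write step that lets the file object do the encoding instead of calling `.encode()` on each string. It swaps in `io.open` (available on Python 2.6+ and Python 3) as an assumption; the sentence list and the output path are hypothetical placeholders.

```python
import io
import os
import tempfile

# Hypothetical cleaned-up sentences; in the real script these come from json_data
sentences = [u"Caf\u00e9 trivia", u"Plain ASCII question"]

# Hypothetical output path; a temp dir keeps the sketch self-contained
out_path = os.path.join(tempfile.mkdtemp(), "questions.txt")

# io.open with encoding= accepts text objects and encodes them on write,
# so no manual .encode() call is needed before f1.write()
with io.open(out_path, "w", encoding="utf-8") as f1:
    for i, sentence in enumerate(sentences, 1):
        f1.write(sentence + u"\n")
        if i % 1000 == 0:
            print("adding question #: %d" % i)

# Reading back with the same encoding yields text objects again
with io.open(out_path, encoding="utf-8") as f2:
    read_back = [line.rstrip(u"\n") for line in f2]

print(read_back == sentences)  # True
```

The design choice here is to keep everything as text inside the program and encode or decode only at the file boundary, using the same encoding in the writing script and the reading script.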
When I work with the file the next time, I get:
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
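This warning appears when an encoded byte string is compared with a unicode string. The sketch below reproduces that mismatch with a made-up value and shows that decoding the bytes first makes the comparison behave. On Python 2 the mixed comparison emits exactly this `UnicodeWarning`; on Python 3 bytes and `str` simply compare unequal without a warning.

```python
# A UTF-8 encoded byte string, as produced by mystring.encode('utf-8')
encoded = u"Caf\u00e9".encode("utf-8")
target = u"Caf\u00e9"

# Comparing bytes to text is the mismatch behind the UnicodeWarning:
# Python 2 tries an implicit ASCII decode, fails on the non-ASCII bytes,
# warns, and reports them as unequal. Python 3 treats bytes and str as unequal.
mixed_equal = (encoded == target)

# Decoding the bytes back to text first makes both sides the same type
decoded_equal = (encoded.decode("utf-8") == target)

print("%s %s" % (mixed_equal, decoded_equal))  # False True
```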