
So I have a dataset in a .json file. I open it in Python and pull data out of it, then do some replacements on strings I don't want, like removing HTML formatting. Then I write the resulting sentences out to a file.

The problem is that I get a Unicode error.

I have solved it by using

mystring=mystring.encode('utf-8')

before printing or writing to the file. But when I open that file in another Python script, I now get similar encoding errors when comparing strings.

How do I solve this in a somewhat elegant way?

I have found similar questions, but nothing JSON-related. From what I have read, the json library should already give me Unicode strings.
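To check that claim about the json library, here is a minimal sketch (inline JSON text standing in for the real file, with a made-up `question`/`answer` record) showing that `json.loads` hands back text strings: `unicode` on Python 2, `str` on Python 3.

```python
import json

# Inline JSON standing in for the real file; the record layout is made up.
# The \u00e9 escape is part of the JSON text and decodes to an accented e.
raw = '{"question": "Caf\\u00e9 trivia", "answer": "Paris"}'

data = json.loads(raw)

# json returns text strings: `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")
question = data["question"]
print(isinstance(question, text_type))  # True
print(question == u"Caf\u00e9 trivia")  # True
```

So the strings coming out of the JSON are already Unicode; the trouble starts later, when they are turned into bytes.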

Here is the segment where I open the JSON file:

import json

with open("JEOPARDY_QUESTIONS1.json") as json_file:
    json_data = json.load(json_file)
    for item in json_data:
        pass  # string replacements and writing happen here
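One way to make the decoding explicit when opening the file is `io.open` with an `encoding` argument, which exists on both Python 2.7 and 3. This sketch assumes the file is UTF-8 and writes a hypothetical one-record sample to a temporary directory in place of JEOPARDY_QUESTIONS1.json; the `question` key is an assumption about the record layout.

```python
import io
import json
import os
import tempfile

# Hypothetical stand-in for JEOPARDY_QUESTIONS1.json (a JSON list of objects).
# The "question"/"answer" keys are assumptions for this example only.
sample = u'[{"question": "<i>Caf\u00e9</i> au lait", "answer": "coffee"}]'

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample_questions.json")
with io.open(path, "w", encoding="utf-8") as out:
    out.write(sample)

# io.open with an explicit encoding decodes the bytes on the way in,
# so json.load receives text and every string it returns is Unicode.
with io.open(path, encoding="utf-8") as json_file:
    json_data = json.load(json_file)

questions = [item["question"] for item in json_data]

# Clean up the temporary sample file.
os.remove(path)
os.rmdir(tmp_dir)
```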

Edit: writing to the file:

result = result.encode('ascii', 'ignore')
# removing stuff from string
result += '\n'
if i < 40000:
    i += 1
    if i % 1000 == 0:
        print "adding question #:" + str(i)
    # this write throws the Unicode error when the encoding line is not there
    f1.write(result)  # this used to be print >> f1, result (and '\n' wasn't added)
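For the writing step, a sketch of the usual alternative to `encode('ascii', 'ignore')`: keep every string as Unicode and let a file opened with `io.open(..., encoding='utf-8')` do the encoding, so non-ASCII characters survive instead of being dropped. The `strip_html` helper and the sample `questions` list are hypothetical stand-ins for the replacements in the original loop.

```python
import io
import os
import re
import tempfile

# Hypothetical cleaned-up input; in the real script these would come
# from the JSON items.
questions = [u"<b>Caf\u00e9</b> au lait", u"Na\u00efve <i>art</i>"]


def strip_html(text):
    # Hypothetical helper: crude tag removal for illustration, not an HTML parser.
    return re.sub(r"<[^>]+>", u"", text)


MAX_QUESTIONS = 40000  # same cap as the original loop
out_path = os.path.join(tempfile.mkdtemp(), "questions.txt")

# Everything stays Unicode; the file object encodes to UTF-8 on write,
# so there is no manual .encode(...) call.
with io.open(out_path, "w", encoding="utf-8") as f1:
    for i, question in enumerate(questions, start=1):
        if i > MAX_QUESTIONS:
            break
        f1.write(strip_html(question) + u"\n")
        if i % 1000 == 0:
            print(u"adding question #: %d" % i)
```

Unlike the `'ignore'` error handler, this does not silently delete characters such as the accented letters above.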

The next time I work with the file, I get:

UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
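This warning is what Python 2 emits when a byte string (such as the output of `mystring.encode('utf-8')`) is compared with a Unicode string. A minimal sketch, assuming the file was written as UTF-8, of decoding before comparing and of reading the file back as text in the second script:

```python
import io
import os
import tempfile

# What mystring.encode('utf-8') produces: bytes, not text.
encoded = u"Caf\u00e9".encode("utf-8")

# Comparing bytes with text triggers the UnicodeWarning on Python 2
# (on Python 3 it is simply False). Decoding first makes the types match.
print(encoded == u"Caf\u00e9")                  # False
print(encoded.decode("utf-8") == u"Caf\u00e9")  # True

# Write a UTF-8 file, as the first script would.
path = os.path.join(tempfile.mkdtemp(), "questions.txt")
with io.open(path, "w", encoding="utf-8") as out:
    out.write(u"Caf\u00e9 au lait\n")

# In the second script, open it with the same encoding so each line
# comes back as Unicode text.
with io.open(path, encoding="utf-8") as f:
    lines = [line.rstrip(u"\n") for line in f]

print(lines[0] == u"Caf\u00e9 au lait")  # True
```

Reading with the same encoding that was used for writing keeps both sides of every comparison as Unicode.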
KameeCoding
