I am trying to convert a CSV file to a JSON file. Partway through writing the JSON file, I get a Unicode error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u06ec' in position 933: ordinal not in range(128)

My code:

import csv
import json
import codecs


csvfile = codecs.open('my.csv', 'r', encoding='utf-8', errors='ignore')
jsonfile = codecs.open('my.json',"w", encoding='utf-8',errors='ignore')

fieldnames = ("Title","Date","Text","Country","Page","Week")
reader = csv.DictReader(csvfile, fieldnames)
for row in reader:
    row['Text'] = row['Text'].encode('ascii',errors='ignore') # error occurs on this line

    json.dump(row, jsonfile)
    jsonfile.write('\n')

Example of a row:

{'Country': 'UK', 'Title': '12345', 'Text': "  hi there  hi john i currently ", 'Week': 'week2', 'Page': 'homepage', 'Date': '1/3/16'}
jxn
  • Are you using Python 2 or Python 3? And what line is raising that error? – roeland Jan 26 '16 at 20:49
  • I'm using Python 2.7, and the error occurs on this line: `row['Text'] = row['Text'].encode('ascii',errors='ignore')`. Also, I'm using codecs, not CSV. – jxn Jan 26 '16 at 21:44
  • CSV is a *binary* format, and in Python 2 you have to open the file as binary (see [example](https://docs.python.org/2/library/csv.html#csv.reader)). If I run your example it crashes because of this, one line above your comment (while calling `next` during iteration). – roeland Jan 26 '16 at 21:55
  • JSON is also stored as a binary format. Neither file should be opened with an `encoding`. – bobince Jan 26 '16 at 22:20

2 Answers

Don't convert to ASCII.

JSON handles unicode natively. Simply remove the `.encode("ascii", ...)` part.
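
To see why converting to ASCII is the wrong direction, here is a minimal sketch (it should run on both Python 2.7 and Python 3.3+; the sample string is made up for illustration). Strict ASCII encoding raises the same kind of "ordinal not in range(128)" error as in the question, and `errors='ignore'` only avoids the error by silently deleting the character:

```python
text = u'hi john \u06ec'  # hypothetical sample containing U+06EC

# Strict ASCII encoding fails on any code point >= 128.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print(exc)  # "'ascii' codec can't encode character ... ordinal not in range(128)"

# errors='ignore' "succeeds" only by dropping the non-ASCII character.
stripped = text.encode('ascii', 'ignore')
print(repr(stripped))  # b'hi john ' on Python 3
```

The Arabic character is simply gone from `stripped`, which is usually not what you want in the output file.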

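Also, you don't need to set an encoding on the file object you use for JSON, because JSON already serialises unicode correctly: by default (`ensure_ascii=True`), `json.dump` escapes every non-ASCII character as a `\uXXXX` sequence, so the output is plain ASCII.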
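
A small illustration of that point (should run on Python 2.7 and 3; the row is a made-up example): the serialised text contains only ASCII characters, and `json.loads` gives back the original character.

```python
import json

row = {u'Country': u'UK', u'Text': u'hi john \u06ec'}  # hypothetical row

# With the default ensure_ascii=True, non-ASCII characters become \uXXXX escapes.
encoded = json.dumps(row, sort_keys=True)
print(encoded)  # {"Country": "UK", "Text": "hi john \u06ec"}

# Every character of the output is ASCII, so the file needs no special encoding.
assert all(ord(ch) < 128 for ch in encoded)

# Decoding restores the original Unicode text.
assert json.loads(encoded)[u'Text'] == u'hi john \u06ec'
```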

Dima Tisnek

I edited my code to read the CSV file in binary mode. That caused another problem, an invalid-byte error, which I solved by converting the text string to unicode:

This is the working code:

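import codecs
import csv
import json

# Python 2: the csv module needs the file opened in binary mode.
csvfile = open('my.csv', 'rb')
jsonfile = codecs.open('my.json',"w")

fieldnames = ("Title","Date","Text","Country","Page","Week")
reader = csv.DictReader(csvfile, fieldnames)
for row in reader:
    print row
    # Decode the raw bytes as UTF-8; undecodable bytes become U+FFFD.
    row['Text'] = unicode(row['Text'], 'utf-8', errors='replace')
    json.dump(row, jsonfile)
    jsonfile.write('\n')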
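
For readers on Python 3, where the `csv` module reads text rather than bytes, the same conversion could be sketched roughly as follows. This is a sketch under stated assumptions, not the poster's method: the function name `csv_to_json_lines` is hypothetical, and the input file is assumed to be UTF-8.

```python
import csv
import json


def csv_to_json_lines(csv_path, json_path, fieldnames):
    """Hypothetical helper: write one JSON object per CSV row (Python 3 only)."""
    # Python 3's csv module expects a text-mode file opened with newline=''.
    # Assumes UTF-8 input; errors='replace' turns undecodable bytes into U+FFFD.
    with open(csv_path, 'r', encoding='utf-8', errors='replace', newline='') as src, \
            open(json_path, 'w', encoding='utf-8') as dst:
        reader = csv.DictReader(src, fieldnames)
        for row in reader:
            # json.dump escapes non-ASCII by default, so no text is stripped.
            json.dump(row, dst)
            dst.write('\n')
```

Usage, with the field names from the question: `csv_to_json_lines('my.csv', 'my.json', ("Title","Date","Text","Country","Page","Week"))`.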
jxn