An earlier question, "How can I convert JSON to CSV?", has lots of answers, but none of them explains how to convert non-Latin-1 data.

Let's say I have a JSON file like the following:

[
    {"id":123,"FullName":"Иванов Иван Иванович"},
    {"id":124,"FullName":"Петров Петр Петрович"}
]

Then I try to convert it with a script like this:

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-

import sys
import codecs
import json
import unicodecsv as csv

if __name__ == '__main__':
    fin = codecs.open(sys.argv[1], encoding='utf-8')
    data = json.load(fin)
    fin.close()

    with codecs.open('test.csv', encoding='utf-8', mode='wb') as csv_file:
        w = csv.writer(csv_file, encoding='utf-8')
        w.writerow(data[0].keys())  # header row
    
        for row in data:
            w.writerow(row.values())

This gives me the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 32: ordinal not in range(128)

First of all, it is not clear what is at position 32, but the more interesting question is whether there is a way to save UTF-8-encoded strings to a CSV file.
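The message itself is what Python reports when UTF-8 bytes are decoded with the ASCII codec. 0xd0 is the lead byte of the UTF-8 encoding of many Cyrillic letters, including И (U+0418, encoded as D0 98), and the position is a byte offset into whichever string was being decoded at that moment, not necessarily an offset into the file. In the script above, one plausible cause (an assumption, not verified here) is that unicodecsv already hands the file UTF-8 bytes, and the codecs.open() wrapper then tries to encode them again, which in Python 2 implicitly decodes them as ASCII first. A Python 3 sketch that only reproduces the same kind of error:

```python
# Python 3 sketch: reproduce "'ascii' codec can't decode byte 0xd0".
name = "Иванов"
utf8_bytes = name.encode("utf-8")

# 'И' is U+0418; its UTF-8 encoding is the two bytes 0xD0 0x98.
print(utf8_bytes[:2])

error_message = ""
try:
    # ASCII only covers 0x00-0x7F, so the first byte >= 0x80 fails.
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```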

Asked by Anthony

1 Answer

Given this test.json (with quotes and a comma added to the first name, as raised in a comment):

[
    {"id":123,"FullName":"Иванов, \"Иван\" Иванович"},
    {"id":124,"FullName":"Петров Петр Петрович"}
]

This works:

#!/usr/bin/env python2.7

import json
import unicodecsv as csv

# json.load decodes the UTF-8 input and returns unicode strings
with open('test.json','rb') as fin:
    data = json.load(fin)

# unicodecsv needs a binary-mode file; it encodes each field itself
with open('test.csv','wb') as csv_file:
    w = csv.writer(csv_file, encoding='utf-8-sig')
    w.writerow(data[0].keys())  # header row
    for row in data:
        w.writerow(row.values())
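On Python 3, unicodecsv is not needed: the built-in csv module works with str directly, and the encoding is chosen when the output file is opened. The sketch below is a Python 3 translation under that assumption; json_to_csv and the temporary-directory setup are illustrative names, not part of the original answer.

```python
import csv
import json
import os
import tempfile


def json_to_csv(json_path, csv_path, encoding="utf-8-sig"):
    """Hypothetical helper: convert a JSON list of flat objects to CSV."""
    # Decode the file as UTF-8 text; json.load then returns str values.
    with open(json_path, encoding="utf-8") as fin:
        data = json.load(fin)

    # Text mode with an explicit encoding; newline="" as the csv docs advise.
    with open(csv_path, "w", encoding=encoding, newline="") as csv_file:
        w = csv.writer(csv_file)
        w.writerow(data[0].keys())  # header row
        for row in data:
            w.writerow(row.values())


# Usage: write the sample data to a temporary directory and convert it.
tmp_dir = tempfile.mkdtemp()
json_path = os.path.join(tmp_dir, "test.json")
csv_path = os.path.join(tmp_dir, "test.csv")

sample = [
    {"id": 123, "FullName": 'Иванов, "Иван" Иванович'},
    {"id": 124, "FullName": "Петров Петр Петрович"},
]
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False)

json_to_csv(json_path, csv_path)

# Read it back; utf-8-sig strips the BOM written at the start of the file.
with open(csv_path, encoding="utf-8-sig", newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Here the column order follows the JSON key order (Python 3.7+ dicts preserve insertion order), so the header comes out as id,FullName; csv.DictWriter with an explicit fieldnames list is an option if a fixed order matters.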

In Python 2, json.load assumes UTF-8 input by default and returns unicode strings, so the JSON file can simply be opened in binary mode.
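Python 3's json module is similarly UTF-8 friendly: since Python 3.6, json.loads also accepts bytes and detects UTF-8 (or UTF-16/UTF-32) on its own, and \u escapes in the JSON text decode to the same characters as raw UTF-8. A small Python 3 sketch using the data from the question:

```python
import json

# Raw UTF-8 bytes, as read from a file opened in binary mode.
raw = '[{"id":123,"FullName":"Иванов Иван Иванович"}]'.encode("utf-8")

# json.loads (Python 3.6+) accepts bytes and detects that they are UTF-8.
data = json.loads(raw)
full_name = data[0]["FullName"]

# The same surname written with \u escapes decodes to an identical str.
escaped = json.loads('{"FullName": "\\u0418\\u0432\\u0430\\u043d\\u043e\\u0432"}')
surname = escaped["FullName"]

print(type(full_name).__name__, full_name, surname)
```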

unicodecsv takes a file opened in binary mode and encodes each field using the encoding specified when the writer is instantiated.
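Python 3's standard library expresses the same "binary file plus encoding" idea with io.TextIOWrapper: the csv writer writes str, and the wrapper encodes it before it reaches the underlying binary stream. A sketch, with an in-memory io.BytesIO standing in (as an assumption) for a file opened with 'wb':

```python
import csv
import io

binary_file = io.BytesIO()  # stand-in for open('test.csv', 'wb')

# Text layer over the binary stream; it does the UTF-8 encoding.
text_file = io.TextIOWrapper(binary_file, encoding="utf-8", newline="")
w = csv.writer(text_file)
w.writerow(["Петров Петр Петрович", 124])
text_file.flush()  # push buffered text down into the BytesIO

encoded = binary_file.getvalue()
print(encoded)
```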

Use utf-8-sig if the .csv file will be opened in Excel, which relies on the UTF-8 byte order mark (BOM) that this codec writes to detect the encoding; plain utf-8 works otherwise.
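The only difference between the two codecs is the three-byte BOM (EF BB BF) at the start of the output. A Python 3 sketch of what each codec produces; note that str.encode adds the mark on every call, whereas a file opened with encoding='utf-8-sig' writes it only once, at the start of the stream:

```python
import codecs

text = "FullName,id\r\n"

# Encoding the same text with and without the signature.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")

# utf-8-sig prepends the three-byte BOM; the rest is identical to utf-8.
print(with_bom == codecs.BOM_UTF8 + without_bom)  # True

# Decoding: utf-8-sig strips the BOM, plain utf-8 keeps it as U+FEFF.
stripped = with_bom.decode("utf-8-sig")
kept = with_bom.decode("utf-8")
print(repr(kept[0]))
```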

Output:

FullName,id
"Иванов, ""Иван"" Иванович",123
Петров Петр Петрович,124
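The quoting in the first data row comes from the csv module's default excel dialect: a field containing the delimiter, the quote character, or a line break is wrapped in double quotes (QUOTE_MINIMAL), and embedded quotes are doubled (doublequote=True). A Python 3 sketch that reproduces the three lines above in memory, with the column order fixed explicitly to match:

```python
import csv
import io

rows = [
    ["FullName", "id"],
    ['Иванов, "Иван" Иванович', 123],
    ["Петров Петр Петрович", 124],
]

buf = io.StringIO()
w = csv.writer(buf)  # default dialect: excel, QUOTE_MINIMAL, doublequote=True
w.writerows(rows)

output = buf.getvalue()
print(output)
```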

In Excel:

[Screenshot: test.csv as displayed in Excel]

Answered by Mark Tolonen