I'm trying to convert a CSV file to JSON, but I cannot figure out how to get the special characters of the German alphabet (such as ß) to come out right.

Here is the result:

[
    {
        "\ufeffMeter-id": "W000001",
        "Address": "Groninger Stra\u00dfe 22 , 13347 Berlin",
        "January": "",
        "February": "",
        "March": "",
        "April": "",
        "May": "",
        "June": "",
        "July": "",
        "August": "",
        "September": "",
        "October": "",
        "November": "",
        "December": ""
    },
    {
        "\ufeffMeter-id": "G000002",
        "Address": "Oraniendamm 10-6 , 13469 Berlin",
        "January": "767,410.80",
        "February": "784,932.700",
        "March": "797,636.90",
        "April": "812,111.000",
        "May": "819,512.30",
        "June": "820,482.200",
        "July": "820,482.20",
        "August": "820,482.200",
        "September": "820,869.80",
        "October": "826,243.900",
        "November": "834,028.20",
        "December": ""
    },...

Based on this csv:

Meter-id,Address,January,February,March,April,May,June,July,August,September,October,November,December
W000001,"Groninger Straße 22 , 13347 Berlin",,,,,,,,,,,,
G000002,"Oraniendamm 10-6 , 13469 Berlin","767,410.80","784,932.700","797,636.90","812,111.000","819,512.30","820,482.200","820,482.20","820,482.200","820,869.80","826,243.900","834,028.20",

My parsing code looks like this:

import csv
import json

csvfile = '../csv_files/metering-data.csv'
jsonfile = '../json_files/metering-data.json'

jsonArray = []

# convert csv to dict
with open(csvfile, encoding='utf-8') as csvf:
  csvReader = csv.DictReader(csvf)

  # the loop must run while the file is still open
  for row in csvReader:
    jsonArray.append(row)

# convert dict to json file
with open(jsonfile, 'w', encoding='utf-8') as jsonf:
  jsonString = json.dumps(jsonArray, indent=4)
  jsonf.write(jsonString)

What am I doing wrong here?

mkrieger1
aerioeus
  • I don't think you are doing anything wrong. – mkrieger1 Jan 04 '22 at 16:12
  • Does this answer your question? [Saving utf-8 texts with json.dumps as UTF8, not as \u escape sequence](https://stackoverflow.com/questions/18337407/saving-utf-8-texts-with-json-dumps-as-utf8-not-as-u-escape-sequence) – mkrieger1 Jan 04 '22 at 16:15

1 Answer

The Unicode character U+00DF is LATIN SMALL LETTER SHARP S: ß. It is correctly represented in your JSON file as \u00df, which is a valid JSON escape that any JSON parser decodes back to ß. Your only problem is that the CSV file starts with a UTF-8 Byte Order Mark (BOM), which is why the first field name starts with \ufeff. You should use the special utf_8_sig encoding to remove it automatically:

...
with open(csvfile, encoding='utf_8_sig') as csvf:
  csvReader = csv.DictReader(csvf)
...
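
As a minimal, self-contained sketch of both points (not the asker's actual files): the sample bytes, the helper name read_rows, and the variable names below are assumptions made for illustration. It compares reading BOM-prefixed CSV data with utf-8 and with utf-8-sig, and also shows the ensure_ascii=False option of json.dumps discussed in the question linked in the comments, which writes ß literally instead of as \u00df.

```python
import csv
import io
import json

# Hypothetical sample data: two CSV lines preceded by a UTF-8 BOM,
# similar in shape to the file in the question.
raw = (
    "\ufeffMeter-id,Address\n"
    'W000001,"Groninger Straße 22 , 13347 Berlin"\n'
).encode("utf-8")


def read_rows(data, encoding):
    """Decode CSV bytes with the given encoding and return the rows as dicts."""
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="") as f:
        return list(csv.DictReader(f))


rows_plain = read_rows(raw, "utf-8")    # BOM survives in the first field name
rows_sig = read_rows(raw, "utf-8-sig")  # BOM is stripped while decoding

# Default: non-ASCII characters are escaped (ß becomes \u00df).
escaped = json.dumps(rows_sig, indent=4)
# ensure_ascii=False keeps ß as a literal character in the output string.
readable = json.dumps(rows_sig, indent=4, ensure_ascii=False)

print(list(rows_plain[0]))  # first key still carries the BOM
print(list(rows_sig[0]))    # first key is clean
print(readable)
```

Both JSON strings decode to the same data, so the choice is cosmetic. If the literal form is written to disk, the output file should be opened with encoding='utf-8', as the code in the question already does.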
Serge Ballesta