0

I'm converting a csv file with multiple records to a file with a json array. Solution is How to find last line in csv file when using DictReader. This works fine, except that resulting json in file has backslash (\) and double-quotes; i.e. it's escaping the strings. For example:

{"Test Name": "Basic Tcp IPerf Test", " \"Status\"": "PASS", " \"Test Start Time\"": "20190103 08:07:41.662", " \"Test End Time\"": "20190103 08:08:44.051", " \"Throughput\"": "2095.2746", " \"Jitter\"": "", " \"Packet Loss %\"": "", " \"Compute/VM\"": "SendVm (x.x.x.x)", " \"Statistics Sampling Start Time\"": "03 Jan 2019 08:07:44", " \"Statistics Sampling End Time\"": "03 Jan 2019 08:08:42", " \"Min CPU\"": "0", " \"Max CPU\"": "2", " \"Avg CPU\"": "1", " \"Avg Memory (MB)\" ": "7758"}

Is there a way to write out that json without that escaping? Code:

csvfile = open('fiile.csv','r')
jsonfile = open('file.json','w')

reader = csv.DictReader(csvfile)

jsonfile.write(json.dumps(list(reader)))
timsterc
  • 963
  • 3
  • 10
  • 18
  • If you just do `print(list(reader))`, does that _also_ have double quotes? Or possibly two layers of alternating quotes, ex. `'"status"'`? I suspect the problem is with the csv file, not the json. – Kevin Jan 31 '19 at 19:24
  • @Kevin Could be right about .csv file. Here's the header and the first record: "Test Name", "Status", "Test Start Time", "Test End Time", "Throughput", "Jitter", "Packet Loss %", "Compute/VM", "Statistics Sampling Start Time", "Statistics Sampling End Time", "Min CPU", "Max CPU", "Avg CPU", "Avg Memory (MB)" "Basic Tcp IPerf Test",PASS,20190103 08:07:41.662,20190103 08:08:44.051,2095.2746,,,"SendVm (x.x.x.x)",03 Jan 2019 08:07:44,03 Jan 2019 08:08:42,0,2,1,7758 – timsterc Jan 31 '19 at 19:34
  • @Kevin Also, print(list(reader)) also has double quotes though no escape backslashes: (' "Status"', 'PASS'), (' "Test Start Time"', '20190103 08:07:41.662') – timsterc Jan 31 '19 at 19:38
  • If you put `print(list(reader))` on the line after `jsonfile.write(json.dumps(list(reader)))`, I would expect the output to be blank. Readers can only iterate over the file once. Calling list on it a second time will give only an empty list, since the pointer is at the end of the file. I meant that you should replace `jsonfile.write(json.dumps(list(reader)))` entirely with `print(list(reader))`. – Kevin Jan 31 '19 at 19:40
  • @Kevin yep, caught that, see my edit – timsterc Jan 31 '19 at 19:41

1 Answers1

2

It looks like your csv file is formatted in a way that confuses the default csv parser. In particular, your columns are separated by both a comma and a space:

           comma
           v
"Test Name", "Status"
            ^
            space

This makes the parser assume that all the text following the comma is part of the value, including the space and the quote mark.

One possible solution is to specify a value for the skipinitialspace argument when opening the DictReader.

import csv
import json

csvfile = open('fiile.csv','r')
jsonfile = open('file.json','w')

reader = csv.DictReader(csvfile, skipinitialspace=True)
jsonfile.write(json.dumps(list(reader)))

Then the parser should understand that the space and quote aren't part of the value. Result:

[{"Test Name": "Basic Tcp IPerf Test", "Status": "PASS", "Test Start Time": "20190103 08:07:41.662", "Test End Time": "20190103 08:08:44.051", "Throughput": "2095.2746", "Jitter": "", "Packet Loss %": "", "Compute/VM": "SendVm (x.x.x.x)", "Statistics Sampling Start Time": "03 Jan 2019 08:07:44", "Statistics Sampling End Time": "03 Jan 2019 08:08:42", "Min CPU": "0", "Max CPU": "2", "Avg CPU": "1", "Avg Memory (MB)": "7758"}]
Kevin
  • 74,910
  • 12
  • 133
  • 166