
I have a JSON file containing a list like the one below:

[
{"A":{"value":1}, "B":{"value":2}},
{"A":{"value":9}, "B":{"value":3}}
]

Which I want to turn into a CSV like so:

A.value,B.value
1,2
9,3

The issue is that the nested keys share the same name, `value`, but should end up in separate columns. I could not find an elegant solution to this anywhere yet. I would like to be able to do something like:

import csv
import json

data = json.load(open(file, 'r'))

with open("output.csv", "w") as f:
    columns = ["A.value", "B.value"]
    cw = csv.DictWriter(f, columns)
    cw.writeheader()
    cw.writerows(data)

I know this would work if I did not have any nested keys. I found other similar questions, but I don't think they apply to my situation.

As an extra challenge:

I'd rather keep the approach generic. Later I might have a list of JSON objects like:

[
{"A":{"value":1}, "B":{"value":2}, "key":90},
{"A":{"value":9}, "B":{"value":3}, "key":91}
]

Meaning not all keys I want to add to the CSV will have a nested `value` key!

Desired output:

A.value,B.value,key
1,2,90
9,3,91
    I'd suggest starting by flattening the dictionaries; that's a pretty straightforward problem that has nothing to do with the CSV aspect, and once you've done that you can just use `csv.DictWriter` in the normal way. – Samwise Feb 13 '22 at 17:12

3 Answers


This should do the job:

cw.writerows([{f'{key}.value':val['value'] for key, val in row.items()} for row in data])

or as a regular loop:

for row in data:
    cw.writerow({f'{key}.value':val['value'] for key, val in row.items()})

EDIT

import csv

data = [
    {"A":{"value":1}, "B":{"value":2}, "key":90},
    {"A":{"value":9}, "B":{"value":3}, "key":91}
    ]


def parse(row):
    for key, value in row.items():
        try:
            # Nested dicts like {"value": 1} become e.g. 'A.value': 1
            yield f'{key}.value', value['value']
        except TypeError:
            # Plain values (e.g. "key": 90) are not subscriptable, so keep them as-is
            yield key, value


with open("output.csv", "w") as f:
    columns = ['A.value', 'B.value', 'key']
    cw = csv.DictWriter(f, columns)
    cw.writeheader()
    cw.writerows(dict(parse(row)) for row in data)

The only thing I don't like is the hardcoded headers.
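
If hardcoding them is a concern, one way around it is to build the rows first and derive the header from them. A sketch reusing the `parse` helper above (column order is simply first-seen order across the rows):

import csv

parsed = [dict(parse(row)) for row in data]

# Union of keys across all rows, preserving first-seen order,
# so the header no longer has to be hardcoded
columns = list(dict.fromkeys(key for row in parsed for key in row))

with open("output.csv", "w", newline="") as f:
    cw = csv.DictWriter(f, columns)
    cw.writeheader()
    cw.writerows(parsed)

Deriving the header this way also means `DictWriter` will not raise if a later row introduces a key that the first row lacks.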

– buran

Flattening the dicts worked. Since the data is a list of dicts, the flattening has to be done for each dict:

import collections

# Recursively flatten nested dicts into dotted keys,
# e.g. {"A": {"value": 1}} -> {"A.value": 1}
def flatten(d, parent_key='', sep='.'):
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, collections.MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)


def flatten_json_list(json_list):
    flattened_list = []
    for d in json_list:
        flattened_list.append(flatten(d))

    return flattened_list

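For example, applied to one element of the example data, `flatten` already produces the dotted keys (a quick check with the function above):

print(flatten({"A": {"value": 1}, "B": {"value": 2}, "key": 90}))
# {'A.value': 1, 'B.value': 2, 'key': 90}
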
Then the code from the question works, provided the flattened list is what gets written:

with open("out.csv", "w") as f:
    columns = ["A.value", "B.value", "key"]
    cw = csv.DictWriter(f, columns)
    cw.writeheader()
    cw.writerows(flatten_json_list(data))
– bcsta
    Please note, you should be importing/using `collections.abc`, not `collections`. There is a DeprecationWarning: _DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working_ – buran Feb 13 '22 at 17:54
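
For reference, since `collections.MutableMapping` was removed entirely in Python 3.10, here is the same `flatten` with only the import changed, as the comment above suggests:

from collections.abc import MutableMapping

def flatten(d, parent_key='', sep='.'):
    # Identical to flatten() above, except MutableMapping now comes from collections.abc
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)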

jsonpath-ng can parse even deeply nested JSON objects like this very easily. It can be installed with the following command:

pip install --upgrade jsonpath-ng

Code:

from collections import defaultdict
import jsonpath_ng as jp
import pandas as pd
import re

# Make every datum expose its own JSON path under the field name 'json_path'
jp.jsonpath.auto_id_field = 'json_path'

def json_to_df(json):
    # Match the auto path field of every element, at any depth
    expr = jp.parse(f'$..*.{jp.jsonpath.auto_id_field}')
    d = defaultdict(list)
    for m in expr.find(json):
        # Keep only scalar leaves; m.value is the path, m.datum.value the value
        if not isinstance(m.datum.value, (dict, list)):
            # Strip list indices such as '[0].' so the path becomes the column name
            d[re.sub(r'\[\d+]\.', '', m.value)].append(m.datum.value)
    return pd.DataFrame(d)

data = [{"A":{"value":1}, "B":{"value":2}, "key":90},
        {"A":{"value":9}, "B":{"value":3}, "key":91}]

df = json_to_df(data)
df.to_csv('output.csv', index=False)

Output:

key  A.value  B.value
90   1        2
91   9        3

Another, more complicated example:

data = [{"A":{"value":1}, "B":{"value":2, "C":{"value":6}}, "key":90},
        {"A":{"value":9}, "B":{"value":3, "C":{"value":8}}, "key":91}]
json_to_df(data).to_csv('output.csv', index=False)

Output:

key  A.value  B.value  B.C.value
90   1        2        6
91   9        3        8
– quasi-human