Remove duplicate dictionaries and order values corresponding to keys in a list

Question

This is a follow-up question to a previous issue I had: Recursively accessing paths and values of a nested dictionary

Consider this nonsensical JSON file called sample_dict: https://jsoneditoronline.org/?id=da7a486dc2e24bf8b94add9f04c71b4d

Given the code here:

import json
import csv

json_sample = 'sample_dict.json'
json_file = open(json_sample, 'r')
json_data = json.load(json_file)

csv_file = open('sample_dict.csv', 'w')

items = json_data['sample_dict']

# Thanks @fferri!
def visit_dict(d, path=[]):
    for k, v in d.items():
        if not isinstance(v, dict):
            yield path + [k], v
        else:
            for visits in visit_dict(v, path + [k]):
                yield visits

for key in items:
    csv_file.write(','.join('/'.join(k) for k, v in visit_dict(key)))

csv_file.write('\n')

for value in items:
    csv_file.write(','.join(str(v) for k, v in visit_dict(value)))

This writes out both dictionaries from the list, duplicate keys included. The issues in question are:

We don't want duplicate keys, but we do want every key and value from the parent dictionaries, since some of them may not exist in the other dictionaries
Values are written out of order and not in rows, so they don't line up with the keys in the column headers

The ideal output would be something like:

dict_id person  person/person_id    person/name person/age  family  family/person_id    family/members  family/members/father   family/members/mother   family/members/son  family/family_id    color   items_id    furniture   furniture/type  furniture/color furniture/size  furniture/purchases
5   None    15  Martin  18  None    20      Jose    Maddie  Jerry   2   Red None    None    Chair   Brown   Large   []
10  None    20  Zeeshan 25  None    None    None    None    None    None    None    None    None    Table   Blue    Blue    None    []

Excuse the bad formatting, but each value in each row should correspond to each column header.

figbeam · Answer 1 · 2018-04-30T00:54:05.797

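I haven't worked much with JSON or dicts, but as far as I know a dict is not guaranteed to keep any particular order (Python only guarantees insertion order from version 3.7 onward). If you want a specific order, you'll have to move the key/value pairs into something sortable and then sort them.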
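A minimal sketch of what I mean by moving the pairs into something sortable. The dictionary `record` is made up for illustration and is not taken from your file:

```python
# Hypothetical flat record; the keys are illustrative only.
record = {"person/name": "Martin", "dict_id": 5, "color": "Red"}

# items() yields (key, value) pairs; sorted() returns them as a list
# ordered by key (the keys are unique, so values are never compared).
sorted_pairs = sorted(record.items())

keys = [k for k, _ in sorted_pairs]
values = [v for _, v in sorted_pairs]
print(keys)
print(values)
```

Because keys and values are split from the same sorted list, each value stays at the same position as its key.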

I'm not really sure what you mean by "duplicates".

Since you are writing CSV, the output will be comma-separated. If you want each value printed directly under its column header, you will have to measure the length of each header and of every value in that column, set the cell width to the longest of them, and pad the shorter strings before writing to the file.
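Here is a rough sketch of that padding idea. The helper name `pad_columns` and the sample headers and rows are my own assumptions, not something from your code:

```python
def pad_columns(headers, rows):
    """Return text lines in which every cell is padded to its column width."""
    # Convert everything to strings so that len() and ljust() work on all cells.
    table = [[str(c) for c in headers]] + [[str(c) for c in row] for row in rows]
    # zip(*table) walks the table column by column.
    widths = [max(len(cell) for cell in column) for column in zip(*table)]
    return [
        "  ".join(cell.ljust(w) for cell, w in zip(line, widths)).rstrip()
        for line in table
    ]

# Made-up sample data, only to show the alignment.
headers = ["dict_id", "person/name", "color"]
rows = [[5, "Martin", "Red"], [10, "Zeeshan", None]]
for line in pad_columns(headers, rows):
    print(line)
```

Note that the result is a fixed-width text table rather than real CSV, so it is mainly useful for reading the output by eye.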

Duplicates

OK, I get it. I haven't tried to do this with a list comprehension, but it's easy with loops:

# Collect every key path once, in first-seen order
key_list = []
for key in items:
    for k, v in visit_dict(key):
        if k not in key_list:
            key_list.append(k)

Then you can loop over the key list to get values from each key in items.
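Put together, it could look something like the sketch below. This is untested against your real file: the `items` list is a made-up stand-in for `json_data['sample_dict']`, `flat_rows` is a name I introduced, `io.StringIO` stands in for the opened CSV file, and I assume all keys are strings. `visit_dict` is a variant of the function from the question that builds paths as tuples.

```python
import csv
import io

def visit_dict(d, path=()):
    """Yield (path, value) for every non-dict value; path is a tuple of keys."""
    for k, v in d.items():
        if isinstance(v, dict):
            yield from visit_dict(v, path + (k,))
        else:
            yield path + (k,), v

# Made-up stand-in for json_data['sample_dict'].
items = [
    {"dict_id": 5, "person": {"name": "Martin", "age": 18}, "color": "Red"},
    {"dict_id": 10, "person": {"name": "Zeeshan"}, "furniture": {"type": "Table"}},
]

# Flatten each dictionary once: "a/b" style header -> value.
flat_rows = [{"/".join(p): v for p, v in visit_dict(item)} for item in items]

# Collect each header only once, in first-seen order.
key_list = []
for row in flat_rows:
    for key in row:
        if key not in key_list:
            key_list.append(key)

out = io.StringIO()  # assumption: replace with your opened csv file
writer = csv.writer(out, lineterminator="\n")
writer.writerow(key_list)
for row in flat_rows:
    # Look up every header in every row; "None" marks a missing key.
    writer.writerow([row.get(key, "None") for key in key_list])
print(out.getvalue())
```

Since every row is built by looking up the same `key_list`, the values always line up with the headers, and a key that exists in only one dictionary still gets its own column. Only leaf paths become headers here, so a parent key such as `person` does not get a column of its own.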

edited Apr 30 '18 at 00:54

answered Apr 29 '18 at 23:42

figbeam


The keys `dict_id`, `person`, `person/person_id`, and so forth are duplicated because they're written once for every dictionary in the `json_data['sample_dict']` list, and there are 2 dictionaries in the list. We only want one of the dictionaries to serve as universal column headers. – cheebdragonite Apr 30 '18 at 00:09
Some more about duplicates. – figbeam Apr 30 '18 at 00:54
