I'm struggling with importing a CSV file into a nested dictionary. I found an example that's almost perfect for me:

UID,BID,R
U1,B1,4
U1,B2,3
U2,B1,2

import csv

new_data_dict = {}
with open("data.csv", 'r') as data_file:
    data = csv.DictReader(data_file, delimiter=",")
    for row in data:
        item = new_data_dict.get(row["UID"], dict())
        item[row["BID"]] = int(row["R"])

        new_data_dict[row["UID"]] = item

print(new_data_dict)

In my case I need one more level of nesting. My data looks like this:

FID,UID,BID,R
A1,U1,B1,4
A1,U1,B2,3
A1,U2,B1,2
A2,U1,B1,4
A2,U1,B2,3
A2,U2,B1,2

Result should be:

{"A1":{"U1":{"B1":4, "B2": 3}, "U2":{"B1":2}},
 "A2":{"U1":{"B1":4, "B2": 3}, "U2":{"B1":2}}}

How would I have to complete and correct the code posted above?

Thx, Toby

canedha

2 Answers

Using a collections.defaultdict whose default factory recursively creates another such defaultdict makes it very easy to nest the levels.

This self-contained example (which is not using a file but a list of lines) demonstrates it:

import collections
import csv,json

data_file="""FID,UID,BID,R
A1,U1,B1,4
A1,U1,B2,3
A1,U2,B1,2
A2,U1,B1,4
A2,U1,B2,3
A2,U2,B1,2
""".splitlines()

def nesteddict():
    return collections.defaultdict(nesteddict)

new_data_dict = nesteddict()

data = csv.DictReader(data_file, delimiter=",")
for row in data:
    new_data_dict[row["FID"]][row["UID"]][row["BID"]] = row["R"]

# dump as json to have a clean, indented representation
print(json.dumps(new_data_dict,indent=2))

Result:

{
  "A1": {
    "U1": {
      "B1": "4",
      "B2": "3"
    },
    "U2": {
      "B1": "2"
    }
  },
  "A2": {
    "U1": {
      "B1": "4",
      "B2": "3"
    },
    "U2": {
      "B1": "2"
    }
  }
}

The "magic" is this function:

def nesteddict():
    return collections.defaultdict(nesteddict)

Each time a missing key is looked up in the dictionary, nesteddict is called, which creates a new default dictionary with the same behavior (I saw this in an old StackOverflow answer: Nested defaultdict of defaultdict).
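
To make it visible where nesteddict gets called even though the loop never mentions it, here is a minimal sketch. A defaultdict invokes its default factory whenever an indexing lookup hits a missing key. The calls list is an illustrative addition that is not part of the code above; it only counts how often the factory runs.

```python
import collections

calls = []  # illustrative only: records every run of the factory

def nesteddict():
    calls.append("called")
    return collections.defaultdict(nesteddict)

d = nesteddict()            # first call: creates the top level
d["A1"]["U1"]["B1"] = "4"   # "A1" and "U1" are missing, so two more calls

print(len(calls))           # how many times the factory has run so far
print(isinstance(d["A1"]["U1"], collections.defaultdict))
```

The final assignment to "B1" stores a value directly, so it does not call the factory. Only the two lookups that had to create a level do.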

Creating the levels or updating them is then done with just:

new_data_dict[row["FID"]][row["UID"]][row["BID"]] = row["R"]
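
The question asks for integer values, while the output above contains strings because the csv module returns every field as text. Below is a minimal sketch of one way to get integers and plain dict objects, assuming every R field holds a valid integer. The helper to_plain_dict and the names lines, nested and result are hypothetical and introduced only for this illustration.

```python
import collections
import csv

lines = """FID,UID,BID,R
A1,U1,B1,4
A1,U1,B2,3
A1,U2,B1,2
""".splitlines()

def nesteddict():
    return collections.defaultdict(nesteddict)

def to_plain_dict(d):
    # hypothetical helper: recursively turn nested defaultdicts into plain dicts
    if isinstance(d, dict):
        return {key: to_plain_dict(value) for key, value in d.items()}
    return d

nested = nesteddict()
for row in csv.DictReader(lines):
    # int() assumes the R column always holds an integer
    nested[row["FID"]][row["UID"]][row["BID"]] = int(row["R"])

result = to_plain_dict(nested)
print(result)
```

After the conversion, looking up a missing key raises KeyError again instead of silently creating an empty level, which may or may not be what you want.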
Jean-François Fabre
    Thanks for your help! It works perfectly. :) What I don't understand is how nesteddict gets called when a key is missing, since it isn't in the loop. Or do you mean when I run the program with different input files? – canedha Nov 22 '20 at 11:19

If you want to keep it simple, you can try this:

import csv

new_data_dict = {}
with open("data.csv", "r") as data_file:
    # DictReader consumes the header row itself, so no header check is needed
    data = csv.DictReader(data_file, delimiter=",")
    for row in data:
        # innermost level for this row: {BID: R}
        item = {row["BID"]: int(row["R"])}

        # middle level: everything stored so far under this FID
        temp_dict = new_data_dict.get(row["FID"], dict())
        if row["UID"] in temp_dict:
            # UID already seen under this FID: merge the new BID in
            temp_dict[row["UID"]].update(item)
        else:
            temp_dict[row["UID"]] = item

        new_data_dict[row["FID"]] = temp_dict

print(new_data_dict)

I just added a new dictionary called temp_dict before the assignment to new_data_dict so that the values already stored under the same FID are kept.

Result:

{'A1': {'U1': {'B1': 4, 'B2': 3}, 'U2': {'B1': 2}}, 'A2': {'U1': {'B1': 4, 'B2': 3}, 'U2': {'B1': 2}}}
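
The temp_dict bookkeeping can also be expressed with dict.setdefault, which returns the existing value for a key or stores and returns the given default. This is only a sketch, assuming the data is held in memory as a list of lines (called lines here for illustration) rather than read from data.csv. The name by_uid is likewise hypothetical.

```python
import csv

lines = """FID,UID,BID,R
A1,U1,B1,4
A1,U1,B2,3
A1,U2,B1,2
A2,U1,B1,4
A2,U1,B2,3
A2,U2,B1,2
""".splitlines()

new_data_dict = {}
for row in csv.DictReader(lines):
    # setdefault creates the FID and UID levels only if they are missing
    by_uid = new_data_dict.setdefault(row["FID"], {})
    by_uid.setdefault(row["UID"], {})[row["BID"]] = int(row["R"])

print(new_data_dict)
```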
Camilo Martinez M.
    Thanks for your help! I would love to choose both as correct answers, as both work absolutely fine. I have chosen the other one as it is slightly more elegant. – canedha Nov 22 '20 at 11:18