Data extraction: Creating dictionary of dictionaries with lists in python

Question

I have data similar to the following in a file:

Name, Age, Sex, School, height, weight, id

Joe, 10, M, StThomas, 120, 20, 111

Jim, 9, M, StThomas, 126, 22, 123

Jack, 8, M, StFrancis, 110, 15, 145

Abel, 10, F, StFrancis, 128, 23, 166

The actual data might be 100 columns and a million rows.

What I am trying to do is create a dict in the following pattern:

school_data = {'StThomas': {'weight':[20,22], 'height': [120,126]},
               'StFrancis': {'weight':[15,23], 'height': [110,128]} }

Things I tried:

Trial 1: (very expensive in terms of computation)

school_names  = []
for lines in read_data[1:]:
    data = lines.split('\t')
    school_names.append(data[3])

school_names = set(school_names)

for lines in read_data[1:]:
    for school in schools:
        if school in lines:
            print lines

Trial 2:

for lines in read_data[1:]:
    data = lines.split('\t')
    school_name = data[3]
    height = data[4]
    weight = data[5]
    id = data [6]
    x[id] = {school_name: (weight, height)}

The above two are methods in which I tried to proceed but did not get closer to the solution.

What are the other columns? Are they relevant for the computation or are you looking to do with those extra columns the same thing that you did with weight/height (group values by School)? — Cadu, Oct 07 '16 at 14:04

score 1 · Answer 1 · edited May 23 '17 at 12:16

The easiest way to do this within the standard library is using existing tools, csv.DictReader and collections.defaultdict:

from collections import defaultdict
from csv import DictReader

data = defaultdict(lambda: defaultdict(list))  # *

with open(datafile) as file_:
    for row in DictReader(file_):
        data[row[' School'].strip()]['height'].append(int(row[' height']))
        data[row[' School'].strip()]['weight'].append(int(row[' weight']))

Note that the spaces in e.g. ' School' and the .strip() are necessary because of the spaces in the header row of your input file. Result:

>>> data
defaultdict(<function <lambda> at 0x10261c0c8>, {'StFrancis': defaultdict(<type 'list'>, {'weight': [15, 23], 'height': [110, 128]}), 'StThomas': defaultdict(<type 'list'>, {'weight': [20, 22], 'height': [120, 126]})})
>>> data['StThomas']['height']
[120, 126]

Alternatively, if you're planning to do further analysis, look into something like pandas and its DataFrame data structure.

* see Python defaultdict and lambda if this seems weird

Data extraction: Creating dictionary of dictionaries with lists in python

1 Answers1