Python list of tuples to dict

Question

I have a Python list:

[('schema1', 'table1', 'column_name1'), ('schema1', 'table1', 'column_name2'), ('schema1', 'table2', 'column_name3'), ('schema2', 'table3', 'column_name4')]

I need to convert it into a Python dict with the following structure:

schema1:
            table1:
                    (column_name1,
                    column_name2)
            table2:
                    (column_name3)
schema2:
            table3:
                    (column_name4)

Is there an efficient way to do this conversion?

Post some code showing what you have tried so far. – killian95 Oct 24 '18 at 17:08

score 6 · Answer 1 · answered Oct 24 '18 at 17:09

Sure. Use collections.defaultdict:

from collections import defaultdict

dd = defaultdict(lambda: defaultdict(list))

# L is the list of tuples from the question
for schema, table, colname in L:
    dd[schema][table].append(colname)

Result:

defaultdict(<function __main__.<lambda>>,
            {'schema1': defaultdict(list,
                         {'table1': ['column_name1', 'column_name2'],
                          'table2': ['column_name3']}),
             'schema2': defaultdict(list, {'table3': ['column_name4']})})

jpp

this is rather elegant, i have not used this particular pattern before – Chris_Rands Oct 24 '18 at 17:12

@Chris_Rands, I sense sarcasm :) – jpp Oct 24 '18 at 17:13

No I don't think @Chris_Rands knew this was a duplicate ;-) – cs95 Oct 24 '18 at 17:14

timgeb · Accepted Answer · 2018-10-24T17:16:58.130

I'd do this with a defaultdict that produces defaultdict(list) instances as default values.

Demo

>>> from collections import defaultdict
>>> 
>>> d = defaultdict(lambda: defaultdict(list))
>>> data = [('schema1', 'table1', 'column_name1'), ('schema1', 'table1', 'column_name2'), ('schema1', 'table2', 'column_name3'), ('schema2', 'table3', 'column_name4')]
>>> 
>>> for k1, k2, v in data:
...     d[k1][k2].append(v)
...
>>> d
defaultdict(<function __main__.<lambda>()>,
            {'schema1': defaultdict(list,
                         {'table1': ['column_name1', 'column_name2'],
                          'table2': ['column_name3']}),
             'schema2': defaultdict(list, {'table3': ['column_name4']})})

To match your desired output exactly (although I don't see much reason), build a regular dictionary from d with tuple values.

>>> d = {k1:{k2:tuple(v2) for k2, v2 in v1.items()} for k1, v1 in d.items()}
>>> d
{'schema1': {'table1': ('column_name1', 'column_name2'),
  'table2': ('column_name3',)},
 'schema2': {'table3': ('column_name4',)}}

Explanation

The defaultdict initializer accepts a callable (in this example an anonymous lambda function). Whenever a missing key is looked up, that callable is called without arguments; its return value is stored under that key and returned.

The line

d = defaultdict(lambda: defaultdict(list))

is creating a defaultdict which creates another defaultdict when a key is missing. The second defaultdict creates a list when a key is missing.

>>> d = defaultdict(lambda: defaultdict(list))
>>> d['bogus']
defaultdict(list, {})
>>> d['hokus']['pokus']
[]
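
One side effect worth knowing about, shown here as a small sketch. The key names 'typo' and 'another_typo' and the step that sets default_factory to None are my additions for illustration, not part of the answer above. Merely reading a missing key inserts it, so a misspelled lookup silently creates an entry. Setting default_factory to None once the data is loaded makes missing keys raise KeyError again.

```python
from collections import defaultdict

d = defaultdict(lambda: defaultdict(list))
d['schema1']['table1'].append('column_name1')

# Merely reading a missing key creates it as a side effect.
_ = d['typo']
print('typo' in d)  # True

# Remove the accidental entry, then switch off auto-creation on both levels.
del d['typo']
d.default_factory = None
for inner in d.values():
    inner.default_factory = None

try:
    d['another_typo']
except KeyError:
    print('KeyError raised')
```

Converting to a regular dict, as shown earlier in this answer, gives the same protection.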

Patrick Artner · Answer 3 · 2018-10-24T17:17:31.470

No special tools are needed; plain dictionary methods work:

d = [('schema1', 'table1', 'column_name1'), 
     ('schema1', 'table1', 'column_name2'), 
     ('schema1', 'table2', 'column_name3'), 
     ('schema2', 'table3', 'column_name4')]

k = {}

for schema,table,column in d:
    p =  k.setdefault(schema,{})
    p2 = p.setdefault(table,[])
    p2.append(column)

print(k)

Output:

{'schema1': {'table2': ['column_name3'], 
             'table1': ['column_name1', 'column_name2']}, 
 'schema2': {'table3': ['column_name4']}}

More experienced users advise against this approach because it is slower, so the defaultdict approach from the other answers is the better choice.
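
If you would rather measure than take that on faith, a rough timing sketch could look like this. The test data (100 schemas with 10 tables of 10 columns each) and the function names are made up for illustration, and the numbers will vary with the machine and Python version, so no timings are quoted here.

```python
from collections import defaultdict
from timeit import timeit

# Hypothetical test data: 100 schemas x 10 tables x 10 columns.
rows = [(f'schema{s}', f'table{t}', f'column{c}')
        for s in range(100) for t in range(10) for c in range(10)]

def with_setdefault(rows):
    result = {}
    for schema, table, column in rows:
        result.setdefault(schema, {}).setdefault(table, []).append(column)
    return result

def with_defaultdict(rows):
    result = defaultdict(lambda: defaultdict(list))
    for schema, table, column in rows:
        result[schema][table].append(column)
    return result

# Both approaches build the same nested mapping.
assert with_setdefault(rows) == with_defaultdict(rows)

for func in (with_setdefault, with_defaultdict):
    seconds = timeit(lambda: func(rows), number=20)
    print(f'{func.__name__}: {seconds:.4f} s for 20 runs')
```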

edited Oct 24 '18 at 17:17

answered Oct 24 '18 at 17:11

what's wrong with an `import` of `defaultdict`? it will likely be more performant too – Chris_Rands Oct 24 '18 at 17:12
@Chris_Rands nothing is wrong with defaultdicts, but two other answers already used them, and they are not needed. As for performance, I don't know which would be faster/better. One would have to test that. – Patrick Artner Oct 24 '18 at 17:14

Re:Performance, `setdefault` is slower because the default is _always_ created, even if the key exists (even though it isn't used in that situation). In the case of a defaultdict, it's better. – cs95 Oct 24 '18 at 17:15
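
The point in the last comment can be made visible with a small sketch. The counting helper make_list and the calls dictionary are hypothetical names introduced only for this demonstration. Python evaluates the argument before setdefault runs, so a fresh default is built on every iteration, while the defaultdict factory runs only when the key is missing.

```python
from collections import defaultdict

calls = {'setdefault': 0, 'defaultdict': 0}

def make_list(counter_name):
    # Hypothetical helper: counts how often a default list is built.
    calls[counter_name] += 1
    return []

plain = {}
auto = defaultdict(lambda: make_list('defaultdict'))

for column in ('column_name1', 'column_name2', 'column_name3'):
    # The argument is evaluated before setdefault runs, so a list is built every time.
    plain.setdefault('table1', make_list('setdefault')).append(column)
    # The factory runs only when 'table1' is missing.
    auto['table1'].append(column)

print(calls)  # {'setdefault': 3, 'defaultdict': 1}
```

Building an empty list or dict is cheap, so whether this matters in practice depends on the data size.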

score 1 · Answer 4 · answered Oct 24 '18 at 17:13

l = [('schema1', 'table1', 'column_name1'), ('schema1', 'table1', 'column_name2'), 
 ('schema1', 'table2', 'column_name3'), ('schema2', 'table3', 'column_name4')]
d = {}

for s, t, c in l:
    d[s] = d.get(s, {})
    d[s][t] = d[s].get(t, tuple()) + (c,)
print(d)

Output:

{'schema1': {'table1': ('column_name1', 'column_name2'), 
             'table2': ('column_name3',)}, 
 'schema2': {'table3': ('column_name4',)}}
