python list to dict conversion confusion

Question

In reference to my question here, I have managed to build a list of tuples in the format given below:

(hour, color, type, text)

[('1', '2', 'a', '564'),
('1', '3', 'b', '570'),
('1', '4', 'c', '570'),
('5', '6', 'a', '560'),
('5', '7', 'b', '570'),
('5', '8', 'c', '580'),
('9', '10', 'a', '560'),
('9', '11', 'b', '570'),
('9', '12', 'c', '580')]

I have already looked here, but I was not able to get rid of the repeated 1s, 5s and 9s.

Now, to compare both files as described in the link above, I want to build a dictionary structure like the one below and then compare the dictionaries' contents entry by entry.

{'1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570'],
 '5': ['6', '7', '8', 'a', 'b', 'c', '560', '570', '580'],
 '9': ['10', '11', '12', 'a', 'b', 'c', '560', '570', '580']}

Since both files contain a huge amount of data, I cannot simply compare them line by line in a loop. So I decided to build a dictionary keyed on the 'hour' attribute of each 'location' element, holding all of that element's 'feature' values. I have been thinking about this for a long time but cannot get started. Can you help?

To keep the question readable, I did not paste the original XML code from the link above.
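The question only describes the comparison step in words. Here is one minimal way it could look, assuming both files have already been converted into dictionaries of the shape shown above. The function name `compare_by_hour` and the sample dictionaries `old` and `new` are hypothetical, made up for illustration.

```python
def compare_by_hour(old, new):
    """Compare two {hour: [values...]} dicts key by key (hypothetical helper).

    Returns (only_in_old, only_in_new, changed), where `changed` maps each
    hour present in both dicts whose lists differ to (old_list, new_list).
    """
    # dict.keys() views support set operations in Python 3.
    only_in_old = sorted(old.keys() - new.keys())
    only_in_new = sorted(new.keys() - old.keys())
    changed = {
        hour: (old[hour], new[hour])
        for hour in sorted(old.keys() & new.keys())
        if old[hour] != new[hour]
    }
    return only_in_old, only_in_new, changed


# Hypothetical sample data in the format shown in the question.
old = {'1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570'],
       '5': ['6', '7', '8', 'a', 'b', 'c', '560', '570', '580']}
new = {'1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '571'],
       '9': ['10', '11', '12', 'a', 'b', 'c', '560', '570', '580']}

only_old, only_new, changed = compare_by_hour(old, new)
print(only_old, only_new, list(changed))
```

Each hour is looked up directly by key, so the comparison does not depend on the two files listing their hours in the same order.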

So, as suggested in the answer to your previous question, did you build the tree structure? I guess you are still using the same list approach. — rahul tyagi, Jul 01 '15 at 16:52
I thought about it, but I thought this approach would be a bit more efficient. @rahul — Radheya, Jul 01 '15 at 16:55
@SuperBiasedMan I want to derive the dictionary structure, in the format shown, from the list structure above — Radheya, Jul 01 '15 at 16:56
Have you looked into pandas? Specifically, convert your dictionary key into a dataframe index and the other data to columnar fields. Pandas is a data library that's been vectorized and implemented in C for significant efficiency gains. — AZhao, Jul 01 '15 at 16:59
Why is 9 in the output list for '9' but neither 1 nor 5 are in those lists? — DSM, Jul 01 '15 at 17:13
A "bit" of efficiency probably won't matter much, and I doubt a flat structure would be any more efficient than a tree anyway. Optimize for usability first. — TigerhawkT3, Jul 01 '15 at 17:24
@AZhao I had never heard of it, but I will definitely look into it. Thanks for the suggestion. — Radheya, Jul 01 '15 at 17:55
@DSM Thanks for pointing that out. I edited my output in question. — Radheya, Jul 01 '15 at 17:57

martineau · Accepted Answer · 2015-07-02T21:00:33.193

You could create the dictionary in two steps. The first step groups tuples together based on the first value in each, and then in the second step, the now grouped items are flattened into a single list. It's written to work with tuples containing two or more items, but the exact number doesn't matter.

from collections import defaultdict
from itertools import chain
from pprint import pprint

tuples = [('1', '2', 'a', '564'),
          ('1', '3', 'b', '570'),
          ('1', '4', 'c', '570'),
          ('5', '6', 'a', '560'),
          ('5', '7', 'b', '570'),
          ('5', '8', 'c', '580'),
          ('9', '10', 'a', '560'),
          ('9', '11', 'b', '570'),
          ('9', '12', 'c', '580')]

d = defaultdict(list)

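# Step 1: group the remaining items of each tuple under its first item.
for tup in tuples:
    d[tup[0]].append(tup[1:])

# Step 2: transpose each group column-wise and flatten it into one list.
for k, v in d.items():
    d[k] = list(chain.from_iterable(zip(*v)))

# Convert to a plain dict so pprint shows it without the defaultdict wrapper.
pprint(dict(d))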

Output:

{'1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570'],
 '5': ['6', '7', '8', 'a', 'b', 'c', '560', '570', '580'],
 '9': ['10', '11', '12', 'a', 'b', 'c', '560', '570', '580']}
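To see what the second step does, here is the transpose-and-flatten applied to a single group, using the rows for hour '1' from the question. `zip(*rows)` turns the rows into columns, and `chain.from_iterable` joins those columns end to end. One caveat: `zip` stops at the shortest tuple, so if the rows within a group had different lengths, the extra items would be silently dropped.

```python
from itertools import chain

# The grouped rows for hour '1' after the first step.
rows = [('2', 'a', '564'), ('3', 'b', '570'), ('4', 'c', '570')]

# zip(*rows) transposes rows into columns:
# ('2', '3', '4'), ('a', 'b', 'c'), ('564', '570', '570')
columns = list(zip(*rows))

# chain.from_iterable concatenates the columns into one flat list.
flat = list(chain.from_iterable(columns))
print(flat)  # ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570']
```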

score 1 · Answer 2 · answered Jul 01 '15 at 17:14

First, loop through the list of tuples to build a dictionary mapping tuple element 0 to the remaining elements. This produces a dictionary whose keys are element 0 of each tuple and whose values are lists of tuples, each tuple holding the rest of one row with that element 0. Then flatten each of these lists column-wise using itertools.chain and itertools.izip.

Python 2.7 solution:

#!/usr/bin/env python
from __future__ import print_function
from itertools import chain, izip

data = [
    ('1', '2', 'a', '564'),
    ('1', '3', 'b', '570'),
    ('1', '4', 'c', '570'),
    ('5', '6', 'a', '560'),
    ('5', '7', 'b', '570'),
    ('5', '8', 'c', '580'),
    ('9', '10', 'a', '560'),
    ('9', '11', 'b', '570'),
    ('9', '12', 'c', '580')
]

# First, sort the values in rows into lists by their first element.
step1 = {}
for row in data:
    step1.setdefault(row[0], []).append(row[1:])

print("Step 1:")
print(repr(step1))

# Now to flatten a sequence-of-sequences column-wise,
# use list(itertools.chain(*itertools.izip(*seq)))
step2 = dict((k, list(chain(*izip(*v))))
             for k, v in step1.iteritems())

print("Step 2:")
print(repr(step2))

Result:

Step 1:
{'1': [('2', 'a', '564'), ('3', 'b', '570'), ('4', 'c', '570')],
 '9': [('10', 'a', '560'), ('11', 'b', '570'), ('12', 'c', '580')],
 '5': [('6', 'a', '560'), ('7', 'b', '570'), ('8', 'c', '580')]}
Step 2:
{'1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570'],
 '9': ['10', '11', '12', 'a', 'b', 'c', '560', '570', '580'],
 '5': ['6', '7', '8', 'a', 'b', 'c', '560', '570', '580']}
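`itertools.izip` and `dict.iteritems` do not exist in Python 3. A sketch of the same two steps for Python 3: the built-in `zip` is already lazy and replaces `izip`, `items()` replaces `iteritems()`, and a dict comprehension replaces `dict(...)` over a generator. The logic is otherwise unchanged.

```python
from itertools import chain

data = [
    ('1', '2', 'a', '564'),
    ('1', '3', 'b', '570'),
    ('1', '4', 'c', '570'),
    ('5', '6', 'a', '560'),
    ('5', '7', 'b', '570'),
    ('5', '8', 'c', '580'),
    ('9', '10', 'a', '560'),
    ('9', '11', 'b', '570'),
    ('9', '12', 'c', '580'),
]

# Step 1: collect the remaining items of each row under its first element.
step1 = {}
for row in data:
    step1.setdefault(row[0], []).append(row[1:])

# Step 2: flatten each list of rows column-wise.
step2 = {k: list(chain(*zip(*v))) for k, v in step1.items()}

print(step2)
```

Since Python 3.7, dicts keep insertion order, so the keys come out as '1', '5', '9' rather than in the arbitrary order shown in the Python 2.7 result.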

score 1 · Answer 3 · answered Jul 01 '15 at 17:15

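You can use itertools.groupby to group the tuples by their element at index 0 and then loop over the groups to build your dictionary. Note that groupby only groups consecutive items with the same key, so this relies on the list already being ordered by that first element, as it is here.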

Example -

>>> from itertools import groupby
>>> l = [('1', '2', 'a', '564'),
... ('1', '3', 'b', '570'),
... ('1', '4', 'c', '570'),
... ('5', '6', 'a', '560'),
... ('5', '7', 'b', '570'),
... ('5', '8', 'c', '580'),
... ('9', '10', 'a', '560'),
... ('9', '11', 'b', '570'),
... ('9', '12', 'c', '580')]
>>> x = groupby(l, key = lambda x: x[0])
>>> d = {}
>>> for y, z in x:
...     l1 = []
...     l2 = []
...     l3 = []
...     for a in z:
...             l1.append(a[1])
...             l2.append(a[2])
...             l3.append(a[3])
...     l1.extend(l2)
...     l1.extend(l3)
...     d[y] = l1
...
>>> d
{'5': ['6', '7', '8', 'a', 'b', 'c', '560', '570', '580'], '9': ['10', '11', '12', 'a', 'b', 'c', '560', '570', '580'], '1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570']}
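A sketch for input that is not already ordered by hour: sort by the first element, then group. It uses `operator.itemgetter(0)` as the key and the same column-wise flattening as the other answers instead of three hard-coded lists, so it also works for tuples of other lengths. The shuffled `data` below is made up for illustration.

```python
from itertools import chain, groupby
from operator import itemgetter

# The question's tuples, deliberately shuffled (hypothetical ordering).
data = [('5', '6', 'a', '560'),
        ('1', '2', 'a', '564'),
        ('9', '10', 'a', '560'),
        ('1', '3', 'b', '570'),
        ('5', '7', 'b', '570'),
        ('1', '4', 'c', '570'),
        ('9', '11', 'b', '570'),
        ('5', '8', 'c', '580'),
        ('9', '12', 'c', '580')]

# Without sorting, the same hour shows up as several separate groups.
print([hour for hour, _ in groupby(data, key=itemgetter(0))])
# ['5', '1', '9', '1', '5', '1', '9', '5', '9']

# sorted() is stable, so rows keep their relative order within an hour.
by_hour = sorted(data, key=itemgetter(0))

d = {}
for hour, rows in groupby(by_hour, key=itemgetter(0)):
    # Drop the hour from each row, transpose to columns, then flatten.
    rest = [row[1:] for row in rows]
    d[hour] = list(chain.from_iterable(zip(*rest)))

print(d)
```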

Thank you all for your answers. They were very helpful. Thanks a ton! — Radheya, Jul 01 '15 at 17:53

score 1 · Answer 4 · answered Jul 01 '15 at 17:31

Here is another approach

import itertools

data = [('1', '2', 'a', '564'),
('1', '3', 'b', '570'),
('1', '4', 'c', '570'),
('5', '6', 'a', '560'),
('5', '7', 'b', '570'),
('5', '8', 'c', '580'),
('9', '10', 'a', '560'),
('9', '11', 'b', '570'),
('9', '12', 'c', '580')] 

ddata = {}

for hour, color, type, text in data:
    lcontent = ddata.setdefault(hour, [[],[],[]])
    lcontent[0].append(color)
    lcontent[1].append(type)
    lcontent[2].append(text)

ddata = {hour: list(itertools.chain.from_iterable(content)) for (hour, content) in ddata.iteritems()}

print ddata

After the for loop, the dictionary has the following form, which may actually be more useful than the format you requested:

{'1': [['2', '3', '4'], ['a', 'b', 'c'], ['564', '570', '570']], '9': [['10', '11', '12'], ['a', 'b', 'c'], ['560', '570', '580']], '5': [['6', '7', '8'], ['a', 'b', 'c'], ['560', '570', '580']]}

I then apply a dictionary comprehension to flatten the list entries into the format you specified:

{'1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570'], '9': ['10', '11', '12', 'a', 'b', 'c', '560', '570', '580'], '5': ['6', '7', '8', 'a', 'b', 'c', '560', '570', '580']}

Python 2.7 solution
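The answer above is Python 2 code (`print ddata`, `iteritems`). A Python 3 sketch of the same approach: beyond switching to `items()` and the `print()` function, it renames the loop variable `type` to `ftype` so the built-in `type` is not shadowed, and it keeps the per-column structure in its own variable, `columns_by_hour` (a name chosen for this sketch), so that intermediate form is still available after flattening.

```python
from itertools import chain

data = [('1', '2', 'a', '564'),
        ('1', '3', 'b', '570'),
        ('1', '4', 'c', '570'),
        ('5', '6', 'a', '560'),
        ('5', '7', 'b', '570'),
        ('5', '8', 'c', '580'),
        ('9', '10', 'a', '560'),
        ('9', '11', 'b', '570'),
        ('9', '12', 'c', '580')]

# Build {hour: [colors, types, texts]}, one inner list per column.
columns_by_hour = {}
for hour, color, ftype, text in data:
    # Unpacking gives names for the same inner lists stored in the dict.
    colors, types, texts = columns_by_hour.setdefault(hour, [[], [], []])
    colors.append(color)
    types.append(ftype)
    texts.append(text)

# Flatten the three column lists into the single list the question asked for.
ddata = {hour: list(chain.from_iterable(cols))
         for hour, cols in columns_by_hour.items()}

print(ddata)
```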
