
Following up on my question here, I have managed to build a list of tuples in the format given below:

(hours,color,type,text)

[('1', '2', 'a', '564'),
('1', '3', 'b', '570'),
('1', '4', 'c', '570'),
('5', '6', 'a', '560'),
('5', '7', 'b', '570'),
('5', '8', 'c', '580'),
('9', '10', 'a', '560'),
('9', '11', 'b', '570'),
('9', '12', 'c', '580')] 

I already looked here, but I am still not able to get rid of the repeated 1's, 5's and 9's.

What I want now: to compare both files from the link above, I want to build a dictionary like the one below and then compare each dictionary's contents individually.

{'1':[2,3,4,a,b,c,564,570,570],
 '5':[6,7,8,a,b,c,560,570,580],
 '9':[10,11,12,a,b,c,560,570,580]}

Since the data in both files is huge, I cannot simply compare them line by line in a loop. So I decided to build a dictionary entry for each 'hour' attribute of the 'location' element, containing all of its 'feature' values. I have been thinking about this for a long time but cannot work out how to start. Can you help?

To keep this readable, I have not pasted the original XML code from the link above.

Radheya
  • So, following the answer to your previous question, did you build the tree structure? I guess you are still using the same list approach. – rahul tyagi Jul 01 '15 at 16:52
  • What's the actual problem you're having? – SuperBiasedMan Jul 01 '15 at 16:53
  • I thought about it, but I thought this approach would be a bit more efficient @rahul – Radheya Jul 01 '15 at 16:55
  • @SuperBiasedMan I want to derive dictionary structure from the list structure above in the format mentioned – Radheya Jul 01 '15 at 16:56
  • Have you looked into pandas? Specifically, convert your dictionary key into a dataframe index and the other data to columnar fields. Pandas is a data library that's been vectorized and implemented in C for significant efficiency gains. – AZhao Jul 01 '15 at 16:59
  • Why is 9 in the output list for '9' but neither 1 nor 5 are in those lists? – DSM Jul 01 '15 at 17:13
  • A "bit" of efficiency probably won't matter much, and I doubt a flat structure would be any more efficient than a tree anyway. Optimize for usability first. – TigerhawkT3 Jul 01 '15 at 17:24
  • @AZhao I never heard of it but will definitely look into it. Thanks for suggestion – Radheya Jul 01 '15 at 17:55
  • @DSM Thanks for pointing that out. I edited my output in question. – Radheya Jul 01 '15 at 17:57

4 Answers

You could create the dictionary in two steps. The first step groups tuples together based on the first value in each, and then in the second step, the now grouped items are flattened into a single list. It's written to work with tuples containing two or more items, but the exact number doesn't matter.

from collections import defaultdict
from itertools import chain
from pprint import pprint

tuples = [('1', '2', 'a', '564'),
          ('1', '3', 'b', '570'),
          ('1', '4', 'c', '570'),
          ('5', '6', 'a', '560'),
          ('5', '7', 'b', '570'),
          ('5', '8', 'c', '580'),
          ('9', '10', 'a', '560'),
          ('9', '11', 'b', '570'),
          ('9', '12', 'c', '580')]

d = defaultdict(list)

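# Step 1: group the remaining fields of each row under its first field.
# (The loop variable is named `row` to avoid shadowing the built-in `tuple`.)
for row in tuples:
    d[row[0]].append(row[1:])

# Step 2: transpose each group with zip(*v), then flatten it column-wise.
for k, v in d.items():
    d[k] = list(chain.from_iterable(zip(*v)))

# Convert to a plain dict so the output is not wrapped in defaultdict(...).
pprint(dict(d))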

Output:

{'1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570'],
 '5': ['6', '7', '8', 'a', 'b', 'c', '560', '570', '580'],
 '9': ['10', '11', '12', 'a', 'b', 'c', '560', '570', '580']}
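
Once both files have been converted this way, the per-key comparison mentioned in the question could look roughly like the sketch below. The function name `compare_grouped` and the sample dictionaries `file_a` and `file_b` are made up for illustration; they are not from the question.

```python
def compare_grouped(left, right):
    """Return {key: (left_value, right_value)} for every key whose values differ.

    A key that is missing on one side is reported with None on that side.
    """
    differences = {}
    for key in sorted(set(left) | set(right)):
        left_value = left.get(key)
        right_value = right.get(key)
        if left_value != right_value:
            differences[key] = (left_value, right_value)
    return differences


# Hypothetical data standing in for the dictionaries built from each file.
file_a = {'1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570'],
          '5': ['6', '7', '8', 'a', 'b', 'c', '560', '570', '580']}
file_b = {'1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570'],
          '5': ['6', '7', '8', 'a', 'b', 'c', '560', '570', '999'],
          '9': ['10', '11', '12', 'a', 'b', 'c', '560', '570', '580']}

diffs = compare_grouped(file_a, file_b)
for hour, (a_value, b_value) in diffs.items():
    print(hour, a_value, b_value)
```

Only the keys that differ are returned, so a large pair of files yields a small report when most hours match.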
martineau

First, loop through the list of tuples to build a dictionary from tuple element 0 to the remaining elements. This produces a dictionary whose keys are element 0 of each tuple and whose values are a list of tuples, each tuple representing one row with the same element 0. Then flatten each of these lists column-wise using itertools.chain and itertools.izip.

Python 2.7 solution:

#!/usr/bin/env python
from __future__ import print_function
from itertools import chain, izip

data = [
    ('1', '2', 'a', '564'),
    ('1', '3', 'b', '570'),
    ('1', '4', 'c', '570'),
    ('5', '6', 'a', '560'),
    ('5', '7', 'b', '570'),
    ('5', '8', 'c', '580'),
    ('9', '10', 'a', '560'),
    ('9', '11', 'b', '570'),
    ('9', '12', 'c', '580')
]

# First, sort the values in rows into lists by their first element.
step1 = {}
for row in data:
    step1.setdefault(row[0], [])
    step1[row[0]].append(row[1:])

print("Step 1:")
print(repr(step1))

# Now to flatten a sequence-of-sequences column-wise,
# use list(itertools.chain(*itertools.izip(*seq)))
step2 = dict((k, list(chain(*izip(*v))))
             for k, v in step1.iteritems())

print("Step 2:")
print(repr(step2))

Result:

Step 1:
{'1': [('2', 'a', '564'), ('3', 'b', '570'), ('4', 'c', '570')],
 '9': [('10', 'a', '560'), ('11', 'b', '570'), ('12', 'c', '580')],
 '5': [('6', 'a', '560'), ('7', 'b', '570'), ('8', 'c', '580')]}
Step 2:
{'1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570'],
 '9': ['10', '11', '12', 'a', 'b', 'c', '560', '570', '580'],
 '5': ['6', '7', '8', 'a', 'b', 'c', '560', '570', '580']}
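
The Python 2.7 code above relies on `itertools.izip` and `dict.iteritems`, neither of which exists in Python 3. A rough Python 3 equivalent, assuming the same input rows, swaps in the built-in `zip` and `dict.items`:

```python
from itertools import chain

data = [
    ('1', '2', 'a', '564'),
    ('1', '3', 'b', '570'),
    ('1', '4', 'c', '570'),
    ('5', '6', 'a', '560'),
    ('5', '7', 'b', '570'),
    ('5', '8', 'c', '580'),
    ('9', '10', 'a', '560'),
    ('9', '11', 'b', '570'),
    ('9', '12', 'c', '580'),
]

# Step 1: group the remaining fields of each row by the first element.
step1 = {}
for row in data:
    step1.setdefault(row[0], []).append(row[1:])

# Step 2: transpose each group with zip(*v) and flatten it column-wise.
step2 = {k: list(chain.from_iterable(zip(*v))) for k, v in step1.items()}

print(step2)
```

On Python 3.7 and later, dictionaries keep insertion order, so the keys come out in the order they first appear in `data`.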
Damian Yerrick

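You can use itertools.groupby to group the tuples by the element at index 0 and then loop over the groups to create your dictionary. Note that groupby only groups consecutive items, so the list must already be sorted (or at least contiguous) by that first element, as it is in your example.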

Example -

>>> from itertools import groupby
>>> l = [('1', '2', 'a', '564'),
... ('1', '3', 'b', '570'),
... ('1', '4', 'c', '570'),
... ('5', '6', 'a', '560'),
... ('5', '7', 'b', '570'),
... ('5', '8', 'c', '580'),
... ('9', '10', 'a', '560'),
... ('9', '11', 'b', '570'),
... ('9', '12', 'c', '580')]
>>> x = groupby(l, key = lambda x: x[0])
>>> d = {}
>>> for y, z in x:
...     l1 = []
...     l2 = []
...     l3 = []
...     for a in z:
...             l1.append(a[1])
...             l2.append(a[2])
...             l3.append(a[3])
...     l1.extend(l2)
...     l1.extend(l3)
...     d[y] = l1
>>> d
{'5': ['6', '7', '8', 'a', 'b', 'c', '560', '570', '580'], '9': ['10', '11', '12', 'a', 'b', 'c', '560', '570', '580'], '1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570']}
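
If the rows are not already ordered by their first element, one option is to sort them first. The sketch below is a variation on the loop above, not the same code: it uses made-up, deliberately shuffled rows and replaces the three explicit lists with a `zip`-based transpose.

```python
from itertools import chain, groupby
from operator import itemgetter

# Hypothetical rows, interleaved so that equal keys are not adjacent.
rows = [('5', '6', 'a', '560'),
        ('1', '2', 'a', '564'),
        ('5', '7', 'b', '570'),
        ('1', '3', 'b', '570')]

first = itemgetter(0)
grouped = {}
# sorted() is stable, so rows with the same key keep their original order.
for hour, group in groupby(sorted(rows, key=first), key=first):
    # Transpose the remaining fields so each column becomes one tuple.
    columns = zip(*(row[1:] for row in group))
    grouped[hour] = list(chain.from_iterable(columns))

print(grouped)
```

Note that the keys here are strings, so sorting is lexicographic ('10' sorts before '9'). That does not affect the grouping, only the order in which groups are produced.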
Anand S Kumar

Here is another approach:

import itertools

data = [('1', '2', 'a', '564'),
('1', '3', 'b', '570'),
('1', '4', 'c', '570'),
('5', '6', 'a', '560'),
('5', '7', 'b', '570'),
('5', '8', 'c', '580'),
('9', '10', 'a', '560'),
('9', '11', 'b', '570'),
('9', '12', 'c', '580')] 

ddata = {}

for hour, color, type, text in data:
    lcontent = ddata.setdefault(hour, [[],[],[]])
    lcontent[0].append(color)
    lcontent[1].append(type)
    lcontent[2].append(text)

ddata = {hour: list(itertools.chain.from_iterable(content)) for (hour, content) in ddata.iteritems()}

print ddata

After the for loop the dictionary will be in the following form, which may actually be in a more useful format than the one you requested:

{'1': [['2', '3', '4'], ['a', 'b', 'c'], ['564', '570', '570']], '9': [['10', '11', '12'], ['a', 'b', 'c'], ['560', '570', '580']], '5': [['6', '7', '8'], ['a', 'b', 'c'], ['560', '570', '580']]}

I then apply a dictionary comprehension to flatten the list entries to the format you specified.

{'1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570'], '9': ['10', '11', '12', 'a', 'b', 'c', '560', '570', '580'], '5': ['6', '7', '8', 'a', 'b', 'c', '560', '570', '580']}

This is a Python 2.7 solution.
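
For Python 3, the same idea might be sketched as follows, assuming the same input. The names `columns_by_hour`, `flat_by_hour` and `kind` are illustrative choices (`kind` is used in place of `type` to avoid shadowing the built-in), and `items()` and the `print()` function stand in for `iteritems()` and the `print` statement.

```python
from itertools import chain

data = [('1', '2', 'a', '564'),
        ('1', '3', 'b', '570'),
        ('1', '4', 'c', '570'),
        ('5', '6', 'a', '560'),
        ('5', '7', 'b', '570'),
        ('5', '8', 'c', '580'),
        ('9', '10', 'a', '560'),
        ('9', '11', 'b', '570'),
        ('9', '12', 'c', '580')]

# Intermediate form: one list per column (color, type, text) for each hour.
columns_by_hour = {}
for hour, color, kind, text in data:
    columns = columns_by_hour.setdefault(hour, [[], [], []])
    columns[0].append(color)
    columns[1].append(kind)
    columns[2].append(text)

# Flattened form, as requested in the question.
flat_by_hour = {hour: list(chain.from_iterable(columns))
                for hour, columns in columns_by_hour.items()}

print(columns_by_hour['1'][0])  # the colors recorded for hour '1'
print(flat_by_hour['1'])
```

Keeping `columns_by_hour` around lets you compare a single column (for example only the colors) between two files without slicing the flattened list.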

Martin Evans