
I have the following list of dicts:

[
   {"taskid": 1, "type": "input", "name": "First_in"},
   {"taskid": 1, "type": "input", "name": "Second_in"},
   {"taskid": 1, "type": "input", "name": "Third_in"},
   {"taskid": 1, "type": "output", "name": "First_out"},
   {"taskid": 1, "type": "output", "name": "Second_out"},
   {"taskid": 1, "type": "output", "name": "Third_out"},
   {"taskid": 2, "type": "input", "name": "First_in"},
   {"taskid": 2, "type": "output", "name": "First_out"},
   {"taskid": 2, "type": "output", "name": "Second_out"},
...]

And I need to restructure it to obtain the following result:

[
   {"taskid": 1, 
    "input": ["First_in", "Second_in", "Third_in"], 
    "output": ["First_out", "Second_out", "Third_out"]
   },
   {"taskid": 2, 
    "input": ["First_in"], 
    "output": ["First_out","Second_out"]
   },
...]

Here is my code for this:

def squash_records(rec):
    squashed = []
    # get all unique taskids, in first-seen order
    # (appending every item's taskid would repeat each task once per record)
    tasks = []
    for item in rec:
        if item['taskid'] not in tasks:
            tasks.append(item['taskid'])
    # note: rec is scanned again for every task, so it must be a list, not a generator
    for task in tasks:
        current_task = {}
        current_task['taskid'] = task
        current_task['input'] = [row['name'] for row in rec if row['type'] == 'input' and row['taskid'] == task]
        current_task['output'] = [row['name'] for row in rec if row['type'] == 'output' and row['taskid'] == task]
        squashed.append(current_task)
    return squashed

What is the best way to implement this if the input is a generator rather than a list? I mean, in a single pass with one for loop?

Thank you in advance!

Maria
  • welcome! It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (console output, tracebacks, etc.). The more detail you provide, the more answers you are likely to receive. Check the FAQ and How to Ask. – MooingRawr Oct 18 '17 at 18:41
  • Perhaps you should Google for a *code writing service*. – Willem Van Onsem Oct 18 '17 at 18:42

2 Answers

Just for fun, I did this as a one-liner:

[ { "taskid" : k, "input" : [input["name"] for input in lst if input["taskid"] == k and input["type"] == "input"], "output" : [output["name"] for output in lst if output["taskid"] == k and output["type"] == "output"] } for k in set(e["taskid"] for e in lst) ]
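Note that the one-liner re-scans `lst` once per task id, so it needs a re-iterable list (it cannot consume a generator) and does O(n·k) work for k task ids. Also, `set()` does not preserve the order in which task ids first appear. A minimal variant, assuming `lst` is a list and that first-seen order matters, swaps `set()` for `dict.fromkeys` (insertion-ordered on Python 3.7+). The function name `squash_one_liner` and the trimmed sample data are illustrative only:

```python
def squash_one_liner(lst):
    # dict.fromkeys deduplicates task ids while keeping first-seen order (Python 3.7+)
    return [
        {
            "taskid": k,
            "input": [e["name"] for e in lst if e["taskid"] == k and e["type"] == "input"],
            "output": [e["name"] for e in lst if e["taskid"] == k and e["type"] == "output"],
        }
        for k in dict.fromkeys(e["taskid"] for e in lst)
    ]

# Hypothetical sample where task 2 appears before task 1
lst = [
    {"taskid": 2, "type": "input", "name": "First_in"},
    {"taskid": 1, "type": "output", "name": "First_out"},
    {"taskid": 2, "type": "output", "name": "Second_out"},
]
print(squash_one_liner(lst))
```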
imreal

Here is an O(n) solution:

In [5]: from collections import defaultdict

In [6]: grouper = defaultdict(lambda:defaultdict(list))

In [7]: for d in data:
    ...:     grouper[d['taskid']][d['type']].append(d['name'])
    ...:

In [8]: grouper
Out[8]:
defaultdict(<function __main__.<lambda>>,
            {1: defaultdict(list,
                         {'input': ['First_in', 'Second_in', 'Third_in'],
                          'output': ['First_out', 'Second_out', 'Third_out']}),
             2: defaultdict(list,
                         {'input': ['First_in'],
                          'output': ['First_out', 'Second_out']})})

Quite frankly, I would stop here, since I think this is a more convenient data structure, but if you really need a list:

In [9]: [{'taskid':k, **v} for k, v in grouper.items()]
Out[9]:
[{'input': ['First_in', 'Second_in', 'Third_in'],
  'output': ['First_out', 'Second_out', 'Third_out'],
  'taskid': 1},
 {'input': ['First_in'], 'output': ['First_out', 'Second_out'], 'taskid': 2}]

Also, this will work if data is not a list but a single-pass iterator (e.g. a generator).
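As a minimal sketch of that single-pass idea, assuming each record always has `taskid`, `type` and `name` keys and that `type` is either `"input"` or `"output"`, the grouping can be wrapped in a function that accepts any iterable. The names `squash_records_single_pass` and `gen_records` are hypothetical. Unlike the nested `defaultdict` version, this fills in an empty list when a task has no inputs or no outputs, which matches what the question's list-based code returns:

```python
from collections import OrderedDict

def squash_records_single_pass(records):
    # One pass over any iterable, including a generator
    grouped = OrderedDict()  # keeps task ids in first-seen order on Python 2 and 3
    for rec in records:
        # assumes rec["type"] is "input" or "output"; anything else raises KeyError
        task = grouped.setdefault(rec["taskid"], {"input": [], "output": []})
        task[rec["type"]].append(rec["name"])
    # Build the list-of-dicts shape asked for in the question
    return [
        {"taskid": k, "input": v["input"], "output": v["output"]}
        for k, v in grouped.items()
    ]

def gen_records():
    # A generator can only be consumed once
    yield {"taskid": 1, "type": "input", "name": "First_in"}
    yield {"taskid": 1, "type": "output", "name": "First_out"}
    yield {"taskid": 2, "type": "output", "name": "First_out"}

print(squash_records_single_pass(gen_records()))
```

The same function works unchanged on a list, since it only iterates once.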

Also, the `**` unpacking syntax inside a dict literal requires Python 3.5+ (PEP 448), so it won't work on Python 2. Use this instead:

In [10]: [dict(taskid=k, **v) for k, v in grouper.items()]
Out[10]:
[{'input': ['First_in', 'Second_in', 'Third_in'],
  'output': ['First_out', 'Second_out', 'Third_out'],
  'taskid': 1},
 {'input': ['First_in'], 'output': ['First_out', 'Second_out'], 'taskid': 2}]
juanpa.arrivillaga
  • That's what I've searched! Thanks a lot. – Maria Oct 18 '17 at 19:08
  • But in [9] there is an error: (python 2.7) return [{'taskid':k, **v} for k, v in grouper.items()] ^ SyntaxError: invalid syntax – Maria Oct 18 '17 at 19:08
  • @Maria what is the error? What version of Python are you on? – juanpa.arrivillaga Oct 18 '17 at 19:09
  • @Maria use `[dict(taskid=k, **v) for k, v in grouper.items()]` on old versions of Python – juanpa.arrivillaga Oct 18 '17 at 19:12
  • Yes, it works with dict! Great, thank you )) I'll learn this approach also, and read about ** splat syntax... – Maria Oct 18 '17 at 19:20
  • @Maria you can start by reading [this question](https://stackoverflow.com/questions/36901/what-does-double-star-asterisk-and-star-asterisk-do-for-parameters). Note, I was actually using the splat syntax with a *dict literal*, which is only available in Python 3.5+. So, for example, `x = [1,2]; print(['a','b',*x,'c','d'])` unpacks `x` into a *literal*, which is quite handy. – juanpa.arrivillaga Oct 18 '17 at 19:23
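A short illustration of the unpacking forms mentioned in these comments, assuming Python 3.5+ for the literal forms (PEP 448). The dict `v` stands in for one inner value of `grouper`:

```python
x = [1, 2]
# * unpacks an iterable inside a list literal
print(['a', 'b', *x, 'c', 'd'])  # ['a', 'b', 1, 2, 'c', 'd']

v = {'input': ['First_in'], 'output': ['First_out']}
# ** unpacks a mapping inside a dict literal
merged = {'taskid': 1, **v}
print(merged)  # {'taskid': 1, 'input': ['First_in'], 'output': ['First_out']}

# Python 2-compatible equivalent for the dict case (keys of v must be strings)
print(dict(taskid=1, **v) == merged)  # True
```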