difficult dictionary comprehension

Question

I'm trying to perform a nested dictionary comprehension from this data.

data =[
    ['Peter', 'June', 100],
    ['Peter', 'July', 200],
    ['Peter', 'August', 120],
    ['Peter', 'September', 202],
    ['Bob', 'June', 300],
    ['Bob', 'July', 101],
    ['Bob', 'August', 200],
    ['Bob', 'September', 100]
]

The output that I need is:

targets = {
    'June': {'Peter': 100, 'Bob': 300},
    'July': {'Peter': 200, 'Bob': 101},
    'August': {'Peter': 120, 'Bob': 200},
    'September': {'Peter': 202, 'Bob': 100}
}

My code is as follows:

targets = {row[1]: {row[0]: row[2] for row in data} for row in data}

The faulty output I'm getting is:

{ 
 'June': {'Peter': 202, 'Bob': 100}, 
 'July': {'Peter': 202, 'Bob': 100}, 
 'August': {'Peter': 202, 'Bob': 100}, 
 'September': {'Peter': 202, 'Bob': 100}
}

Please advise what the correct code should be.

Don't use a comprehension here – Mad Physicist, Jul 15 '22 at 10:31

score 3 · Answer 1 · answered Jul 15 '22 at 10:32

Instead of using a comprehension, use a defaultdict from collections:

from collections import defaultdict

data =[
    ['Peter', 'June', 100],
    ['Peter', 'July', 200],
    ['Peter', 'August', 120],
    ['Peter', 'September', 202],
    ['Bob', 'June', 300],
    ['Bob', 'July', 101],
    ['Bob', 'August', 200],
    ['Bob', 'September', 100]
]

d = defaultdict(dict)
for name, month, x in data:
    d[month][name] = x
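
One thing to keep in mind with this approach: a defaultdict creates an empty entry whenever a missing key is looked up. A minimal sketch, assuming you want an ordinary dict once the grouping is done (the name targets and the shortened data are only for illustration):

```python
from collections import defaultdict

data = [
    ['Peter', 'June', 100],
    ['Peter', 'July', 200],
    ['Bob', 'June', 300],
    ['Bob', 'July', 101],
]

d = defaultdict(dict)
for name, month, x in data:
    # d[month] creates an empty inner dict the first time a month is seen
    d[month][name] = x

# Convert to a plain dict so that looking up an unknown month raises
# KeyError instead of silently adding an empty entry.
targets = dict(d)
print(targets)
```

The conversion is shallow, which is enough here because the inner values are already plain dicts.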

If you must for some reason use a comprehension:

d = {
    m: {name: x for name, month, x in data if month == m} 
    for m in set(m for _, m, _ in data)
}
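
Note that a set does not keep the months in their original order. If the order matters, one possible variation (a sketch, not part of the answer above) is to deduplicate with dict.fromkeys, which keeps first-seen order on Python 3.7+. The name months is introduced here for illustration:

```python
data = [
    ['Peter', 'June', 100],
    ['Peter', 'July', 200],
    ['Peter', 'August', 120],
    ['Peter', 'September', 202],
    ['Bob', 'June', 300],
    ['Bob', 'July', 101],
    ['Bob', 'August', 200],
    ['Bob', 'September', 100],
]

# dict.fromkeys drops duplicates but keeps the order of first appearance
months = dict.fromkeys(month for _, month, _ in data)

d = {
    m: {name: x for name, month, x in data if month == m}
    for m in months
}
print(list(d))
```

Like the set version, this still scans data once per distinct month, so the loop remains the cheaper option for large inputs.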

score 2 · Answer 2 · answered Jul 15 '22 at 10:35

I'm not sure this structure lends itself to a comprehension. As always, start with a plain loop:

targets = {}
for row in data:
    month = row[1]
    if month not in targets:
        targets[month] = {}
    targets[month][row[0]] = row[2]

Your original attempt is going through the entire dataset and assigning all of the rows one by one to each month, so they all end up with the data from the last one.
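
To see this concretely, here is a small sketch that evaluates the inner comprehension from the question on its own. The names inner and broken are introduced only for this illustration:

```python
data = [
    ['Peter', 'June', 100],
    ['Peter', 'July', 200],
    ['Peter', 'August', 120],
    ['Peter', 'September', 202],
    ['Bob', 'June', 300],
    ['Bob', 'July', 101],
    ['Bob', 'August', 200],
    ['Bob', 'September', 100],
]

# The inner comprehension never looks at the outer row, so it walks the
# whole dataset and each name keeps only its last value.
inner = {row[0]: row[2] for row in data}
print(inner)  # {'Peter': 202, 'Bob': 100}

# The question's expression therefore maps every month to an equal dict.
broken = {row[1]: {row[0]: row[2] for row in data} for row in data}
print(all(value == inner for value in broken.values()))  # True
```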

If you were to sort the data by month and use itertools.groupby on the month, you could get some traction:

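import operator
from itertools import groupby

key = operator.itemgetter(1)
data.sort(key=key)  # key is keyword-only; groupby needs equal months adjacent
targets = {k: {row[0]: row[2] for row in g} for k, g in groupby(data, key)}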
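
The sort matters because groupby only merges consecutive rows that share a key. A small sketch on a shortened copy of the data shows the difference. It uses sorted() so the input list is left untouched, which is a choice made for this example rather than part of the answer:

```python
from itertools import groupby
from operator import itemgetter

data = [
    ['Peter', 'June', 100],
    ['Peter', 'July', 200],
    ['Bob', 'June', 300],
    ['Bob', 'July', 101],
]
key = itemgetter(1)

# Without sorting, each month shows up as two separate groups, and the
# later group replaces the earlier one in the resulting dict.
unsorted_result = {
    k: {row[0]: row[2] for row in g} for k, g in groupby(data, key)
}
print(unsorted_result)  # {'June': {'Bob': 300}, 'July': {'Bob': 101}}

# With sorting, every month forms exactly one group.
sorted_result = {
    k: {row[0]: row[2] for row in g}
    for k, g in groupby(sorted(data, key=key), key)
}
print(sorted_result)
```

One side effect: the months come out in alphabetical order ('July' before 'June'), not in calendar or input order.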

Amisha Kirti · Answer 3 · 2022-07-15T11:42:45.670

There's a similar question asked here about list to dictionary conversion with multiple values.

You can solve it simply using loops and dictionary methods:

data =[
    ['Peter', 'June', 100],
    ['Peter', 'July', 200],
    ['Peter', 'August', 120],
    ['Peter', 'September', 202],
    ['Bob', 'June', 300],
    ['Bob', 'July', 101],
    ['Bob', 'August', 200],
    ['Bob', 'September', 100]
]

targets = {}
for row in data:
    if row[1] not in targets:
        targets[row[1]] = {row[0]: row[2]}
    targets[row[1]].update({row[0]: row[2]})
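
If you prefer, the membership test and the update can probably be folded into a single dict.setdefault call. This is a sketch of an equivalent loop, using unpacked names (name, month, value) that are not in the code above:

```python
data = [
    ['Peter', 'June', 100],
    ['Peter', 'July', 200],
    ['Peter', 'August', 120],
    ['Peter', 'September', 202],
    ['Bob', 'June', 300],
    ['Bob', 'July', 101],
    ['Bob', 'August', 200],
    ['Bob', 'September', 100],
]

targets = {}
for name, month, value in data:
    # setdefault returns the existing inner dict for this month, or
    # inserts and returns a new empty one if the month is not present yet.
    targets.setdefault(month, {})[name] = value

print(targets)
```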

score 0 · Answer 4 · answered Jul 15 '22 at 10:50

The problem is that the inner comprehension overwrites the value on every pass through the loop. An if condition that compares the months prevents that.

targets = {outer_row[1]: {inner_row[0]: inner_row[2] for inner_row in data if outer_row[1] == inner_row[1]} for outer_row in data}

score 0 · Answer 5 · answered Jul 15 '22 at 11:16

I think you shouldn't go for dictionary comprehension here.

But if you need to go then:

What's going on?

In each iteration, you are overwriting the previous value. That's why you end up with the last value for each person.

Hopefully this makes it clearer. This is the state of the inner dict after each row:

1. {Peter: 100} # for June
2. {Peter: 200} # for July
3. {Peter: 120} # for August
4. {Peter: 202} # for September

5. {Peter: 202, Bob: 300} # for June
6. {Peter: 202, Bob: 101} # for July
7. {Peter: 202, Bob: 200} # for August
8. {Peter: 202, Bob: 100} # for September (this is what you are getting for every month)

What's the solution?

You can go with @Nova's solution. To prevent the overwriting, check whether outer_row[1] == inner_row[1]:

targets = {outer_row[1]: {inner_row[0]: inner_row[2] for inner_row in data if outer_row[1] == inner_row[1]} for outer_row in data}

score 0 · Answer 6 · answered Jul 15 '22 at 17:26

You ask for a dictionary comprehension, but what you need is called a fold. In Python, that's the functools.reduce function.

With a dictionary comprehension, you iterate over elements (list elements, dict items, whatever you want...) and you create or update an entry of the dict (you may also filter values). It's a purely sequential job: you have to make every decision element by element, without any information about the dictionary being built.

With a fold, you also iterate over elements, but you have the information about the dictionary being built, because at each step you take the current element (as in a dictionary comprehension) but also an accumulator to build the updated value of this accumulator. The return value of the fold is the last value of the accumulator, when all elements are processed.
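
As a warm-up, here is a tiny sketch of the accumulator idea on plain numbers (Peter's four values from the data). It is only an illustration of how a fold proceeds. itertools.accumulate is used to expose the intermediate accumulator values, and its initial argument requires Python 3.8+:

```python
import functools
import itertools

values = [100, 200, 120, 202]

# reduce returns only the final accumulator
total = functools.reduce(lambda acc, x: acc + x, values, 0)
print(total)  # 622

# accumulate yields the accumulator after every step, starting with
# the initial value
steps = list(itertools.accumulate(values, lambda acc, x: acc + x, initial=0))
print(steps)  # [0, 100, 300, 420, 622]
```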

Why do you need information about the dict being built? Look at the data:

>>> data =[
...     ['Peter', 'June', 100],
...     ['Peter', 'July', 200],
...     ['Peter', 'August', 120],
...     ['Peter', 'September', 202],
...     ['Bob', 'June', 300],
...     ['Bob', 'July', 101],
...     ['Bob', 'August', 200],
...     ['Bob', 'September', 100]
... ]

When you read the element ['Bob', 'June', 300], you have two options:

Ignore the element: but then you miss the association June -> Bob -> 300.
Take the element: but then you replace and lose June -> Peter -> 100.

I'm pretty sure there is no way to bypass this limitation in a single iteration (look at the excellent answer by @Grismar: he has to perform two iterations to get the missing information).

With the fold, you can check if there are already values associated to June, and update those values:

>>> import functools
>>> functools.reduce(lambda acc, x: {**acc, x[1]: {**acc.get(x[1], {}), x[0]: x[2]}}, data, {})
{'June': {'Peter': 100, 'Bob': 300}, 'July': {'Peter': 200, 'Bob': 101}, 'August': {'Peter': 120, 'Bob': 200}, 'September': {'Peter': 202, 'Bob': 100}}

Or with the Python 3.9+ merge operator:

>>> functools.reduce(lambda acc, x: acc | {x[1]: acc.get(x[1], {}) | {x[0]: x[2]}}, data, {})
{'June': {'Peter': 100, 'Bob': 300}, 'July': {'Peter': 200, 'Bob': 101}, 'August': {'Peter': 120, 'Bob': 200}, 'September': {'Peter': 202, 'Bob': 100}}

This is more complicated than a regular loop, but still understandable. The first argument of reduce is a function that builds the updated value of the accumulator. The second argument is the iterable. The third argument is an optional initial value of the accumulator (here an empty dict). Let me explain the function.

For each new element, we update the value associated with x[1] (the month) in the accumulator. The new value is the old one or an empty dict (acc.get(x[1], {})) augmented with a new entry x[0]: x[2], that is name: value. You can easily convince yourself that the last value of the accumulator is the expected dictionary.
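
The same step can be spelled out as a named function, which may be easier to follow than the lambda. This is a sketch; add_row and its local names are made up for this illustration:

```python
import functools

data = [
    ['Peter', 'June', 100],
    ['Peter', 'July', 200],
    ['Peter', 'August', 120],
    ['Peter', 'September', 202],
    ['Bob', 'June', 300],
    ['Bob', 'July', 101],
    ['Bob', 'August', 200],
    ['Bob', 'September', 100],
]

def add_row(acc, row):
    """Return a new accumulator that also contains this row."""
    name, month, value = row
    # Old entries for this month (or none), plus the new name: value pair
    inner = {**acc.get(month, {}), name: value}
    # Copy of the accumulator with this month's entry replaced
    return {**acc, month: inner}

targets = functools.reduce(add_row, data, {})
print(targets)
```

Because add_row builds a fresh dict at every step, it copies more than the plain loop does. That is the price of keeping the function free of side effects.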
