11

I'm trying to think of a one-liner to achieve the following (summing all the values of a key):

>>> data = [('a', 1), ('b', 3), ('a', 4), ('c', 9), ('b', 1), ('d', 3)]
>>> res = {}
>>> for tup in data:
...     res[tup[0]] = res.setdefault(tup[0], 0) + tup[1]
... 
>>> res
{'a': 5, 'c': 9, 'b': 4, 'd': 3}

What I'd like is a one-liner version that uses no imports (itertools, collections, etc.), something like:

 {tup[0]: SELF_REFERENCE.setdefault(tup[0], 0) + tup[1]  for tup in data}

Is it possible in Python to use a reference to the object currently being comprehended?

If not, is there any way to achieve this in a one-liner without any imports, i.e. using only basic list/dict comprehensions and built-in functions?

mkrieger1
DhruvPathak

6 Answers

13

No, there is not. A dict comprehension produces a new item for each iteration, and your code needs to produce fewer items (consolidating values).

There is no way to access keys produced in an earlier iteration, not without using (ugly, unpythonic) side-effect tricks. The dict object that is going to be produced by the comprehension doesn't exist yet, so there is no way to produce a self-reference either.

Just stick to your for loop, it is far more readable.

The alternative would be to use sorting and grouping, an O(N log N) algorithm vs. the simple O(N) of your straight loop:

from itertools import groupby
from operator import itemgetter

res = {key: sum(t[1] for t in group) 
       for key, group in groupby(sorted(data, key=itemgetter(0)), key=itemgetter(0))}
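
For comparison, here is a minimal sketch of an import-free one-liner that sidesteps the self-reference problem by recomputing each key's total from `data` on every iteration. It is not the approach recommended above: it rescans the whole list once per tuple, so it costs O(N²) instead of the O(N) of the plain loop.

```python
data = [('a', 1), ('b', 3), ('a', 4), ('c', 9), ('b', 1), ('d', 3)]

# For every tuple, rescan the whole list and sum the values whose key matches.
# Repeated keys overwrite their own entry with the same total, so the result is stable.
res = {key: sum(v for k, v in data if k == key) for key, _ in data}
print(res)
```

It is fine for a handful of items, but the loop remains the better choice as soon as the input grows.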
Martijn Pieters
  • *"The dict object that is going to be produced by the comprehension doesn't exist yet"* - Was that really true back then? It isn't now. If it was true, where did the comprehension put the data if not into a dict object? – Kelly Bundy Aug 29 '23 at 01:25
  • @KellyBundy I suppose what was meant is that the dictionary under construction isn't available in `locals()` or otherwise user-accessible (as opposed to existence). – Mateen Ulhaq Aug 29 '23 at 01:41
  • @MateenUlhaq I suspect it *is* even user-accessible. At least the list of a list comp is, I've done that a few times, it can be found in `gc.get_objects()`. Not sure I've done it with dict. – Kelly Bundy Aug 29 '23 at 02:48
  • @KellyBundy I got your `gc.get_objects` suggestion [working](https://stackoverflow.com/a/76997570/365102) with list comprehensions, but not with dict comprehensions. – Mateen Ulhaq Aug 29 '23 at 05:40
  • @MateenUlhaq [Working dict comp](https://stackoverflow.com/a/77001410/12671057). – Kelly Bundy Aug 29 '23 at 15:07
  • @MartijnPieters *"No, there is not. A dict comprehension produces a new item for each iteration"* - That's not right. It *can* produce fewer items, by filtering with an `if` clause or by overwriting items by using repeated keys (as in the question's attempt). – Kelly Bundy Aug 29 '23 at 15:17
  • @KellyBundy I am fully aware of those details but they make no difference to the answer. No point in complicating matters here with too much irrelevant detail. – Martijn Pieters Aug 29 '23 at 23:32
4

Don't use a one-liner. Instead, use collections.defaultdict and a simple for loop:

>>> from collections import defaultdict
>>> pairs = [('a', 1), ('b', 3), ('a', 4), ('c', 9), ('b', 1), ('d', 3)]
>>> result = defaultdict(int)
>>> for key, value in pairs:
...     result[key] += value
...
>>> result
defaultdict(<class 'int'>, {'a': 5, 'c': 9, 'b': 4, 'd': 3})

It is easy to understand, pythonic and fast.
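
If the loop is needed in more than one place, it can be wrapped in a small helper. The sketch below is one way to do that; the name `sum_by_key` is made up for illustration, and converting to a plain dict at the end is a design choice, not a requirement.

```python
from collections import defaultdict

def sum_by_key(pairs):
    """Sum the values for each key in an iterable of (key, value) pairs."""
    totals = defaultdict(int)  # missing keys start at int() == 0
    for key, value in pairs:
        totals[key] += value
    # Return a plain dict so that later lookups of unknown keys raise KeyError
    # instead of silently inserting a zero.
    return dict(totals)

pairs = [('a', 1), ('b', 3), ('a', 4), ('c', 9), ('b', 1), ('d', 3)]
result = sum_by_key(pairs)
print(result)
```

At the call site this still reads as a single line, without the comprehension contortions.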

pillmuncher
2

Use reduce and collections.Counter:

>>> from functools import reduce
>>> from operator import add
>>> from collections import Counter
>>> reduce(add, (Counter(dict([x])) for x in data))
Counter({'c': 9, 'a': 5, 'b': 4, 'd': 3})
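
One caveat worth knowing: adding two Counter objects keeps only strictly positive totals, so this approach silently drops keys whose values sum to zero or less. The sketch below uses made-up data with a negative value to show the difference from updating a single Counter in place.

```python
from collections import Counter
from functools import reduce
from operator import add

# Made-up data: the values for 'a' cancel out to zero.
data = [('a', 1), ('b', 3), ('a', -1), ('b', 1)]

# Counter + Counter discards any key whose total is not strictly positive.
added = reduce(add, (Counter(dict([x])) for x in data))

# Updating one Counter in place keeps zero (and negative) totals.
updated = Counter()
for key, value in data:
    updated[key] += value

print(dict(added))
print(dict(updated))
```

With only positive values, as in the question, both variants give the same totals.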
Ashwini Chaudhary
2

This is close to what you are trying to do, but I wouldn't recommend it, as readability suffers.

from functools import reduce  # reduce is no longer a built-in in Python 3

data = [('a',1),('b',3),('a',4),('c',9),('b',1),('d',3)]
print(reduce(lambda d,i: [d.__setitem__(i[0],d.get(i[0],0)+i[1]),d][1], data, {}))

Output

{'a': 5, 'b': 4, 'c': 9, 'd': 3}
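
To make the trick easier to follow, here is a sketch of the same reduction with the lambda unpacked into a named function (`add_pair` is a name chosen for illustration). The `[..., d][1]` construct in the lambda exists only because `__setitem__` returns None and the accumulator has to be handed back to reduce.

```python
from functools import reduce

def add_pair(totals, pair):
    """Named equivalent of the lambda: mutate the accumulator, then return it."""
    key, value = pair
    totals[key] = totals.get(key, 0) + value  # the __setitem__ call in the lambda
    return totals  # what indexing the two-element list with [1] achieves

data = [('a', 1), ('b', 3), ('a', 4), ('c', 9), ('b', 1), ('d', 3)]
result = reduce(add_pair, data, {})
print(result)
```

Written this way it is just the original for loop in disguise, which is a fair argument for keeping the loop.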
thefourtheye
1

Super-awful hack for dict comprehension, finding the dict being built by the comprehension and making it available as self during the comprehension. I have three phases:

  1. Mark the dict by putting a special item into it, with a special marker object as key and an empty list as value. That appears to make the dict register itself for cyclic garbage collection (since the list could lead to a reference cycle).

  2. Find the dict in gc.get_objects() (objects registered for cyclic garbage collection), identifying it by looking for the marker. Then remove the special item.

  3. Fill the dict with the data we actually want, using self to reference the dict.

import gc

data = [('a', 1), ('b', 3), ('a', 4), ('c', 9), ('b', 1), ('d', 3)]

res = {
    (marker if phase == 'mark' else tup[0]):
        ([] if phase == 'mark' else self.get(tup[0], 0) + tup[1])
    for marker in [object()]
    for phase in ['mark', 'find', 'fill']
    for self in [
        None if phase == 'mark' else
        next(
            o for o in gc.get_objects()
            if type(o) is dict
            and next(iter(o), None) is marker
            and not o.clear()
        ) if phase == 'find' else
        self
    ]
    if phase != 'find'
    for tup in ([None] if phase == 'mark' else data)
}

print(res)

Output as desired:

{'a': 5, 'b': 4, 'c': 9, 'd': 3}
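
The marking phase relies on whether a dict is tracked by the cyclic garbage collector, which can be checked with `gc.is_tracked`. The sketch below only prints what the running interpreter reports; which dicts count as tracked is a CPython implementation detail that has varied between versions, so treat the results for the first two dicts as version-dependent.

```python
import gc

# Three dicts that differ only in what they contain.
samples = {
    'empty': {},
    'atomic value': {'a': 1},
    'list value': {'a': []},  # a container value could take part in a reference cycle
}

# Ask the collector whether it is tracking each dict.
tracked = {label: gc.is_tracked(d) for label, d in samples.items()}
for label, flag in tracked.items():
    print(label, flag)
```

If the empty dict is reported as untracked, that would explain why the hack has to insert a list before the dict can be found through `gc.get_objects()`.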
Kelly Bundy
0

Disclaimer: Obviously, this should not be used for any practical purpose.

Based on @superb rain's answer: by filtering through gc.get_objects, we can get hold of the list that a list comprehension is currently building:

import gc

def get_first_obj(obj_ids):
    for obj in gc.get_objects():
        if id(obj) in obj_ids or obj is obj_ids:
            continue
        return obj

obj_ids = set(id(obj) for obj in gc.get_objects())
xs = [(get_first_obj(obj_ids).append("Evil!"), i)[1] for i in range(5)]
print(xs)

Output:

['Evil!', 0, 'Evil!', 1, 'Evil!', 2, 'Evil!', 3, 'Evil!', 4]

Note that this is not very robust; for instance, what would happen if get_first_obj were not called immediately?


Here, we print out new objects created during the comprehension as follows:

import gc

def print_objs(obj_ids):
    for obj in gc.get_objects():
        if id(obj) in obj_ids or obj is obj_ids:
            continue
        print(id(obj), obj)

obj_ids = set(id(obj) for obj in gc.get_objects())
xs = [(print_objs(obj_ids), i)[1] for i in range(5)]

Output:

139842827736960 []
139842827736960 [0]
139842827879264 ('sep', 'end', 'file', 'flush')
139842827736960 [0, 1]
139842827879264 ('sep', 'end', 'file', 'flush')
139842828122560 []
139842832825536 {'Py_Repr': [{...}, [...]]}
139842827736960 [0, 1, 2]
139842827879264 ('sep', 'end', 'file', 'flush')
139842828122560 []
139842832825536 {'Py_Repr': [{...}, [...]]}
139842827736960 [0, 1, 2, 3]
139842827879264 ('sep', 'end', 'file', 'flush')
139842828122560 []
139842832825536 {'Py_Repr': [{...}, [...]]}

It looks like the first item printed is usually the comprehension-produced object.

Interestingly, nothing is printed during a dict comprehension.

Mateen Ulhaq