4

I have a list of dictionaries with various keys and values. I am trying to group the values by key:

from itertools import chain

data = [
    {'a': 2, 'b': 4, 'c': 3, 'd': 2},   
    {'b': 2, 'c': 2, 'd': 5, 'e': 4, 'f': 1},
    {'a': 2, 'd': 2, 'e': 6, 'f': 5, 'g': 12},
    {'b': 2, 'd': 2, 'e': 6, 'f': 6},
    {'c': 5, 'e': 33, 'g': 21, 'h': 56, 'i': 21}
    ]

print(type(data))

bar ={
    k: [d.get(k) for d in data]
    for k in chain.from_iterable(data)
}

print(bar)

My Output:

{'a': [2, None, 2, None, None], 'b': [4, 2, None, 2, None], 
'c': [3, 2, None, None, 5], 'd':[2, 5, 2, 2, None], 'e': [None, 4, 6, 6, 33], 
'f': [None, 1, 5, 6, None], 'g': [None, None, 12, None, 21], 
'h': [None, None, None, None, 56], 'i': [None, None, None, None, 21]}

I don't want to display "None" in the values

Desired Output:

 {'a': [2, 2], 'b': [4, 2, 2], 'c': [3, 2, 5], 'd':[2, 5, 2, 2], 'e': [4, 6, 6, 33], 
'f': [1, 5, 6], 'g': [12, 21], 'h': [56], 'i': [21]}

I tried to use the filter function too, but it didn't work out. Any guidance on how to remove None?

Code

John
  • 1
    You can change your list comprehension (in your dict) to be `[d.get(k) for d in data if d.get(k) is not None]` – tomjn Sep 16 '19 at 08:29
  • 1
    Or `k: [d[k] for d in data if k in d]` – Chris Sep 16 '19 at 08:30
  • @Chris: True, providing the value of item `k` isn't `False` or another falsy value! – tomjn Sep 16 '19 at 08:32
  • @tomjn or `[val for d in data if (val := d.get(k)) is not None]` in Python 3.8 so `.get` is not called twice ;) – DeepSpace Sep 16 '19 at 08:33
  • 1
    @DeepSpace it crossed my mind. Thought it best not to give 3.8 answers until it is out of beta ;) – tomjn Sep 16 '19 at 08:34
  • Possible duplicate of [remove None value from a list without removing the 0 value](https://stackoverflow.com/questions/16096754/remove-none-value-from-a-list-without-removing-the-0-value) – Phoenix Sep 16 '19 at 09:23

6 Answers

3

Try this:

from operator import is_not
from functools import partial

{ k: list(filter(partial(is_not, None), v)) for k, v in d.items() }

Input: {'x': [0, 23, 234, 89, None, 0, 35, 9] }

Output: {'x': [0, 23, 234, 89, 0, 35, 9]}
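This keeps zeros because `partial(is_not, None)` builds a predicate that evaluates `None is not x` for each element, so only elements that are literally `None` are dropped. A minimal sketch, assuming a padded dictionary like the one in the question (the name `padded` and the abbreviated data are made up for illustration):

```python
from functools import partial
from operator import is_not

# Hypothetical, abbreviated version of the padded output from the question.
padded = {
    'a': [2, None, 2, None, None],
    'x': [0, 23, None, 0],
}

# is_not(None, v) evaluates "None is not v": true for everything except None.
not_none = partial(is_not, None)

cleaned = {k: list(filter(not_none, v)) for k, v in padded.items()}
print(cleaned)  # {'a': [2, 2], 'x': [0, 23, 0]}
```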

Community
Phoenix
2

Instead of using get, which returns None if the key is not present, just use d[k] but check whether k in d first. Also, I'd suggest not using chain as that will calculate many of the lists twice or more, each time overwriting the previously created list, as many keys are present in multiple dictionaries. Instead, you can iterate a set of all the keys.

>>> {k: [d[k] for d in data if k in d]
...  for k in set(k for d in data for k in d)}
...
{'a': [2, 2], 'b': [4, 2, 2],
 'c': [3, 2, 5], 'd': [2, 5, 2, 2],
 'e': [4, 6, 6, 33], 'f': [1, 5, 6],
 'g': [12, 21], 'h': [56], 'i': [21]}
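
The repeated work described above can be made visible by counting how often each key is produced. This is a rough sketch using the question's `data`; `collections.Counter` and the names `key_visits`, `unique_keys` and `grouped` are only used here for illustration and are not part of the answer itself.

```python
from collections import Counter
from itertools import chain

data = [
    {'a': 2, 'b': 4, 'c': 3, 'd': 2},
    {'b': 2, 'c': 2, 'd': 5, 'e': 4, 'f': 1},
    {'a': 2, 'd': 2, 'e': 6, 'f': 5, 'g': 12},
    {'b': 2, 'd': 2, 'e': 6, 'f': 6},
    {'c': 5, 'e': 33, 'g': 21, 'h': 56, 'i': 21},
]

# chain.from_iterable(data) yields every key of every dict, duplicates included,
# so a dict comprehension over it rebuilds the list for a repeated key each time.
key_visits = Counter(chain.from_iterable(data))
print(key_visits['d'])           # 4: 'd' occurs in four of the five dicts
print(sum(key_visits.values()))  # 23 keys visited in total

# Iterating over a set visits each of the 9 distinct keys exactly once.
unique_keys = set(chain.from_iterable(data))
grouped = {k: [d[k] for d in data if k in d] for k in unique_keys}
print(len(grouped))              # 9
```

A set does not guarantee any particular order, so the keys of `grouped` may not come out alphabetically. If first-seen order matters, `dict.fromkeys(chain.from_iterable(data))` can stand in for the set.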
tobias_k
1

You can use filter(None, x) to remove the Nones (this also removes any other falsy values, such as 0 or ''):

filter(None, [3, 4, None, 2, 7, None, 1])
[3, 4, 2, 7, 1]

To have that for all values of a dict, use a comprehension:

{ k: filter(None, v) for k, v in d.items() }

(Use .iteritems() in Python 2.)

Keep in mind that in Python 3 the filter function produces lazy filter-objects which can be iterated cheaply. To convert them to lists, just use list(filter(...)).
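
A small sketch with made-up values comparing `filter(None, ...)` with an explicit `is not None` predicate; the difference only shows up when the data can contain `0` or empty strings. The names `values`, `truthy_only` and `without_none` are illustrative.

```python
values = [3, 0, None, 2, '', None, 1]  # made-up sample with falsy non-None items

# With None as the function, filter keeps only truthy elements,
# so 0 and '' disappear along with None.
truthy_only = list(filter(None, values))
print(truthy_only)   # [3, 2, 1]

# An explicit predicate removes None and nothing else.
without_none = list(filter(lambda v: v is not None, values))
print(without_none)  # [3, 0, 2, '', 1]
```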

But it might be better to not introduce the None values in the first place:

r = {}
for d in data:
  for k, v in d.items():
    r.setdefault(k, []).append(v)
print(r)
Alfe
  • 2
    Probably better to do `{ k: list(filter(None, v)) for k, v in d.items() }` so the values will be lists and not `filter` objects – DeepSpace Sep 16 '19 at 08:31
  • @DeepSpace That might be useful in many cases but in general I prefer to keep the lazy iterables as long as possible as they are way cheaper in case not all elements are really needed or all are needed but not at the same time (saving memory then). And to convert them to `list`s is still always possible, as you demonstrated. – Alfe Sep 16 '19 at 08:33
  • since the values are lists in the original question I'd prefer the values to be lists in the output of every answer, otherwise OP may get other bugs down the road if they are not aware – DeepSpace Sep 16 '19 at 08:35
  • @DeepSpace I prefer to make them aware and leave them the choice and responsibility. – Alfe Sep 16 '19 at 08:36
  • but you don't. The answer says nothing about the values not being lists anymore – DeepSpace Sep 16 '19 at 08:37
  • @DeepSpace Easy :) I just did. – Alfe Sep 16 '19 at 08:38
  • Personally, I find `filter` and `map` a bit too implicit; a new user may not find them easy to understand. Hence I always go for `explicit` comprehensions, like the one @Chris showed `{k: [d[k] for d in data if k in d]}` :) – han solo Sep 16 '19 at 08:46
  • @hansolo I also prefer readability over performance. Still, readability is a question of what idioms you are used to and who you write code for (professionals or not, for instance). OP didn't look like a newbie to me and they also asked for using `filter` explicitly. So I figured that might make sense to have it in the answer ;-) And the double lookup in your proposal then might just not be necessary. – Alfe Sep 16 '19 at 08:49
  • Sure. Yeah, it's just a personal preference :) – han solo Sep 16 '19 at 08:54
1

If you want to use your code you can just do:

bar ={
    k: [d.get(k) for d in data if d.get(k) is not None]
    for k in chain.from_iterable(data)
}

print(bar)

output:

{'a': [2, 2], 'b': [4, 2, 2], 'c': [3, 2, 5], 'd': [2, 5, 2, 2], 'e': [4, 6, 6, 33], 'f': [1, 5, 6], 'g': [12, 21], 'h': [56], 'i': [21]}
ncica
1

The get method of a dictionary returns None when the key does not exist. You can simply use an if condition to ensure the value exists.

bar = {k: [d[k] for d in data if d.get(k) is not None] for k in chain.from_iterable(data)}

If your dictionaries are very large and the result would contain lots of Nones, the double lookup will be costly, so you can use filter instead.

bar = {k: list(filter(None, [d.get(k) for d in data])) for k in chain.from_iterable(data)}
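
If the double lookup is a concern, a single-lookup variant could look like the sketch below. It assumes Python 3.8 or later for the assignment expression (`:=`), and it uses a private sentinel object (`_MISSING`, a name invented here) so that stored values such as `0` or `None` are kept. The sample data is abbreviated and made up.

```python
from itertools import chain

data = [
    {'a': 2, 'b': 4},
    {'b': 0, 'c': 2},  # abbreviated sample; note the falsy value 0
]

_MISSING = object()  # hypothetical sentinel meaning "key not present"

bar = {
    # One d.get call per dict: v is bound and tested in the same condition.
    k: [v for d in data if (v := d.get(k, _MISSING)) is not _MISSING]
    # dict.fromkeys visits each key once, in first-seen order.
    for k in dict.fromkeys(chain.from_iterable(data))
}
print(bar)  # {'a': [2], 'b': [4, 0], 'c': [2]}
```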
hckrman
  • That works but has a double look-up for each element. – Alfe Sep 16 '19 at 08:35
  • OP's other option is to re-iterate over the entire dictionary and filter out None values from each list. This would be better. – hckrman Sep 16 '19 at 08:37
  • 1
    Depends on how costly the lookup is which depends on the size of the `dict`. Filtering out `None`s can be way cheaper. – Alfe Sep 16 '19 at 08:41
1

Most of the offered solutions concentrate on keeping the OP's approach with a complex comprehension. I think in this case it's warranted to split the loops across separate lines instead of using a comprehension.

data = [...]

bar = {}
for my_dict in data:
   for key, value in my_dict.items():
      bar.setdefault(key, []).append(value)

print(bar)
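
A closely related variant, sketched here as an alternative rather than as part of the answer above, uses `collections.defaultdict` so that the empty list is created automatically the first time a key is seen. Only the first two dictionaries from the question are used to keep it short.

```python
from collections import defaultdict

data = [
    {'a': 2, 'b': 4, 'c': 3, 'd': 2},
    {'b': 2, 'c': 2, 'd': 5, 'e': 4, 'f': 1},
]  # first two dictionaries from the question

bar = defaultdict(list)  # missing keys start out as an empty list
for my_dict in data:
    for key, value in my_dict.items():
        bar[key].append(value)

print(dict(bar))
# {'a': [2], 'b': [4, 2], 'c': [3, 2], 'd': [2, 5], 'e': [4], 'f': [1]}
```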
buran
  • Yeah, that's also my preferred solution. OP asked explicitly for using `filter`, though. Looks like an x/y problem. – Alfe Sep 16 '19 at 08:54