Filter dict to contain only certain keys?

Question

I've got a dict that has a whole bunch of entries. I'm only interested in a select few of them. Is there an easy way to prune all the other ones out?

It's helpful to say what type of keys (integers? strings? dates? arbitrary objects?) and thus whether there's a simple (string, regex, list membership, or numerical inequality) test to check which keys are in or out. Or else do we need to call an arbitrary function(s) to determine that. — smci, Nov 14 '19 at 23:09
@smci String keys. Don't think it even occurred to me that I could use anything else; I've been coding in JS and PHP for so long... — mpen, Nov 15 '19 at 00:48

score 954 · Accepted Answer · edited Apr 17 '23 at 08:09

954

Constructing a new dict:

dict_you_want = {key: old_dict[key] for key in your_keys}

Uses dictionary comprehension.

If you use a version which lacks them (ie Python 2.6 and earlier), make it dict((key, old_dict[key]) for ...). It's the same, though uglier.

Note that this, unlike jnnnnn's version, has stable performance (depends only on number of your_keys) for old_dicts of any size. Both in terms of speed and memory. Since this is a generator expression, it processes one item at a time, and it doesn't looks through all items of old_dict.

Removing everything in-place:

unwanted = set(old_dict) - set(your_keys)
for unwanted_key in unwanted: del your_dict[unwanted_key]

edited Apr 17 '23 at 08:09

Tomerikoo

18,379
16
47
61

answered Aug 06 '10 at 00:22

31

Throws a KeyError if one of the filer keys is not present in old_dict. I would suggest {k:d[k] for k in filter if k in d} – Peter Gibson Jun 28 '12 at 01:53
6

@PeterGibson Yes, if that's part of the requirements, you need to do *something* about it. Whether it's silently dropping the keys, adding a default value, or something else, depends on what you are doing; there are plenty of use cases where your approach is wrong. There are also many where a key missing in `old_dict` indicates a bug elsewhere, and in that case I very much prefer an error to silently wrong results. – Jun 29 '12 at 17:12
@delnan, also the "if k in d" addition slows you down if d is large, I just thought it was worth mentioning – Peter Gibson Jul 01 '12 at 02:02
7

@PeterGibson It doesn't, dictionary lookup is O(1). – Jul 01 '12 at 10:48
If you do want to not throw an error if the key is not present, this works: dict_youwant = dict((key, old_dict[key]) for key in [k for k in your_keys if k in old_dict]) – andrew pate May 24 '15 at 09:06
2

nit: Dictionaries are hash maps, so the normal case is O(1). Worst (highly unlikely) case is O(n), but depends on hash collision likelihood. You'd need an astronomically large dictionary, or a really crude hashing algorithm to start seeing that be an issue. https://stackoverflow.com/a/1963514/1335793 – Davos Oct 05 '17 at 12:57
I think that should be "unwanted = set(keys) & set(your_dict)", i.e. set intersection (&), not set difference (-). Assuming keys is your list of unwanted keys. – Terry Brown Dec 22 '17 at 17:14
@andrewpate Little bit neater: `dict_youwant = {k: old[k] for k in your_keys & old.keys()}` – Ezekiel Victor Mar 06 '23 at 14:41

score 249 · Answer 2 · answered Apr 09 '14 at 22:21

249

Slightly more elegant dict comprehension:

foodict = {k: v for k, v in mydict.items() if k.startswith('foo')}

answered Apr 09 '14 at 22:21

ransford

2,647
1
14
3

1

Upvoted. I was thinking about adding an answer similar to this. Just out of curiosity though, why do {k:v for k,v in dict.items() ...} rather than {k:dict[k] for k in dict ...} Is there a performance difference? – Hart Simha Jun 24 '14 at 17:30
9

Answered my own question. The {k:dict[k] for k in dict ...} is about 20-25% faster, at least in Python 2.7.6, with a dictionary of 26 items (timeit(..., setup="d = {chr(x+97):x+1 for x in range(26)}")), depending on how many items are being filtered out (filtering out consonant keys is faster than filtering out vowel keys because you're looking up fewer items). The difference in performance may very well become less significant as your dictionary size grows. – Hart Simha Jun 24 '14 at 18:13
8

Would probably be the same perf if you used `mydict.iteritems()` instead. `.items()` creates another list. – Pat Jun 16 '16 at 22:47

jnnnnn · Answer 3 · 2010-08-06T00:30:36.053

75

Here's an example in python 2.6:

>>> a = {1:1, 2:2, 3:3}
>>> dict((key,value) for key, value in a.iteritems() if key == 1)
{1: 1}

The filtering part is the if statement.

This method is slower than delnan's answer if you only want to select a few of very many keys.

edited Aug 06 '10 at 00:30

answered Aug 06 '10 at 00:13

jnnnnn

3,889
32
37

14

except I'd probably use `if key in ('x','y','z')` I guess. – mpen Aug 06 '10 at 00:17
1

if you already know which keys you want, use delnan's answer. If you need to test each key with an if statement, use ransford's answer. – jnnnnn Sep 19 '15 at 08:02
2

This solution has one more advantage. If the dictionary is returned from an expensive function call (i.e. a/old_dict is a function call) this solution calls the function only once. In an imperative environment storing the dictionary returned by the function in a variable is not a big deal but in a functional environment (e.g. in a lambda) this is key observation. – gae123 Jan 27 '16 at 01:11

Suor · Answer 4 · 2016-11-04T09:22:20.007

30

You can do that with project function from my funcy library:

from funcy import project
small_dict = project(big_dict, keys)

Also take a look at select_keys.

edited Nov 04 '16 at 09:22

answered Jun 04 '14 at 21:31

Suor

2,845
1
22
28

Jim · Answer 5 · 2018-07-06T21:28:23.087

28

This one liner lambda should work:

dictfilt = lambda x, y: dict([ (i,x[i]) for i in x if i in set(y) ])

Here's an example:

my_dict = {"a":1,"b":2,"c":3,"d":4}
wanted_keys = ("c","d")

# run it
In [10]: dictfilt(my_dict, wanted_keys)
Out[10]: {'c': 3, 'd': 4}

It's a basic list comprehension iterating over your dict keys (i in x) and outputs a list of tuple (key,value) pairs if the key lives in your desired key list (y). A dict() wraps the whole thing to output as a dict object.

edited Jul 06 '18 at 21:28

answered Nov 28 '13 at 02:08

Jim

568
1
7
14

Should use a `set` for `wanted_keys`, but otherwise looks good. – mpen Nov 28 '13 at 02:35
This gives me a blank dictionary if my original dictionary contains lists in place of values. Any workarounds? – FaCoffee Oct 27 '15 at 13:35
@Francesco, can you provide an example? If I run: `dictfilt({'x':['wefwef',52],'y':['iuefiuef','efefij'],'z':['oiejf','iejf']}, ('x','z'))`, it returns `{'x': ['wefwef', 52], 'z': ['oiejf', 'iejf']}` as intended. – Jim Oct 28 '15 at 14:11
I tried this with: `dict={'0':[1,3], '1':[0,2,4], '2':[1,4]}` and the result was `{}`, which I assumed to be a blank dict. – FaCoffee Oct 28 '15 at 14:15
1

One thing, "dict" is a reserved word so you shouldn't use it to name a dict. What were the keys you were trying to pull out? If I run: `foo = {'0':[1,3], '1':[0,2,4], '2':[1,4]}; dictfilt(foo,('0','2'))`, I get: `{'0': [1, 3], '2': [1, 4]}` which is the intended outcome – Jim Oct 28 '15 at 20:22

score 27 · Answer 6 · answered Jun 26 '17 at 01:14

27

Code 1:

dict = { key: key * 10 for key in range(0, 100) }
d1 = {}
for key, value in dict.items():
    if key % 2 == 0:
        d1[key] = value

Code 2:

dict = { key: key * 10 for key in range(0, 100) }
d2 = {key: value for key, value in dict.items() if key % 2 == 0}

Code 3:

dict = { key: key * 10 for key in range(0, 100) }
d3 = { key: dict[key] for key in dict.keys() if key % 2 == 0}

All pieced of code performance are measured with timeit using number=1000, and collected 1000 times for each piece of code.

For python 3.6 the performance of three ways of filter dict keys almost the same. For python 2.7 code 3 is slightly faster.

answered Jun 26 '17 at 01:14

Y.Y

531
4
9

1

just curious, did you make that plot from Python? – user5359531 Oct 19 '17 at 16:53
1

ggplot2 in R - part of [tidyverse](https://ggplot2.tidyverse.org/reference/geom_bar.html) – keithpjolley Aug 20 '18 at 05:08
2

better not use `dict` as it's builtin in python. – Underoos Oct 08 '21 at 15:36

score 18 · Answer 7 · answered Aug 06 '10 at 00:49

18

Given your original dictionary orig and the set of entries that you're interested in keys:

filtered = dict(zip(keys, [orig[k] for k in keys]))

which isn't as nice as delnan's answer, but should work in every Python version of interest. It is, however, fragile to each element of keys existing in your original dictionary.

answered Aug 06 '10 at 00:49

Kai

5,260
5
29
36

Well, this is basically an eager version of the "tuple generator version" of my dict comprehension. Very compatible indeed, though generator expressions were introduced in 2.4, spring 2005 - seriously, is anyone still using this? – Aug 06 '10 at 01:20
2

I don't disagree; 2.3 really shouldn't exist anymore. However, as an outdated survey of 2.3 usage: http://moinmo.in/PollAboutRequiringPython24 Short version: RHEL4, SLES9, shipped with OS X 10.4 – Kai Aug 06 '10 at 03:51

MyGGaN · Answer 8 · 2013-07-21T22:10:18.087

Based on the accepted answer by delnan.

What if one of your wanted keys aren't in the old_dict? The delnan solution will throw a KeyError exception that you can catch. If that's not what you need maybe you want to:

only include keys that excists both in the old_dict and your set of wanted_keys.

old_dict = {'name':"Foobar", 'baz':42}
wanted_keys = ['name', 'age']
new_dict = {k: old_dict[k] for k in set(wanted_keys) & set(old_dict.keys())}

>>> new_dict
{'name': 'Foobar'}

have a default value for keys that's not set in old_dict.

default = None
new_dict = {k: old_dict[k] if k in old_dict else default for k in wanted_keys}

>>> new_dict
{'age': None, 'name': 'Foobar'}

You could also do `{k: old_dict.get(k, default) for k in ...}` — Moberg, Apr 19 '18 at 14:45

Ryan Shea · Answer 9 · 2015-02-15T21:59:04.107

11

This function will do the trick:

def include_keys(dictionary, keys):
    """Filters a dict by only including certain keys."""
    key_set = set(keys) & set(dictionary.keys())
    return {key: dictionary[key] for key in key_set}

Just like delnan's version, this one uses dictionary comprehension and has stable performance for large dictionaries (dependent only on the number of keys you permit, and not the total number of keys in the dictionary).

And just like MyGGan's version, this one allows your list of keys to include keys that may not exist in the dictionary.

And as a bonus, here's the inverse, where you can create a dictionary by excluding certain keys in the original:

def exclude_keys(dictionary, keys):
    """Filters a dict by excluding certain keys."""
    key_set = set(dictionary.keys()) - set(keys)
    return {key: dictionary[key] for key in key_set}

Note that unlike delnan's version, the operation is not done in place, so the performance is related to the number of keys in the dictionary. However, the advantage of this is that the function will not modify the dictionary provided.

Edit: Added a separate function for excluding certain keys from a dict.

edited Feb 15 '15 at 21:59

answered Aug 03 '13 at 22:37

Ryan Shea

4,252
4
32
32

You should allow `keys` to by any kind of iterable, like what [set](http://docs.python.org/2/library/sets.html#sets.Set) accepts. – mpen Aug 03 '13 at 23:19
Ah, good call, thanks for pointing this out. I'll make that update. – Ryan Shea Aug 05 '13 at 02:51
I wonder if you are better off with two functions. If you asked 10 people "does `invert` imply that the `keys` argument is kept, or that the `keys` argument is rejected?", how many of them would agree? – skatenerd Feb 13 '15 at 01:46
Updated. Let me know what you think. – Ryan Shea Feb 15 '15 at 21:59
This appears not to be working if the input dict has lists in place of values. In this case you get a void dict. Any workarounds? – FaCoffee Oct 27 '15 at 13:38

score 10 · Answer 10 · answered May 15 '21 at 15:46

10

This seems to me the easiest way:

d1 = {'a':1, 'b':2, 'c':3}
d2 = {k:v for k,v in d1.items() if k in ['a','c']}

I like doing this to unpack the values too:

a, c = {k:v for k,v in d1.items() if k in ['a','c']}.values()

answered May 15 '21 at 15:46

Joey

141
1
3

This method is the easiest one to parse and understand for newbies like me. – linguist_at_large Jun 07 '21 at 07:01
For efficiency, I'd recommend filtering keys using a set: `if k in {'a','c'}` rather than `if k in ['a','c']`. – Valentin Waeselynck Oct 01 '21 at 22:22

score 8 · Answer 11 · answered Mar 20 '18 at 16:33

8

Another option:

content = dict(k1='foo', k2='nope', k3='bar')
selection = ['k1', 'k3']
filtered = filter(lambda i: i[0] in selection, content.items())

But you get a list (Python 2) or an iterator (Python 3) returned by filter(), not a dict.

answered Mar 20 '18 at 16:33

marsl

959
1
14
25

6

Wrap `filtered` in `dict` and you get back the dictionary! – CMCDragonkai Mar 13 '19 at 02:37

this.srivastava · Answer 12 · 2019-05-28T04:24:31.783

7

If we want to make a new dictionary with selected keys removed, we can make use of dictionary comprehension
For example:

d = {
'a' : 1,
'b' : 2,
'c' : 3
}
x = {key:d[key] for key in d.keys() - {'c', 'e'}} # Python 3
y = {key:d[key] for key in set(d.keys()) - {'c', 'e'}} # Python 2.*
# x is {'a': 1, 'b': 2}
# y is {'a': 1, 'b': 2}

edited May 28 '19 at 04:24

answered May 27 '19 at 15:52

this.srivastava

1,045
12
22

Neat. Only works in Python 3. Python 2 says "TypeError: unsupported operand type(s) for -: 'list' and 'set'" – mpen May 27 '19 at 19:45
Added set(d.keys()) for Python 2. This is working when I run. – this.srivastava May 28 '19 at 04:29

score 6 · Answer 13 · answered Jul 17 '21 at 16:45

6

We can also achieve this by slightly more elegant dict comprehension:

my_dict = {"a":1,"b":2,"c":3,"d":4}

filtdict = {k: v for k, v in my_dict.items() if k.startswith('a')}
print(filtdict)

answered Jul 17 '21 at 16:45

Sachin

1,460
17
24

Mr_and_Mrs_D · Answer 14 · 2021-11-16T08:52:34.713

According to the title of the question, one would expect to filter the dictionary in place - a couple of answers suggest methods for doing that - still it's not obvious what is the one obvious way - I added some timings:

import random
import timeit
import collections

repeat = 3
numbers = 10000

setup = ''
def timer(statement, msg='', _setup=None):
    print(msg, min(
        timeit.Timer(statement, setup=_setup or setup).repeat(
            repeat, numbers)))

timer('pass', 'Empty statement')

dsize = 1000
d = dict.fromkeys(range(dsize))
keep_keys = set(random.sample(range(dsize), 500))
drop_keys = set(random.sample(range(dsize), 500))

def _time_filter_dict():
    """filter a dict"""
    global setup
    setup = r"""from __main__ import dsize, collections, drop_keys, \
keep_keys, random"""
    timer('d = dict.fromkeys(range(dsize));'
          'collections.deque((d.pop(k) for k in drop_keys), maxlen=0)',
          "pop inplace - exhaust iterator")
    timer('d = dict.fromkeys(range(dsize));'
          'drop_keys = [k for k in d if k not in keep_keys];'
          'collections.deque('
              '(d.pop(k) for k in list(d) if k not in keep_keys), maxlen=0)',
          "pop inplace - exhaust iterator (drop_keys)")
    timer('d = dict.fromkeys(range(dsize));'
          'list(d.pop(k) for k in drop_keys)',
          "pop inplace - create list")
    timer('d = dict.fromkeys(range(dsize));'
          'drop_keys = [k for k in d if k not in keep_keys];'
          'list(d.pop(k) for k in drop_keys)',
          "pop inplace - create list (drop_keys)")
    timer('d = dict.fromkeys(range(dsize))\n'
          'for k in drop_keys: del d[k]', "del inplace")
    timer('d = dict.fromkeys(range(dsize));'
          'drop_keys = [k for k in d if k not in keep_keys]\n'
          'for k in drop_keys: del d[k]', "del inplace (drop_keys)")
    timer("""d = dict.fromkeys(range(dsize))
{k:v for k,v in d.items() if k in keep_keys}""", "copy dict comprehension")
    timer("""keep_keys=random.sample(range(dsize), 5)
d = dict.fromkeys(range(dsize))
{k:v for k,v in d.items() if k in keep_keys}""",
          "copy dict comprehension - small keep_keys")

if __name__ == '__main__':
    _time_filter_dict()

results:

Empty statement 8.375600000000427e-05
pop inplace - exhaust iterator 1.046749841
pop inplace - exhaust iterator (drop_keys) 1.830537424
pop inplace - create list 1.1531293939999987
pop inplace - create list (drop_keys) 1.4512304149999995
del inplace 0.8008298079999996
del inplace (drop_keys) 1.1573763689999979
copy dict comprehension 1.1982901489999982
copy dict comprehension - small keep_keys 1.4407784069999998

So seems del is the winner if we want to update in place - the dict comprehension solution depends on the size of the dict being created of course and deleting half the keys is already too slow - so avoid creating a new dict if you can filter in place.

Edited to address a comment by @mpen - I calculated drop keys from keep_keys (given we do not have drop keys) - I assumed keep_keys/drop_keys are sets for this iteration or would take ages. With these assumptions del is still faster - but to be sure the moral is: if you have a (set, list, tuple) of drop keys, go for del

`drop_keys` isn't a fair comparison. Question is more akin to `keep_keys`. We know which keys we want, not which ones we don't want. — mpen, Nov 16 '21 at 02:52
Thanks @mpen - indeed if we try to calculate `drop_keys` this slows down a lot pop/del methods. Will post some timings for that — Mr_and_Mrs_D, Nov 16 '21 at 08:07
There @mpen - seems del beats the dict comprehension even if I calculate `drop_keys` (I assumed keep keys are sets for O(1) `k in keep_keys`). Probably this means that creating a dict with 500 entries is a bit slower than creating a list with 500 elements :P — Mr_and_Mrs_D, Nov 16 '21 at 08:42
That's just for the sake of in place edit. In many cases, the original data is to be cached for different filters be apply on to produced different results, instead of recreating the original dict and chop it down repeatedly. — Ben, Dec 10 '22 at 02:44

score 3 · Answer 15 · answered Oct 10 '17 at 04:11

3

Short form:

[s.pop(k) for k in list(s.keys()) if k not in keep]

As most of the answers suggest in order to maintain the conciseness we have to create a duplicate object be it a list or dict. This one creates a throw-away list but deletes the keys in original dict.

answered Oct 10 '17 at 04:11

nehem

12,775
6
58
84

Can you profile what is faster? Note you don't need to create the list: https://stackoverflow.com/a/36763172/281545 – Mr_and_Mrs_D Nov 15 '21 at 09:08

score 3 · Answer 16 · edited Feb 24 '20 at 11:20

3

You could use python-benedict, it's a dict subclass.

Installation: pip install python-benedict

from benedict import benedict

dict_you_want = benedict(your_dict).subset(keys=['firstname', 'lastname', 'email'])

It's open-source on GitHub: https://github.com/fabiocaccamo/python-benedict

^{Disclaimer: I'm the author of this library.}

edited Feb 24 '20 at 11:20

Georgy

12,464
7
65
73

answered Feb 21 '20 at 17:27

Fabio Caccamo

1,871
19
21

score 3 · Answer 17 · answered Dec 08 '21 at 06:27

3

If you know the negation set (aka not keys) in advance:

v = {'a': 'foo', 'b': 'bar', 'command': 'fizz', 'host': 'buzz'  }
args = {k: v[k] for k in v if k not in ["a", "b"]}
args # {'command': 'fizz', 'host': 'buzz'}

answered Dec 08 '21 at 06:27

Steve Fan

3,019
3
19
29

zr0gravity7 · Answer 18 · 2023-02-05T22:44:44.617

The accepted answer throws a KeyError if one of the filter keys is not present in the given dict.

To get a copy of the given dict, containing only some keys from the allowed keys, an approach is to check that the key was indeed present on the given dict in the dict comprehension:

filtered_dict = { k: old_dict[k] for k in allowed_keys if k in old_dict }

This does not impact performance, as the lookup against the dictionary has constant runtime complexity.

Alternatively, you could use old_dict.get(k, some_default) to populate missing items.

score 2 · Answer 19 · answered Aug 28 '19 at 08:09

2

Here is another simple method using del in one liner:

for key in e_keys: del your_dict[key]

e_keys is the list of the keys to be excluded. It will update your dict rather than giving you a new one.

If you want a new output dict, then make a copy of the dict before deleting:

new_dict = your_dict.copy()           #Making copy of dict

for key in e_keys: del new_dict[key]

answered Aug 28 '19 at 08:09

Nouman

6,947
7
32
60

Can you profile what is faster in case you don't need to create a copy (so creating the dict vs `for key in e_keys: del your_dict[key]`)? – Mr_and_Mrs_D Nov 15 '21 at 09:12
@Mr_and_Mrs_D **Not** creating a copy would be definitely faster. About 15% faster – Nouman Nov 15 '21 at 09:35
1

Turns out dict comprehension is indeed slower but greatly depends on size of filtered keys - see: https://stackoverflow.com/a/69973383/281545 – Mr_and_Mrs_D Nov 15 '21 at 11:06

score 2 · Answer 20 · answered Sep 18 '20 at 16:39

We can do simply with lambda function like this:

>>> dict_filter = lambda x, y: dict([ (i,x[i]) for i in x if i in set(y) ])
>>> large_dict = {"a":1,"b":2,"c":3,"d":4}
>>> new_dict_keys = ("c","d")
>>> small_dict=dict_filter(large_dict, new_dict_keys)
>>> print(small_dict)
{'c': 3, 'd': 4}
>>>

score 1 · Answer 21 · answered Jun 10 '21 at 21:09

This is my approach, supports nested fields like mongo query.

How to use:

>>> obj = { "a":1, "b":{"c":2,"d":3}}
>>> only(obj,["a","b.c"])
{'a': 1, 'b': {'c': 2}}

only function:

def only(object,keys):
    obj = {}
    for path in keys:
        paths = path.split(".")
        rec=''
        origin = object
        target = obj
        for key in paths:
            rec += key
            if key in target:
                target = target[key]
                origin = origin[key]
                rec += '.'
                continue
            if key in origin:
                if rec == path:
                    target[key] = origin[key]
                else:
                    target[key] = {}
                target = target[key]
                origin = origin[key]
                rec += '.'
            else:
                target[key] = None
                break
    return obj

Taras Mykhalchuk · Answer 22 · 2022-12-15T08:36:24.037

Just a simple one-line function with a filter to allow only for existing keys.

data = {'give': 'what', 'not': '___', 'me': 'I', 'no': '___', 'these': 'needed'}
keys = ['give', 'me', 'these', 'not_present']

n = { k: data[k] for k in filter(lambda k: k in data, keys) }

print(n)
print(list(n.keys()))
print(list(n.values()))

output:

{'give': 'what', 'me': 'I', 'these': 'needed'}
['give', 'me', 'these']
['what', 'I', 'needed']

score 0 · Answer 23 · answered Mar 16 '23 at 19:46

You can use python's built in filter function and rebuild a dict from the items – though it's not as neat or performant as some of the other methods here:

my_dict = {i: str(i) for i in range(10)}

# I only want specific keys
want_keys = [6, 7, 8]
new_dict = dict(filter(lambda x: x[0] in want_keys, my_dict.items()))


# Or use logic – I want greater than 6
new_dict_2 = dict(filter(lambda x: x[0] >6, my_dict.items()))

You can get unnecessarily fancy with partial functions and operators if you wish too:

from functools import partial
from operator import is_, is_not, gt, contains


condition = partial(contains, want_keys)
condition = partial(gt, 6)

# use one of the conditions
dict(filter(lambda x: condition(x[0]), my_dict.items()))

Filter dict to contain only certain keys?

23 Answers23

Linked

Related