Counting the number of distinct keys in a dictionary in Python

Question

I have a dictionary mapping keywords to the number of times each keyword is repeated, but I only want a list of distinct words, so I wanted to count the number of keywords. Is there a way to count the number of keywords, or is there another way I should look for distinct words?

The keys in a Python dictionary are already distinct from each other. You can't have the exact same key twice in a Python dictionary. Therefore, counting the number of keys is the same as counting the number of distinct keys. – Flimm, Apr 28 '21 at 08:57
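A quick way to see this: assigning to a key that already exists replaces its value instead of adding a second entry, so `len()` on the dictionary already counts distinct keys. A minimal sketch (the `counts` name is just for illustration):

```python
# Assigning to a key that already exists overwrites its value;
# it does not create a duplicate entry.
counts = {}
counts["spam"] = 1
counts["eggs"] = 1
counts["spam"] = 5  # same key again: the value is replaced

print(counts)       # {'spam': 5, 'eggs': 1}
print(len(counts))  # 2
```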

YOU · Accepted Answer · 2010-02-07T05:11:56.353

547

len(yourdict.keys())

or just

len(yourdict)

If you'd like to count the unique words in a file, you could use a set (the with block makes sure the file gets closed):

with open(yourdictfile) as f:
    unique_count = len(set(f.read().split()))
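If you need both the distinct words and how often each one occurs, `collections.Counter` gives you both at once, and its `len()` equals the set-based count. A minimal sketch that works on a string rather than a file; the function names are hypothetical, and "word" here simply means a whitespace-separated token:

```python
from collections import Counter

def unique_word_count(text: str) -> int:
    # str.split() with no argument splits on any run of whitespace
    return len(set(text.split()))

def word_counts(text: str) -> Counter:
    # Counter maps each distinct word to how many times it appears
    return Counter(text.split())

sample = "the cat saw the other cat"
print(unique_word_count(sample))  # 4
counts = word_counts(sample)
print(len(counts))                # 4, same as the set-based count
print(counts["cat"])              # 2
```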

edited Feb 07 '10 at 05:11

answered Feb 06 '10 at 07:41

YOU


I know this post is old, but I was curious. Is this the fastest method? Or: is it *a* reasonably fast method for large dictionaries? – john_science Mar 01 '13 at 03:40

Both `len(yourdict.keys())` and `len(yourdict)` are O(1). The latter is slightly faster. See my tests below. – Chih-Hsuan Yen Apr 17 '16 at 10:07

I'd like to note that you can also go for the values (I know the question didn't ask it) with `len(yourdict.values())` – ntk4 Sep 23 '16 at 05:49
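Note that `len(yourdict.values())` always equals `len(yourdict)`, because every key has exactly one value. Unlike keys, though, values can repeat, so counting distinct values means putting them in a set first. A small sketch, assuming the values are hashable:

```python
d = {"foo": 1, "bar": 2, "baz": 1}

# One value per key, so this always equals len(d)
print(len(d.values()))       # 3

# Values can repeat; a set keeps only the distinct ones
print(len(set(d.values())))  # 2
```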

score 37 · Answer 2 · answered Feb 06 '10 at 07:40


The number of distinct words (i.e. count of entries in the dictionary) can be found using the len() function.

>>> a = {'foo': 42, 'bar': 69}
>>> len(a)
2

To get all the distinct words (i.e. the keys), use the .keys() method.

>>> list(a.keys())
['foo', 'bar']

answered Feb 06 '10 at 07:40

kennytm


score 12 · Answer 3 · answered Mar 07 '19 at 14:25

Calling len() directly on your dictionary works, and is faster than creating a keys view with d.keys() and calling len() on that, but the speed of either will be negligible compared with whatever else your program is doing.

d = {x: x**2 for x in range(1000)}

len(d)
# 1000

len(d.keys())
# 1000

%timeit len(d)
# 41.9 ns ± 0.244 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit len(d.keys())
# 83.3 ns ± 0.41 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
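`d.keys()` returns a view object rather than a copy or a list, which is why `len(d.keys())` is still cheap: the view just reports the dictionary's current size. A short sketch showing that the view stays in sync with the dictionary it came from:

```python
d = {x: x**2 for x in range(1000)}
keys = d.keys()  # a dict_keys view, not a copy

print(type(keys).__name__)  # dict_keys
print(len(keys))            # 1000

d[1000] = 1000**2           # add a key to the dictionary
print(len(keys))            # 1001: the view reflects the change
```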

score 3 · Answer 4 · edited Jul 10 '23 at 13:21


If the question is about counting how many times each keyword occurs, then I would recommend something like:

def countoccurrences(store, value):
    # Increment the count for value, starting at 1 the first time it is seen
    try:
        store[value] = store[value] + 1
    except KeyError:
        store[value] = 1

In the main block, loop through the data and pass each value to the countoccurrences function:

if __name__ == "__main__":
    store = {}
    list = ('a', 'a', 'b', 'c', 'c')  # note: this name shadows the built-in list
    for data in list:
        countoccurrences(store, data)
    for k, v in store.items():
        print(f"Key {k} has occurred {v} times")

On Python 3.7+, where dictionaries keep insertion order, the code outputs

Key a has occurred 2 times
Key b has occurred 1 times
Key c has occurred 2 times
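The same counting can be written without try/except by using `dict.get` with a default of 0. A minimal sketch; the `count_occurrences` name follows PEP 8 and is otherwise arbitrary:

```python
def count_occurrences(store, value):
    # dict.get returns 0 for a key that is not present yet
    store[value] = store.get(value, 0) + 1

store = {}
for item in ("a", "a", "b", "c", "c"):
    count_occurrences(store, item)

for k, v in store.items():
    print(f"Key {k} has occurred {v} times")

print(len(store))  # 3 distinct keys
```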

edited Jul 10 '23 at 13:21

Peter Mortensen


answered Jan 25 '18 at 22:00

David



[PEP 8 naming conventions](https://www.python.org/dev/peps/pep-0008/#function-and-variable-names) dictate that `countoccurrences()` should instead be `count_occurrences()`. Also, if you import [`collections.Counter`](https://docs.python.org/3/library/collections.html#collections.Counter), there's a much better way to do it: `from collections import Counter`, then `store = Counter()` and `for data in list: store[data] += 1`. – Graham Aug 02 '18 at 20:59
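The `Counter` approach from the comment, written out as a runnable sketch. `Counter` can also consume the whole iterable in one step, and reading a missing key returns 0 without adding it, so `len()` still counts only the keys that were actually seen:

```python
from collections import Counter

data_items = ("a", "a", "b", "c", "c")

# Incremental version: missing keys start at 0
store = Counter()
for data in data_items:
    store[data] += 1

# Equivalent one-step version
assert store == Counter(data_items)

print(store["c"])  # 2
print(store["z"])  # 0, and "z" is not inserted
print(len(store))  # 3 distinct keys
```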

score 0 · Answer 5 · edited Jul 10 '23 at 13:20

I made some modifications to the answer posted by UnderWaterKremlin to make it work in Python 3. The result, which surprised me, is below.

System specifications:

Python = 3.7.4,
Conda = 4.8.0
3.6 GHz, 8 cores, 16 GB.

import timeit

d = {x: x**2 for x in range(1000)}
print(len(d))
# 1000

print(len(d.keys()))
# 1000

print(timeit.timeit('len({x: x**2 for x in range(1000)})', number=100000))         # 1

print(timeit.timeit('len({x: x**2 for x in range(1000)}.keys())', number=100000))  # 2

Result:

37.0100378          # 1
37.002148899999995  # 2

So in this run, len(d.keys()) came out marginally faster than plain len(), though the gap is only about 0.008 s out of 37 s.
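The timings above include building a 1000-item dictionary on every iteration, and that construction dominates both statements. A sketch that builds the dictionary once in `setup` and times only the `len()` calls; the numbers depend on the machine and Python version, so none are shown here:

```python
import timeit

setup = "d = {x: x**2 for x in range(1000)}"

# The dictionary is built once per timing run in `setup`,
# so only the len() call itself is measured.
t_len = timeit.timeit("len(d)", setup=setup, number=1_000_000)
t_keys = timeit.timeit("len(d.keys())", setup=setup, number=1_000_000)

print(f"len(d):        {t_len:.4f} s")
print(f"len(d.keys()): {t_keys:.4f} s")
```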

score -3 · Answer 6 · edited Feb 26 '21 at 11:07


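To check whether a keyword is in a dictionary (this tests membership; it does not count the keys):

def dict_finder(dict_finders):
    # Ask for a keyword and report whether it is one of the dictionary's keys
    x = input("Enter the thing you want to find: ")
    if x in dict_finders:
        print("Element found")
    else:
        print("Nothing found")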
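The snippet above reads from `input()`, which makes it hard to reuse or test. A minimal sketch of the same membership check as a plain function (the name `has_keyword` is hypothetical), shown next to `len()` for the count the question actually asked about:

```python
def has_keyword(d, word):
    # `in` on a dict checks its keys, with average O(1) cost
    return word in d

keywords = {"foo": 42, "bar": 69}
print(has_keyword(keywords, "foo"))  # True
print(has_keyword(keywords, "baz"))  # False
print(len(keywords))                 # 2 distinct keywords
```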

edited Feb 26 '21 at 11:07

msoler


answered Feb 26 '21 at 05:50

Pranav Rajesh

