I have a dictionary mapping keywords to the number of times each keyword is repeated, but I only want a list of distinct words, so I want to count the number of keywords. Is there a way to count the number of keywords, or is there another way I should look for distinct words?
The keys in a Python dictionary are already distinct from each other. You can't have the exact same keyword as a key twice in a Python dictionary. Therefore, counting the number of keys is the same as counting the number of distinct keys. – Flimm Apr 28 '21 at 08:57
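A quick illustration of that point: repeating a key in a dict literal, or assigning to a key that already exists, overwrites the earlier value instead of adding a second entry, so `len()` never counts a key twice.

```python
# A repeated key in a literal does not create a second entry; the last value wins.
d = {'foo': 1, 'foo': 2, 'bar': 3}
print(len(d))    # 2
print(d['foo'])  # 2

# Assigning to an existing key updates it in place rather than adding an entry.
d['bar'] = 4
print(len(d))    # 2
```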
6 Answers
len(yourdict.keys())
or just
len(yourdict)
If you'd like to count the unique words in a file, you could use a set instead:
len(set(open(yourdictfile).read().split()))
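The one-liner above relies on garbage collection to close the file. A minimal sketch that closes it explicitly with a `with` block, assuming a UTF-8 text file whose words are separated by whitespace (the helper names and `words.txt` are hypothetical). Note that the count is case-sensitive and keeps punctuation attached, so "The" and "the," count as different words.

```python
# Hypothetical helper names, for illustration only.
def count_distinct_words(text):
    """Number of distinct whitespace-separated words (case-sensitive)."""
    # str.split() with no argument splits on any run of whitespace.
    return len(set(text.split()))

def count_distinct_words_in_file(path):
    """Same count for a UTF-8 text file; the with-block closes the file."""
    with open(path, encoding="utf-8") as f:
        return count_distinct_words(f.read())

print(count_distinct_words("the cat and the hat\nthe end"))  # 5
# Usage with a file (hypothetical path):
# print(count_distinct_words_in_file("words.txt"))
```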

I know this post is old, but I was curious. Is this the fastest method? Or: is it *a* reasonably fast method for large dictionaries? – john_science Mar 01 '13 at 03:40
Both `len(yourdict.keys())` and `len(yourdict)` are O(1). The latter is slightly faster. See my tests below. – Chih-Hsuan Yen Apr 17 '16 at 10:07
I'd like to note that you can also count the values (I know the question didn't ask for it) with `len(yourdict.values())` – ntk4 Sep 23 '16 at 05:49
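Because every key has exactly one value, `len(yourdict.values())` always equals `len(yourdict)`. Values, unlike keys, can repeat, so if the number of *distinct* values is what you want, a set does it, assuming the values are hashable. The `counts` dictionary below is made-up sample data:

```python
counts = {'foo': 2, 'bar': 2, 'baz': 5}

print(len(counts))                # 3 keys
print(len(counts.values()))       # 3: one value per key, duplicates included
print(len(set(counts.values())))  # 2: the distinct values are 2 and 5
```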
The number of distinct words (i.e. the count of entries in the dictionary) can be found using the `len()` function.
> a = {'foo':42, 'bar':69}
> len(a)
2
To get all the distinct words (i.e. the keys), use the `.keys()` method.
> list(a.keys())
['foo', 'bar']
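Since iterating over a dictionary yields its keys, `list(a)` gives the same list as `list(a.keys())`, in insertion order on Python 3.7+. If you would rather have the distinct words alphabetically, `sorted()` accepts the dictionary directly:

```python
a = {'foo': 42, 'bar': 69}

print(list(a))    # ['foo', 'bar']  (insertion order)
print(sorted(a))  # ['bar', 'foo']  (alphabetical)
```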

Calling `len()` directly on your dictionary works, and is faster than building a keys view, `d.keys()`, and calling `len()` on it, but the speed of either will be negligible in comparison to whatever else your program is doing.
d = {x: x**2 for x in range(1000)}
len(d)
# 1000
len(d.keys())
# 1000
%timeit len(d)
# 41.9 ns ± 0.244 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit len(d.keys())
# 83.3 ns ± 0.41 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

If the question is about counting the number of keywords then I would recommend something like:
def countoccurrences(store, value):
    try:
        store[value] = store[value] + 1
    except KeyError:
        store[value] = 1
In the main block, loop through the data and pass each value to the countoccurrences function (updated for Python 3):
if __name__ == "__main__":
    store = {}
    list = ('a', 'a', 'b', 'c', 'c')
    for data in list:
        countoccurrences(store, data)
    for k, v in store.items():
        print("Key " + k + " has occurred " + str(v) + " times")
In Python 3.7+, where dictionaries preserve insertion order, the code outputs
Key a has occurred 2 times
Key b has occurred 1 times
Key c has occurred 2 times

[PEP 8 naming conventions](https://www.python.org/dev/peps/pep-0008/#function-and-variable-names) dictate that `countoccurrences()` should instead be `count_occurrences()`. Also, if you import [`collections.Counter`](https://docs.python.org/3/library/collections.html#collections.Counter), there's a much better way to do it: `from collections import Counter; store = Counter(); for data in list: store[data] += 1`. – Graham Aug 02 '18 at 20:59
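Following the comment's `Counter` suggestion, here is a Python 3 sketch of the same tally. `Counter` can count an iterable directly, and `len()` on the result gives the number of distinct keywords. The input tuple is the sample data from the answer above (renamed `data` here); a plain-dict variant using `dict.get` is included for comparison.

```python
from collections import Counter

data = ('a', 'a', 'b', 'c', 'c')

# Counter tallies an iterable in one pass: keyword -> number of occurrences.
store = Counter(data)

for key, count in store.items():
    print(f"Key {key} has occurred {count} times")

# len() on the Counter is the number of distinct keywords.
print(len(store))  # 3

# Plain-dict equivalent without an import: dict.get supplies 0 for unseen keys.
plain = {}
for item in data:
    plain[item] = plain.get(item, 0) + 1
```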
I made some modifications to UnderWaterKremlin's answer to make it Python 3 compatible. The result, shown below, was surprising.
System specifications:
- Python = 3.7.4
- Conda = 4.8.0
- 3.6 GHz, 8 cores, 16 GB
import timeit
d = {x: x**2 for x in range(1000)}
print(len(d))
# 1000
print(len(d.keys()))
# 1000
print(timeit.timeit('len({x: x**2 for x in range(1000)})', number=100000)) # 1
print(timeit.timeit('len({x: x**2 for x in range(1000)}.keys())', number=100000)) # 2
Result (seconds):
1: 37.0100378
2: 37.002148899999995
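So in this run `len(d.keys())` came out marginally faster than `len(d)`. Note, though, that both statements rebuild the 1000-entry dictionary on every iteration, so the timings are dominated by dictionary construction, and a difference of about 0.008 s over 100,000 runs is within measurement noise.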
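To time only the `len` calls rather than dictionary construction, the dictionary can be built once through `timeit`'s `setup` argument. This is a sketch; absolute numbers will vary by machine and Python version, so no expected output is given.

```python
import timeit

# setup runs once per timeit() call, so the dict is built only once.
setup = "d = {x: x**2 for x in range(1000)}"

t_len = timeit.timeit("len(d)", setup=setup, number=1_000_000)
t_keys = timeit.timeit("len(d.keys())", setup=setup, number=1_000_000)

print(f"len(d):        {t_len:.4f} s per 1,000,000 calls")
print(f"len(d.keys()): {t_keys:.4f} s per 1,000,000 calls")
```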

To check whether a particular keyword is in the dictionary:
def dict_finder(dict_finders):
    x = input("Enter the thing you want to find: ")
    if x in dict_finders:
        print("Element found")
    else:
        print("Nothing found")
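The function above tests membership with `in`, which is an average O(1) hash lookup on a dictionary. A non-interactive sketch that separates the lookup from `input()` is easier to reuse and test; `has_keyword` is a hypothetical name, and `len()` still gives the number of distinct keywords:

```python
# Hypothetical helper name, for illustration only.
def has_keyword(d, word):
    """Return True if word is a key of d (an average O(1) hash lookup)."""
    return word in d

keywords = {'foo': 42, 'bar': 69}
print(has_keyword(keywords, 'foo'))  # True
print(has_keyword(keywords, 'baz'))  # False
print(len(keywords))                 # 2 distinct keywords
```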
