1

For example, given list of str: ['a', 'b', 'a', 'a', 'b'], I want to get the counts of distinct string {'a' : 3, 'b' : 2}.

the naive method is like following:

lst = ['a', 'b', 'a', 'a', 'b']
counts = dict()

for w in lst:
    counts[w] = counts.get(w, 0) + 1

However, it needs twice Hash Table queries. In fact, when we firstly called the get method, we have already known the bucket location. In principle, we can modify the bucket value in-place without searching the bucket location twice. I know in Java we can use map.merge to get this optimization: https://stackoverflow.com/a/33711386/10969942

How to do it in Python?

maplemaple
  • 1,297
  • 6
  • 24

1 Answers1

1

This is no such method in Python. Whether visible or not, at least under the covers the table lookup will be done twice. But, as the answer you linked to said about Java, nobody much cares - hash table lookup is typically fast, and since you just looked up a key all the info to look it up again is likely sitting in L1 cache.

Two ways of spelling your task that are more idiomatic, but despite that the double-lookup isn't directly visible in either, it still occurs under covers:

>>> lst = ['a', 'b', 'a', 'a', 'b']
>>> from collections import defaultdict
>>> counts = defaultdict(int) # default value is int(); i.e., 0
>>> for w in lst:
...     counts[w] += 1
>>> counts
defaultdict(<class 'int'>, {'a': 3, 'b': 2})

and

>>> from collections import Counter
>>> Counter(lst)
Counter({'a': 3, 'b': 2})
Tim Peters
  • 67,464
  • 13
  • 126
  • 132