I often find myself writing this pattern to accumulate counts in a dictionary. In this case I'm gathering host hardware information for a datacenter application, and the pattern is the same one I have used many times.

Today I was wondering whether there is a better (more Pythonic) way to write this pattern.

I did take a look around S.O. The most common result uses `append` on a list to add items that do not exist yet, but that does not address how to build an accumulator in a dictionary where the incoming key may or may not already be present.

hardware_types = {}
for host in hosts:
    hardware_type = hosts[host]['hardware_type']
    if hardware_type in hardware_types:
        hardware_types[hardware_type] += 1
    else:
        hardware_types[hardware_type] = 1

Thanks, Bob

Bob Smith
  • `collections.Counter(h['hardware_type'] for h in hosts.values())` – deceze Nov 26 '19 at 05:46
  • @deceze: Or pushing all the work to C: `collections.Counter(map(operator.itemgetter('hardware_type'), hosts.values()))` (I'd normally import `itemgetter` as an unqualified name to reduce verbosity). Removes per-item bytecode interpreter overhead (at the expense of slightly higher fixed overhead, making it faster for long inputs, slower for very short ones). – ShadowRanger Nov 26 '19 at 21:11
  • This basically works to let me at least drop the if-else block. I can't quite reduce to one line as the hosts array is 2-dimensional, but I can get from 8 lines to 3, so that's a win. The linked post does not quite answer the question because I need my results in a dict. The counter does a fine job for the output of `Counter({'ORION': 13, None: 1})` for the hardware types. – Bob Smith Nov 26 '19 at 21:51
  • You can convert a `Counter` to a dict just with `dict(c)`… – deceze Nov 27 '19 at 05:47
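
The `Counter` suggestions from the comments above can be put together in a minimal sketch. The `hosts` mapping here is made-up sample data, assumed to have the same shape as in the question (host name mapped to a dict with a `hardware_type` key); the host names are hypothetical.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical sample data shaped like the question's `hosts` mapping.
hosts = {
    "host-a": {"hardware_type": "ORION"},
    "host-b": {"hardware_type": "ORION"},
    "host-c": {"hardware_type": None},
}

# Generator-expression form from the first comment.
hardware_types = Counter(h["hardware_type"] for h in hosts.values())

# map/itemgetter form from the second comment; it yields the same counts.
hardware_types_via_map = Counter(map(itemgetter("hardware_type"), hosts.values()))

# Counter is a dict subclass; dict() produces a plain dict if one is required.
as_plain_dict = dict(hardware_types)
print(as_plain_dict)  # {'ORION': 2, None: 1}
```

Since `Counter` already subclasses `dict`, the `dict()` conversion is only needed when some downstream code insists on the exact `dict` type.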

1 Answer

I often use `defaultdict` for this kind of scenario. When you create it, you pass a factory callable; the first time you access a missing key, the factory is called to produce that key's default value. In this case you want 0, and `int` works as the factory because `int()` returns 0:

from collections import defaultdict

hardware_types = defaultdict(int)
for host in hosts:
    hardware_type = hosts[host]['hardware_type']
    hardware_types[hardware_type] += 1

It can be nested, can produce dictionaries as defaults via lambdas, and so on. It keeps code lighter by removing the key-existence check. Be careful with that convenience, though: if a key should already exist but is missing because of a bug, `defaultdict` silently creates it instead of raising a `KeyError`.
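
As a rough illustration of both the nesting and the silent-creation caveat, here is a small sketch. The `datacenter` field and the host and datacenter names are assumptions made up for the example; they are not part of the question's data.

```python
from collections import defaultdict

# Hypothetical data: the "datacenter" field is an assumption for illustration.
hosts = {
    "host-a": {"datacenter": "dc1", "hardware_type": "ORION"},
    "host-b": {"datacenter": "dc1", "hardware_type": "ORION"},
    "host-c": {"datacenter": "dc2", "hardware_type": "ORION"},
}

# Nested counters: the outer factory is a lambda returning defaultdict(int).
per_dc = defaultdict(lambda: defaultdict(int))
for info in hosts.values():
    per_dc[info["datacenter"]][info["hardware_type"]] += 1

# Once counting is done, copy into plain dicts so later typos raise KeyError.
frozen = {dc: dict(counts) for dc, counts in per_dc.items()}

# The caveat: merely reading a missing key on the defaultdict creates it.
_ = per_dc["dc3"]
print("dc3" in per_dc)   # True, an empty entry was stored silently
print("dc3" in frozen)   # False, and frozen["dc3"] would raise KeyError
```

Converting to plain dicts after the accumulation step is one possible way to keep the convenience while counting and still get exceptions afterwards.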

  • Note: `defaultdict` is a broader solution (for cases where the default isn't an `int`), but if you're just counting elements, especially if said elements can come from an iterable of things to count, you definitely want `collections.Counter`; for modern CPython, it's got a C-accelerated counting helper that outperforms manual loops with `defaultdict(int)`. `Counter` also avoids creating entries in the `dict` when you perform a lookup of a non-existent key (returning `0` but leaving contents unchanged), where `defaultdict(int)` will store each key. – ShadowRanger Nov 26 '19 at 21:11
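
The lookup difference described in that comment can be checked with a short sketch; `"VEGA"` is a made-up hardware type used only to trigger a missing-key lookup.

```python
from collections import Counter, defaultdict

counts_counter = Counter(["ORION", "ORION"])
counts_default = defaultdict(int, {"ORION": 2})

# Both return 0 for a key that was never counted ("VEGA" is hypothetical).
missing_from_counter = counts_counter["VEGA"]
missing_from_default = counts_default["VEGA"]

# Only defaultdict stored the missing key as a side effect of the lookup.
print(len(counts_counter))  # 1
print(len(counts_default))  # 2
```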