
I am using a dictionary to store a bunch of counters, where each counter counts the occurrences of a file type (.wav, .mp3, etc.).

filetypecounter = {}

When I come across a certain file type, I want to be able to increase its counter in a pythonic way. So I am thinking...

filetypecounter[filetype] +=1

However, if the filetype is not yet in the dictionary, I want to initialize its counter to 1. So my logic is: if the filetype's counter is there, add 1 to it; otherwise set it to 1.

if filetype not in filetypecounter:
    filetypecounter[filetype] = 1
else: 
    filetypecounter[filetype] +=1
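For reference, here is the if/else approach above as a complete loop. The list of file names is hypothetical, and taking the extension with os.path.splitext (which keeps the leading dot) is an assumption about how filetype is obtained.

```python
import os.path

# Hypothetical file names; in practice these might come from os.listdir()
filenames = ["intro.wav", "song.mp3", "podcast.mp3", "notes.txt", "remix.mp3"]

filetypecounter = {}
for name in filenames:
    # os.path.splitext("song.mp3") -> ("song", ".mp3")
    filetype = os.path.splitext(name)[1]
    if filetype not in filetypecounter:
        filetypecounter[filetype] = 1
    else:
        filetypecounter[filetype] += 1

print(filetypecounter)  # {'.wav': 1, '.mp3': 3, '.txt': 1}
```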

Is there a more pythonic way?

More Than Five

7 Answers

from collections import defaultdict

filetypecounter = defaultdict(int)
filetypecounter[filetype] += 1

or

from collections import Counter

filetypecounter = Counter()
filetypecounter.update([filetype])
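Counter also accepts an iterable directly, and because a missing key reads as 0, `filetypecounter[filetype] += 1` works on a Counter just as it does on a defaultdict(int). A short sketch, using a hypothetical list of extensions:

```python
from collections import Counter

# Hypothetical extensions, e.g. collected with os.path.splitext
filetypes = [".mp3", ".mp3", ".m4a", ".mp3", ".wav", ".m4a"]

# Build all the counts in one call
filetypecounter = Counter(filetypes)

# Incrementing one key at a time also works: missing keys read as 0
filetypecounter[".flac"] += 1

print(filetypecounter[".mp3"])         # 3
print(filetypecounter[".ogg"])         # 0 (looked up, but not stored)
print(filetypecounter.most_common(1))  # [('.mp3', 3)]
```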

For info, if you must use a dict, your solution (checking if the key is present) is a reasonable one. Perhaps a more 'pythonic' solution might be:

filetypecounter = {}
filetypecounter[filetype] = filetypecounter.get(filetype, 0) + 1

Really, though, this and the other suggestions are just variations on the same theme. I'd use the Counter.
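As a quick check that these really are variations on one theme, the plain-dict `get` version, defaultdict and Counter all produce the same counts (the input list is hypothetical):

```python
from collections import Counter, defaultdict

filetypes = [".wav", ".mp3", ".mp3", ".txt", ".mp3"]

# Plain dict with dict.get and a default of 0
with_get = {}
for ft in filetypes:
    with_get[ft] = with_get.get(ft, 0) + 1

# defaultdict: int() supplies 0 for missing keys
with_default = defaultdict(int)
for ft in filetypes:
    with_default[ft] += 1

# Counter: counts the whole iterable at once
with_counter = Counter(filetypes)

# All three compare equal as plain mappings
assert with_get == dict(with_default) == dict(with_counter)
print(with_get)  # {'.wav': 1, '.mp3': 3, '.txt': 1}
```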

Rob Cowie

It looks like what you want is collections.defaultdict, or collections.Counter for Python 2.7 and up.

BrenBarn

Using collections.Counter is well covered in the other answers, but it may not be the fastest choice.

One older way is this:

>>> d={}
>>> for ext in ('.mp3','.mp3','.m4a','.mp3','.wav','.m4a'):
...    d[ext]=d.setdefault(ext,0)+1
... 
>>> d
{'.mp3': 3, '.wav': 1, '.m4a': 2}

That is not the fastest either, but it is faster than collections.Counter (at least on Python 2.7, per the benchmark below).

There are existing benchmarks of these methods, and the fastest turn out to be defaultdict, try/except, or your original method.

I have reproduced (and expanded) the benchmark here:

import urllib2
import timeit

response = urllib2.urlopen('http://pastebin.com/raw.php?i=7p3uycAz')
hamlet = response.read().replace('\r\n','\n')
LETTERS = [w for w in hamlet]
WORDS = hamlet.split(' ')
fmt='{:>20}: {:7.4} seconds for {} loops'
n=100
print
t = timeit.Timer(stmt="""
        counter = defaultdict(int)
        for k in LETTERS:
            counter[k] += 1 
        """,
        setup="from collections import defaultdict; from __main__ import LETTERS")

print fmt.format("defaultdict letters",t.timeit(n),n)
t = timeit.Timer(stmt="""
        counter = defaultdict(int)
        for k in WORDS:
            counter[k] += 1 
        """,
        setup="from collections import defaultdict; from __main__ import WORDS")

print fmt.format("defaultdict words",t.timeit(n),n)
print

# setdefault
t = timeit.Timer(stmt="""
        counter = {}
        for k in LETTERS:
            counter[k]=counter.setdefault(k, 0)+1
        """,
        setup="from __main__ import LETTERS")
print fmt.format("setdefault letters",t.timeit(n),n)
t = timeit.Timer(stmt="""
        counter = {}
        for k in WORDS:
            counter[k]=counter.setdefault(k, 0)+1
        """,
        setup="from __main__ import WORDS")
print fmt.format("setdefault words",t.timeit(n),n)
print

# Counter
t = timeit.Timer(stmt="c = Counter(LETTERS)",
        setup="from collections import Counter; from __main__ import LETTERS")

print fmt.format("Counter letters",t.timeit(n),n)
t = timeit.Timer(stmt="c = Counter(WORDS)",
        setup="from collections import Counter; from __main__ import WORDS")
print fmt.format("Counter words",t.timeit(n),n)
print

# in
t = timeit.Timer(stmt="""
        counter = {}
        for k in LETTERS:
            if k in counter: counter[k]+=1
            else: counter[k]=1   
        """,
        setup="from __main__ import LETTERS")
print fmt.format("'in' letters",t.timeit(n),n)
t = timeit.Timer(stmt="""
        counter = {}
        for k in WORDS:
            if k in counter: counter[k]+=1
            else: counter[k]=1   
        """,
        setup="from __main__ import WORDS")
print fmt.format("'in' words",t.timeit(n),n)
print

# try
t = timeit.Timer(stmt="""
        counter = {}
        for k in LETTERS:
            try:
                counter[k]+=1
            except KeyError:
                counter[k]=1     
        """,
        setup="from __main__ import LETTERS")
print fmt.format("try letters",t.timeit(n),n)
t = timeit.Timer(stmt="""
        counter = {}
        for k in WORDS:
            try:
                counter[k]+=1
            except KeyError:
                counter[k]=1
        """,
        setup="from __main__ import WORDS")
print fmt.format("try words",t.timeit(n),n)
print "\n{:,} letters and {:,} words".format(len(list(LETTERS)),len(list(WORDS)))

Prints:

 defaultdict letters:   3.001 seconds for 100 loops
   defaultdict words:  0.8495 seconds for 100 loops

  setdefault letters:   4.839 seconds for 100 loops
    setdefault words:   0.946 seconds for 100 loops

     Counter letters:   7.335 seconds for 100 loops
       Counter words:   1.298 seconds for 100 loops

        'in' letters:   4.013 seconds for 100 loops
          'in' words:  0.7275 seconds for 100 loops

         try letters:   3.389 seconds for 100 loops
           try words:   1.571 seconds for 100 loops

175,176 letters and 26,630 words

Personally, I was surprised that try/except is one of the fastest ways to do this. Who knew...
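The benchmark above targets Python 2 (urllib2, print statements) and downloads its input. A rough Python 3 sketch of the same comparison follows; it assumes Python 3.6+ (f-strings), replaces the Hamlet text with a hypothetical, randomly generated list of extensions, and the function names are illustrative. Results depend on the interpreter, the machine and how many distinct keys there are, so no timings are shown; in recent CPython versions Counter does its counting in a C helper, so its ranking may well differ from the Python 2.7 numbers above.

```python
import random
import timeit
from collections import Counter, defaultdict

# Hypothetical input: 100,000 extensions drawn from a small set,
# so most increments hit a key that already exists
random.seed(0)
EXTS = [".wav", ".mp3", ".m4a", ".flac", ".txt"]
DATA = [random.choice(EXTS) for _ in range(100_000)]


def use_defaultdict(items):
    counter = defaultdict(int)
    for k in items:
        counter[k] += 1
    return counter


def use_setdefault(items):
    counter = {}
    for k in items:
        counter[k] = counter.setdefault(k, 0) + 1
    return counter


def use_in(items):
    counter = {}
    for k in items:
        if k in counter:
            counter[k] += 1
        else:
            counter[k] = 1
    return counter


def use_try(items):
    counter = {}
    for k in items:
        try:
            counter[k] += 1
        except KeyError:
            counter[k] = 1
    return counter


def use_counter(items):
    return Counter(items)


if __name__ == "__main__":
    n = 10
    for func in (use_defaultdict, use_setdefault, use_in, use_try, use_counter):
        # timeit.timeit accepts a zero-argument callable as stmt
        seconds = timeit.timeit(lambda: func(DATA), number=n)
        print(f"{func.__name__:>16}: {seconds:.4f} seconds for {n} loops")
```

Checking that every variant returns the same counts before trusting the timings is a cheap sanity test.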

dawg

An alternative method would be a try / except clause:

try: 
    filetypecounter[filetype] += 1
except KeyError:
    filetypecounter[filetype] = 1

If you have many more files than distinct file types, this method is more efficient: instead of checking every time whether filetype is already in filetypecounter, you assume that it is and only create a new entry in filetypecounter when the KeyError shows it is not.

Edit: Added KeyError in response to comment from @delnan.
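A minimal sketch of the try/except approach in a loop, wrapped in a hypothetical helper `count_filetypes`. Because only KeyError is caught, an unrelated bug (here, a stored value that cannot be incremented) still raises instead of being silently swallowed, which is the point of catching KeyError rather than using a bare `except:`.

```python
def count_filetypes(filetypes, filetypecounter=None):
    """Hypothetical helper: count extensions with try/except KeyError."""
    if filetypecounter is None:
        filetypecounter = {}
    for filetype in filetypes:
        try:
            filetypecounter[filetype] += 1
        except KeyError:
            # First time this filetype is seen
            filetypecounter[filetype] = 1
    return filetypecounter


counts = count_filetypes([".mp3", ".wav", ".mp3"])
print(counts)  # {'.mp3': 2, '.wav': 1}

# "oops" + 1 raises TypeError, which is NOT caught by `except KeyError`,
# so the bug surfaces; a bare `except:` would have hidden it.
try:
    count_filetypes([".mp3"], {".mp3": "oops"})
except TypeError as exc:
    print("bug surfaced:", exc)
```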

ASGM
  • Bare `except:` is bad because it catches more exceptions than is sensible, hiding bugs (in this case, an example of a bug that might be hidden is a typo in `filetypecounter` or `filetype`, or the dictionary containing values that can't be incremented). –  Feb 23 '13 at 20:27
  • Thanks @delnan, answer duly modified. – ASGM Feb 23 '13 at 20:31

I guess all you need is the collections.Counter class.

aemdy

The collections.Counter class does exactly what you (really) need.

martineau

I think you want defaultdict:

from collections import defaultdict

d = defaultdict(int)  # int() returns 0, same as lambda: 0
d['foo'] += 1
# d['foo'] is now 1
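One difference from Counter worth knowing: merely reading a missing key from a defaultdict calls the factory and stores the result, so plain lookups can add entries. Using `in` or `.get()` avoids that. A small sketch:

```python
from collections import Counter, defaultdict

d = defaultdict(int)
d["foo"] += 1

print(d["bar"])      # 0, and "bar" is now stored in d
print(sorted(d))     # ['bar', 'foo']

# `in` and .get() do not call the default factory
print("baz" in d, d.get("baz"))  # False None
print(sorted(d))                 # ['bar', 'foo']

c = Counter()
print(c["bar"])      # 0, but Counter does not store the key
print(len(c))        # 0
```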

Another idea is to use dict.setdefault:

d = {}
d.setdefault('foo', 0)  # won't override if 'foo' is already in d
d['foo'] += 1

Markus Unterwaditzer