1

I am looking for the most efficient way to count the letters in a list. I need something like:

word = ["h", "e", "l", "l", "o"]
alphabet = "abcdefghijklmnñopqrstuvwxyz"

for i in alphabet:
    for j in word:
        if j == i:
            pass  # do something

Here alphabet should be the Spanish alphabet, that is, the English alphabet plus the special character 'ñ'.

I have thought about creating a list of pairs of the form [[a, 0], [b, 1], ...], but I suppose there is a more efficient and cleaner way.

D1X

2 Answers

2

This is not actually a duplicate, because you only want to count characters from a certain set. You can use a Counter dict to do the counting and a set of allowed characters to filter by:

word = ["h", "e", "l", "l", "o"]

from collections import Counter
from string import ascii_lowercase

# create a set of the characters you want to count.
allowed = set(ascii_lowercase + 'ñ')

# use a Counter dict to get the counts, only counting chars that are in the allowed set.
counts = Counter(s for s in word if s in allowed)
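If a count is needed for every letter of the alphabet, including letters that never occur, the Counter can be read back against the alphabet. A minimal sketch, assuming the `alphabet` string and the name `per_letter` (neither comes from the question):

```python
from collections import Counter

# Assumed alphabet: English lowercase letters plus 'ñ' in its usual position.
alphabet = "abcdefghijklmnñopqrstuvwxyz"
allowed = set(alphabet)

word = ["h", "e", "l", "l", "o"]
counts = Counter(s for s in word if s in allowed)

# A Counter returns 0 for missing keys, so every letter gets an entry.
per_letter = {letter: counts[letter] for letter in alphabet}
print(per_letter["l"], per_letter["ñ"])  # prints: 2 0
```

This replaces the list of [letter, count] pairs from the question with a plain dict keyed by letter.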

If you actually just want the total sum:

total = sum(s in allowed for s in word)

Or using a functional approach:

total = sum(1 for _ in filter(allowed.__contains__, word))

Using filter is a bit faster for either approach, as these timings on 100,000 random characters show:

In [31]: from collections import Counter
    ...: from string import ascii_lowercase, digits
    ...: from random import choice

In [32]: chars = [choice(digits+ascii_lowercase+'ñ') for _ in range(100000)]

In [33]: timeit Counter(s for s in chars if s in allowed)
100 loops, best of 3: 36.8 ms per loop

In [34]: timeit Counter(filter(allowed.__contains__, chars))
10 loops, best of 3: 31.7 ms per loop

In [35]: timeit sum(s in allowed for s in chars)
10 loops, best of 3: 35.4 ms per loop

In [36]: timeit sum(1 for _ in filter(allowed.__contains__, chars))
100 loops, best of 3: 32 ms per loop
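Outside IPython, a similar comparison can be run with the standard timeit module. This is only a sketch: the helper names `with_genexp` and `with_filter` are made up here, and the numbers will differ by machine and Python version.

```python
from collections import Counter
from random import choice
from string import ascii_lowercase, digits
from timeit import timeit

allowed = set(ascii_lowercase + "ñ")
chars = [choice(digits + ascii_lowercase + "ñ") for _ in range(100000)]

def with_genexp():
    # Filter with a generator expression.
    return Counter(s for s in chars if s in allowed)

def with_filter():
    # Filter with the built-in filter and the set's membership test.
    return Counter(filter(allowed.__contains__, chars))

# Both approaches must agree before their timings are worth comparing.
assert with_genexp() == with_filter()

for func in (with_genexp, with_filter):
    seconds = timeit(func, number=5)
    print(func.__name__, seconds / 5)  # average seconds per call
```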

If uppercase letters should be counted as well, use ascii_letters and add 'ñÑ':

from string import ascii_letters

allowed = set(ascii_letters + 'ñÑ')
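With ascii_letters, 'a' and 'A' are still counted under separate keys. If they should be merged into one count, the characters can be lowercased before filtering, as suggested in the comments. A sketch with a made-up mixed-case `word`:

```python
from collections import Counter
from string import ascii_lowercase

allowed = set(ascii_lowercase + "ñ")
word = ["S", "e", "Ñ", "o", "r", "a", "!"]  # hypothetical mixed-case input

# Lowercase each character first, then keep only the allowed letters.
counts = Counter(filter(allowed.__contains__, map(str.lower, word)))
print(counts["ñ"])  # prints: 1
```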
Padraic Cunningham
  • I do not know Spanish, but from what I found on the internet, it has other characters besides `ñ`. Check [here](http://sites.psu.edu/symbolcodes/languages/psu/spanish/) – Moinuddin Quadri Oct 19 '16 at 11:15
  • @anonymous, *Where alphabet should be the spanish alphabet, that is the **english** alphabet including the special character 'ñ'.* That is exactly what `ascii_lowercase + 'ñ'` is. – Padraic Cunningham Oct 19 '16 at 11:16
  • @anonymous If you are referring to the characters á é í ó ú ü, I am not considering those. If you are considering characters like 'ch' or 'll', these are no longer characters of the Spanish alphabet. – D1X Oct 19 '16 at 11:25
  • This wouldn't work if `word` had capital letters in it - `counts = Counter(s.lower() for s in word if s.lower() in allowed)` would catch this if needed (at least for the English alphabet) – asongtoruin Oct 19 '16 at 11:40
  • @asongtoruin, `map(str.lower, word)` or using `ascii_letters` would do it if that is a requirement. – Padraic Cunningham Oct 19 '16 at 11:45
0

This is pretty easy:

import collections
print collections.Counter("señor")

This prints (under Python 2, where 'ñ' shows up as the byte '\xa4' in the console's encoding):

Counter({'s': 1, 'r': 1, 'e': 1, '\xa4': 1, 'o': 1})
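In Python 3, where str is Unicode text, the same call counts 'ñ' as a single character rather than as a raw byte. A short sketch, assuming Python 3:

```python
from collections import Counter

counts = Counter("señor")
print(counts["ñ"])  # 'ñ' is one character in a Python 3 str
```

Note that this counts every character in the string; it does not restrict the count to a chosen alphabet.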
unwind