0

So I want to create a histogram. Here is my code:

def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d

def print_hist(h):
    for c in h:
        print c, h[c]

It gives me this:

>>> h = histogram('parrot')
>>> print_hist(h)
a 1
p 1
r 2
t 1
o 1

But I want this:

a: 1
o: 1
p: 1
r: 2
t: 1

So how can I get my histogram in alphabetical order, make it case insensitive (so "a" and "A" are counted as the same letter), and list the whole alphabet (so letters that are not in the string just get a zero)?

Kara
user3490645

6 Answers

3

Just use collections.Counter for this, unless you really want your own:

>>> import collections
>>> c = collections.Counter('parrot')
>>> sorted(c.items(), key=lambda c: c[0])
[('a', 1), ('o', 1), ('p', 1), ('r', 2), ('t', 1)]

EDIT: As commenters pointed out, your last sentence indicates you want data on all the letters of the alphabet that do not occur in your word. Counter is good for this also since, as the docs indicate:

Counter objects have a dictionary interface except that they return a zero count for missing items instead of raising a KeyError.

So you can just iterate through something like string.ascii_lowercase:

>>> import string
>>> for letter in string.ascii_lowercase:
...   print('{}: {}'.format(letter, c[letter]))
... 
a: 1
b: 0
c: 0
d: 0
e: 0
f: 0
g: 0
h: 0
i: 0
j: 0
k: 0
l: 0
m: 0
n: 0
o: 1
p: 1
q: 0
r: 2
s: 0
t: 1
u: 0
v: 0
w: 0
x: 0
y: 0
z: 0

Finally, rather than implementing something complicated to merge the results of upper- and lowercase letters, just normalize your input first:

c = collections.Counter('PaRrOt'.lower())
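Putting the three pieces together (counting, zero for missing letters, lowercasing first), a minimal sketch could look like the following. It assumes Python 3, and `letter_histogram` is just an illustrative name, not something from the standard library:

```python
import collections
import string


def letter_histogram(text):
    """Return (letter, count) pairs for a-z, ignoring case."""
    # Normalize case first so 'A' and 'a' land in the same bucket.
    counts = collections.Counter(text.lower())
    # Counter returns 0 for missing keys, so absent letters need no special case.
    return [(letter, counts[letter]) for letter in string.ascii_lowercase]


for letter, count in letter_histogram('PaRrOt'):
    print('{}: {}'.format(letter, count))
```

Anything that is not in `string.ascii_lowercase` (digits, spaces, punctuation) is still counted by the Counter but simply never printed.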
Two-Bit Alchemist
3

Use an ordered dictionary, which stores keys in the order they were inserted.

from collections import OrderedDict
import string

def count(s):
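    # string.lowercase / string.letters are locale-dependent in Python 2
    # (and gone in Python 3); the ascii_* constants are always a-z / a-zA-Z.
    histogram = OrderedDict((c,0) for c in string.ascii_lowercase)
    for c in s:
        if c in string.ascii_letters:
            histogram[c.lower()] += 1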
    return histogram

for letter, c in count('parrot').iteritems():
    print '{}:{}'.format(letter, c)

Result:

a:1
b:0
c:0
d:0
e:0
f:0
g:0
h:0
i:0
j:0
k:0
l:0
m:0
n:0
o:1
p:1
q:0
r:2
s:0
t:1
u:0
v:0
w:0
x:0
y:0
z:0
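For readers on a newer interpreter, here is a rough equivalent of the same idea. It assumes Python 3.7 or later, where a plain `dict` is guaranteed to keep insertion order, so `OrderedDict` is not strictly needed for this use:

```python
import string


def count(s):
    # Assumes Python 3.7+: plain dicts preserve insertion order,
    # so pre-filling a..z fixes the iteration order up front.
    histogram = dict.fromkeys(string.ascii_lowercase, 0)
    for c in s.lower():
        if c in histogram:  # skip digits, spaces, and anything outside a-z
            histogram[c] += 1
    return histogram


for letter, n in count('parrot').items():
    print('{}:{}'.format(letter, n))
```

On older versions the `OrderedDict` variant above is the safer choice, since dict ordering was not guaranteed there.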
Samy Arous
  • I don't see the use of ordering the data when the order is invariant and can be used at display time. – njzk2 Apr 03 '14 at 13:37
  • You are absolutely right. It's a remnant of the first version of the algorithm, which used only a subset of the alphabet. There is no additional cost though :) and it is also a more general approach when the alphabet is not known at compile time. – Samy Arous Apr 03 '14 at 13:50
  • I'm now actually wondering what the complexity of the basic operations on an OrderedDict is compared to a basic dict. – njzk2 Apr 03 '14 at 16:13
2

A trivial answer would be:

import string
# s is the input string, e.g. s = 'parrot'
for letter in string.ascii_lowercase:
    print letter, ': ', s.lower().count(letter)

(highly inefficient as you go through the string 26 times)

You can also use a Counter:

from collections import Counter
import string
# s is the input string, e.g. s = 'parrot'
cnt = Counter(s.lower())
for letter in string.ascii_lowercase:
    print letter, ': ', cnt[letter]

Much neater.

njzk2
1

If you want it ordered, then you are going to have to use an OrderedDict.

You are also going to need to order the letters before you add them to the dictionary. It is not entirely clear to me, but I think you want a case-insensitive result, so we need to get all letters into one case.

from collections import OrderedDict as od
import string

def histogram(s):

First we need to create a dictionary that has all of the lowercase letters. We imported string, which provides them as a string, but I think string.lowercase can contain more than the 26 ASCII letters (it is locale-dependent in Python 2), so we only use the first 26 characters of string.lowercase.

    d = od()
    for each_letter in string.lowercase[0:26]:
       d[each_letter] = 0

Once the dictionary is created then we just need to iterate through the word after it has been lowercased. Please note that this will blow up with any word that has a number or a space. You may or may not want to test or add numbers and spaces to your dictionary. One way to keep it from blowing up is to try to add a value. If the value is not in the dictionary just ignore it.

    for c in s.lower():
       try:
           d[c] += 1
       except KeyError:  # a missing dict key raises KeyError, not ValueError
           pass
    return d
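Since the function above is split up by the explanation, here is the whole thing assembled in one piece as a sketch. It assumes Python 3 and uses `string.ascii_lowercase` instead of slicing `string.lowercase`; note that a missing key raises `KeyError`, so that is the exception to catch:

```python
from collections import OrderedDict
import string


def histogram(s):
    # Pre-fill a..z with zeros so every letter shows up, in order.
    d = OrderedDict((letter, 0) for letter in string.ascii_lowercase)
    for c in s.lower():
        try:
            d[c] += 1
        except KeyError:
            # Digits, spaces, punctuation: not in the dictionary, so ignore.
            pass
    return d


for letter, n in histogram('Norwegian Blue 42').items():
    print('{}: {}'.format(letter, n))
```

Whether to ignore non-letters or to count them too is a design choice; this version ignores them silently.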
PyNEwbie
0

If you want to list the whole (latin only) alphabet anyway, you could use a list of length 26:

hist = [0] * 26
for c in s.lower():
  if 'a' <= c <= 'z':  # skip anything outside a-z to avoid a bad index
    hist[ord(c) - ord('a')] += 1

To get the desired output:

for x in range(26):
  print chr(x + ord('a')) + ":", hist[x]
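As a self-contained sketch of the list-based approach, assuming Python 3 and plain ASCII input (`list_histogram` is a made-up name for illustration):

```python
def list_histogram(s):
    """Count a-z in s (case-insensitive) into a 26-element list."""
    hist = [0] * 26
    for c in s.lower():
        # Guard the index: without this check, '!' or a space would
        # produce a negative or out-of-range position.
        if 'a' <= c <= 'z':
            hist[ord(c) - ord('a')] += 1
    return hist


hist = list_histogram('Parrot!')
for i, n in enumerate(hist):
    # Map the index back to its letter: 0 -> 'a', 1 -> 'b', ...
    print('{}: {}'.format(chr(ord('a') + i), n))
```

The index arithmetic only works because a to z are contiguous code points, which is why this does not generalize to other alphabets.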
Jasper
  • How would this be printed to produce the desired output? – Scott Hunter Apr 02 '14 at 18:09
  • This will work using ASCII letters, which might not always be the case, and will certainly not work with non-Latin languages. – Samy Arous Apr 02 '14 at 18:14
  • -1. Why are we playing with ASCII code points and writing unicode unfriendly code when there are dozens of more readable ways to do this readily available? – Two-Bit Alchemist Apr 02 '14 at 18:29
  • If OP is talking about "the whole Alphabet", I think this is a legit approach and I wanted to point out the possibility to go without a dict. You use ascii_lowercase in your own answer as well btw... – Jasper Apr 02 '14 at 18:33
  • There's nothing wrong with using `string.ascii_lowercase` to loop over, well, `string.ascii_lowercase`. I'm voting against your solution using `chr`/`ord` because it's needless obfuscation of what you are trying to do. If this were C, sure, but this is Python. We don't have to play with the ASCII table to sort things (at least not explicitly). – Two-Bit Alchemist Apr 02 '14 at 18:40
0

Check this function for your output

    def print_hist(h):
     for c in sorted(h):
      print c, h[c]
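A Python 3 sketch of the same idea that also produces the `letter: count` format from the question. The helper name `hist_lines` is hypothetical; it returns the lines instead of printing them so they are easy to check:

```python
def hist_lines(h):
    """Return 'letter: count' strings for the keys of h in sorted order."""
    # sorted(h) iterates over the dictionary's keys in alphabetical order.
    return ['{}: {}'.format(c, h[c]) for c in sorted(h)]


for line in hist_lines({'p': 1, 'a': 1, 'r': 2, 'o': 1, 't': 1}):
    print(line)
```

This only fixes the ordering; it does not merge upper and lower case or add zero entries for letters that are missing.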
Hemant