Creating a Letter Histogram

Question

So I want to create a histogram. Here is my code:

def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d

def print_hist(h):
    for c in h:
        print c, h[c]

It gives me this:

>>> h = histogram('parrot')
>>> print_hist(h)
a 1
p 1
r 2
t 1
o 1

But I want this:

a: 1
o: 1
p: 1
r: 2
t: 1

So how can I get my histogram in alphabetical order, be case sensitive (so "a" and "A" are the same), and list the whole alphabet (so letters that are not in the string just get a zero)?

possible duplicate of [python dictionary sort by key](http://stackoverflow.com/questions/9001509/python-dictionary-sort-by-key) — Caramiriel, Apr 02 '14 at 18:03
`"a" and "A" are the same` that's not what case sensitive mean — njzk2, Apr 02 '14 at 18:11

Two-Bit Alchemist · Answer 1 · 2014-04-02T18:26:14.337

3

Just use collections.Counter for this, unless you really want your own:

>>> import collections
>>> c = collections.Counter('parrot')
>>> sorted(c.items(), key=lambda c: c[0])
[('a', 1), ('o', 1), ('p', 1), ('r', 2), ('t', 1)]

EDIT: As commenters pointed out, your last sentence indicates you want data on all the letters of the alphabet that do not occur in your word. Counter is good for this also since, as the docs indicate:

Counter objects have a dictionary interface except that they return a zero count for missing items instead of raising a KeyError.

So you can just iterate through something like string.ascii_lowercase:

>>> import string
>>> for letter in string.ascii_lowercase:
...   print('{}: {}'.format(letter, c[letter]))
... 
a: 1
b: 0
c: 0
d: 0
e: 0
f: 0
g: 0
h: 0
i: 0
j: 0
k: 0
l: 0
m: 0
n: 0
o: 1
p: 1
q: 0
r: 2
s: 0
t: 1
u: 0
v: 0
w: 0
x: 0
y: 0
z: 0

Finally, rather than implementing something complicated to merge the results of upper- and lowercase letters, just normalize your input first:

c = collections.Counter('PaRrOt'.lower())
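Putting the three pieces together (lowercasing, counting, and walking the whole alphabet), a minimal sketch might look like this. It assumes Python 3 and only the 26 ASCII letters; the function name `letter_histogram` is made up for illustration and is not part of `collections`.

```python
import collections
import string


def letter_histogram(text):
    """Return (letter, count) pairs for a-z, ignoring case."""
    # Lowercase first so 'A' and 'a' land in the same bucket.
    counts = collections.Counter(text.lower())
    # A Counter returns 0 for missing keys, so absent letters need no special case.
    return [(letter, counts[letter]) for letter in string.ascii_lowercase]


for letter, count in letter_histogram('PaRrOt'):
    print('{}: {}'.format(letter, count))
```

Digits, spaces and punctuation are still counted by the Counter, but they never appear in the result because only the keys in `string.ascii_lowercase` are looked up.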

edited Apr 02 '14 at 18:26

answered Apr 02 '14 at 18:03

Two-Bit Alchemist


Doesn't address the missing letters. – Scott Hunter Apr 02 '14 at 18:03
Ahh, you're right, I missed the requirement in the last sentence of the question, and just went by the posted expected output! – Two-Bit Alchemist Apr 02 '14 at 18:04
Don't feel bad: so has every other answer so far! – Scott Hunter Apr 02 '14 at 18:05
from there you only need to enumerate `string.ascii_lowercase` and print the value for that key – njzk2 Apr 02 '14 at 18:07
@njzk2 Yes, I got called away for a moment, but finally got back to updating my answer to actually answer the question. Thank you guys for the help identifying my misreading! – Two-Bit Alchemist Apr 02 '14 at 18:27

score 3 · Answer 2 · answered Apr 02 '14 at 18:05

3

Use an ordered dictionary, which stores keys in the order they were inserted.

from collections import OrderedDict
import string

def count(s):
    histogram = OrderedDict((c,0) for c in string.lowercase)
    for c in s:
        if c in string.letters:
            histogram[c.lower()] += 1
    return histogram

for letter, c in count('parrot').iteritems():
    print '{}:{}'.format(letter, c)

Result:

a:1
b:0
c:0
d:0
e:0
f:0
g:0
h:0
i:0
j:0
k:0
l:0
m:0
n:0
o:1
p:1
q:0
r:2
s:0
t:1
u:0
v:0
w:0
x:0
y:0
z:0
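The code above is Python 2. In Python 3, `string.lowercase`, `string.letters` and `dict.iteritems()` no longer exist and `print` is a function, so a rough equivalent, offered as a sketch rather than the answerer's own code, could be:

```python
from collections import OrderedDict
import string


def count(s):
    # Pre-fill every letter with 0 so absent letters still show up.
    histogram = OrderedDict((c, 0) for c in string.ascii_lowercase)
    for c in s.lower():
        if c in histogram:  # skip digits, spaces, punctuation, non-ASCII
            histogram[c] += 1
    return histogram


for letter, n in count('parrot').items():
    print('{}:{}'.format(letter, n))
```

Since Python 3.7 a plain `dict` also preserves insertion order, so `OrderedDict` is not strictly required there.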

answered Apr 02 '14 at 18:05

Samy Arous


I don't see the use of ordering the data when the order is invariant and can be used at display time. – njzk2 Apr 03 '14 at 13:37
You are absolutely right. It's a remnant of the first version of the algorithm, which used only a subset of the alphabet. There is no additional cost though :) and it is also a more general approach when the alphabet is not known at compile time. – Samy Arous Apr 03 '14 at 13:50
i'm now actually wondering what are the basic operations complexity on an ordereddict compared to a basic dict – njzk2 Apr 03 '14 at 16:13

score 2 · Answer 3 · answered Apr 02 '14 at 18:10

A trivial answer would be (where h is the input string):

import string
for letter in string.ascii_lowercase:
    print letter, ': ', h.lower().count(letter)

(highly inefficient as you go through the string 26 times)

You can also use a Counter:

from collections import Counter
import string
cnt = Counter(h.lower())
for letter in string.ascii_lowercase:
    print letter, ': ', cnt[letter]

Much neater.

PyNEwbie · Answer 4 · 2014-04-02T18:35:07.940

If you want it ordered, then you are going to have to use an OrderedDict.

You are also going to need to order the letters before you add them to the dictionary. It is not entirely clear to me, but I think you want a case-insensitive result, so we need to get all letters into one case.

from collections import OrderedDict as od
import string

def histogram(s):

First we need to create a dictionary that has all of the lowercase letters. We imported string, which provides them, but I think string.lowercase may contain more than the plain 26 letters (in Python 2 its value is locale-dependent), so we only use the first 26 characters of string.lowercase.

    d = od()
    for each_letter in string.lowercase[0:26]:
       d[each_letter] = 0

Once the dictionary is created then we just need to iterate through the word after it has been lowercased. Please note that this will blow up with any word that has a number or a space. You may or may not want to test or add numbers and spaces to your dictionary. One way to keep it from blowing up is to try to add a value. If the value is not in the dictionary just ignore it.

    for c in s.lower():
       try:
           d[c] += 1
       except KeyError:
           pass
    return d
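Assembled into one function, the steps above might look like the following sketch. It assumes Python 3, so `string.ascii_lowercase` stands in for `string.lowercase[0:26]`. Note that looking up a key that is not in a dictionary raises `KeyError`, which is the exception to catch here.

```python
from collections import OrderedDict
import string


def histogram(s):
    # Start every lowercase ASCII letter at zero, in alphabetical order.
    d = OrderedDict((letter, 0) for letter in string.ascii_lowercase)
    for c in s.lower():
        try:
            d[c] += 1
        except KeyError:
            # c is not one of the 26 letters (digit, space, ...): ignore it.
            pass
    return d


print(histogram('Parrot 2014'))
```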

It is not clear that he needs missing letters not according to what I see in the q — PyNEwbie, Apr 02 '14 at 18:05
Review your code, you have a syntax error, and a bad import. You can also use list(s) instead of a list comprehension and d.get(c, 0) instead of your if condition. — Samy Arous, Apr 02 '14 at 18:12

Jasper · Answer 5 · 2014-04-02T18:15:41.417

0

If you want to list the whole (latin only) alphabet anyway, you could use a list of length 26:

hist = [0] * 26
for c in s.lower():
  hist[ord(c) - ord('a')] += 1

To get the desired output:

for x in range(26):
  print chr(x + ord('a')), ":", hist[x]
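For reference, a self-contained Python 3 sketch of the same list-based idea is shown below. The function name `letter_counts` and the range check are additions for illustration, not part of the answer as posted; the check is there because characters outside a-z would otherwise produce a bad index.

```python
def letter_counts(s):
    """Count a-z occurrences in s (case-insensitive) in a 26-slot list."""
    hist = [0] * 26
    for c in s.lower():
        # Other characters would raise IndexError or land in the wrong slot.
        if 'a' <= c <= 'z':
            hist[ord(c) - ord('a')] += 1
    return hist


hist = letter_counts('Parrot!')
for i, n in enumerate(hist):
    # Shift the index back into the letter range: 0 -> 'a', 25 -> 'z'.
    print('{}: {}'.format(chr(ord('a') + i), n))
```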

edited Apr 02 '14 at 18:15

answered Apr 02 '14 at 18:05

Jasper


How would this be printed to produce the desired output? – Scott Hunter Apr 02 '14 at 18:09
This will work using ascii letters, which might not always be the case and will certainly not work with non latin languages. – Samy Arous Apr 02 '14 at 18:14
-1. Why are we playing with ASCII code points and writing unicode unfriendly code when there are dozens of more readable ways to do this readily available? – Two-Bit Alchemist Apr 02 '14 at 18:29
If OP is talking about "the whole Alphabet", I think this is a legit approach and I wanted to point out the possibility to go without a dict. You use ascii_lowercase in your own answer as well btw... – Jasper Apr 02 '14 at 18:33
There's nothing wrong with using `string.ascii_lowercase` to loop over, well, `string.ascii_lowercase`. I'm voting against your solution using `chr`/`ord` because it's needless obfuscation of what you are trying to do. If this were C, sure, but this is Python. We don't have to play with the ASCII table to sort things (at least not explicitly). – Two-Bit Alchemist Apr 02 '14 at 18:40

Hemant · Answer 6 · 2014-04-02T18:35:04.053

0

Check this function for your output

def print_hist(h):
    for c in sorted(h):
        print c, h[c]

edited Apr 02 '14 at 18:35

answered Apr 02 '14 at 18:29

Hemant


Same problem as Scott Hunter was pointing out for everyone else. Read the last sentence of OP's post. – Two-Bit Alchemist Apr 02 '14 at 18:35
