Most Common letter in a string

Question

Completing an exercise to find the most common letter in a string, excluding punctuation and the result should be in lowercase. So in the example "HHHHello World!!!!!!!!!!" the result should be "h".

What I have so far is:

text=input('Insert String: ')
def mwl(text):
    import string
    import collections
    for p in text:
        p.lower()
    for l in string.punctuation:
        for x in text:
            if x==l:
                text.replace(x,'')
    collist=collections.Counter(text).most_common(1)
    print(collist[0][0])

mwl(text)

I would appreciate your help to understand why:

The case is not remaining changed to lower in text
The punctuation is not being permanently removed from the text string

It should be `text = text.replace(x,'')` and p.lower() won't change the string as well. — vaultah, Dec 29 '14 at 18:25
Strings are *immutable*, i.e. cannot be changed in-place, so the string methods return *a new string object*. You need to assign that new object back to the appropriate name, e.g. `text = text.lower()`. — jonrsharpe, Dec 29 '14 at 18:25

score 1 · Answer 1 · edited May 23 '17 at 11:50

1

There are several issues:

Strings are immutable. This means that functions like lower() and replace() return the results and leave the original string as is. You need to assign that return value somewhere.
lower() can operate on the entire string: text = text.lower().

For some ideas on how to remove punctuation characters from a string, see Best way to strip punctuation from a string in Python

edited May 23 '17 at 11:50

Community

1
1

answered Dec 29 '14 at 18:27

NPE

486,780
108
951
1,012

thanks NPE. will get there. the set(punctuation) solution by user below seems easiest. cheers – Dom13 Dec 29 '14 at 18:57

score 1 · Answer 2 · answered Dec 29 '14 at 18:56

1

you can try this:

>>> import re
>>> from collections import Counter
>>> my_string = "HHHHello World!!!!!!!!!!"
>>> Counter("".join(re.findall("[a-z]+",my_string.lower()))).most_common(1)
[('h', 4)]

answered Dec 29 '14 at 18:56

Hackaholic

19,069
5
54
72

Padraic Cunningham · Accepted Answer · 2014-12-29T19:05:45.290

0

text = input('Insert String: ')

from string import punctuation
from collections import Counter
def mwl(text):
    st = set(punctuation)
    # remove all punctuation and make every letter lowercase
    filtered = (ch.lower() for ch in text if ch not in st)
    # make counter dict from remaining letters and return the most common
    return Counter(filtered).most_common()[0][0]

Or use str.translate to remove the punctuation :

from string import punctuation
from collections import Counter
def mwl(text):
    text = text.lower().translate(str.maketrans(" "*len(punctuation),punctuation))
    return Counter(text).most_common()[0][0]

Using your own code you need to reassign text to the updated string:

def mwl(text):
    import string
    import collections
    text = text.lower() 
    for l in string.punctuation:
        for x in text:
            if x == l:
                text = text.replace(x,'')
    collist=collections.Counter(text).most_common(1)
    print(collist[0][0])

Also instead of looping over the text in your code you could just use in:

for l in string.punctuation:
     if l in text:
        text = text.replace(l,'')

edited Dec 29 '14 at 19:05

answered Dec 29 '14 at 18:27

Padraic Cunningham

176,452
29
245
321

thanks @Padraic. I have also added 'alpha=''.join(sorted(filtered)) alpha=alpha.replace("","")' - is this the best? – Dom13 Dec 29 '14 at 18:59
@Dom13, if you are using my `filtered` you don't need to replace, there will be no punctuation left,`if ch not in st` removes the punctuation so all you have to do is pass filtered to Counter. – Padraic Cunningham Dec 29 '14 at 19:03
should have clarified, was trying to strip all white space. Eventually figured 'text=''.join(text.split())'. thanks again. – Dom13 Dec 29 '14 at 20:00
ah ok, I see. No problem – Padraic Cunningham Dec 29 '14 at 20:01
your `.translate()`-based solution is incorrect (it returns `'!'` instead of `'h'`). Python 3 has different interface here compared to Python 2. Though your solution fails on Python 2 too. – jfs Jan 11 '15 at 12:12
`string.punctuation` doesn't take into account non-ascii punctuation. You should use `.casefold()` instead of `.lower()` for proper caseless comparison. See [my answer](http://stackoverflow.com/a/27886782/4279). `most_common(1)` can be much more efficient than `most_common()[0]` (`min()` vs. `sorted()`). – jfs Jan 11 '15 at 12:14

score 0 · Answer 4 · answered Dec 29 '14 at 18:44

First big issues is you never actually assign anything.

 p.lower()

just returns a lowercase version of p. It does not set p to the lowercase version. Should be

p = p.lower()

Same with the text.replace(x,''). It should be text = text.replace(x,'')

score 0 · Answer 5 · answered Dec 29 '14 at 18:45

0

You could do:

>>> from collections import Counter
>>> from string import ascii_letters
>>> tgt="HHHHello World!!!!!!!!!!" 
>>> Counter(c.lower() for c in tgt if c in ascii_letters).most_common(1)
[('h', 4)]

answered Dec 29 '14 at 18:45

dawg

98,345
23
131
206

score 0 · Answer 6 · edited May 23 '17 at 11:57

If input is ascii-only then you could use bytes.translate() to convert it to lowercase and remove punctuation:

#!/usr/bin/env python3
from string import ascii_uppercase, ascii_lowercase, punctuation

table = b''.maketrans(ascii_uppercase.encode(), ascii_lowercase.encode())
def normalize_ascii(text, todelete=punctuation.encode()):
    return text.encode('ascii', 'strict').translate(table, todelete)

s = "HHHHello World!!!!!!!!!!"

count = [0]*256 # number of all possible bytes
for b in normalize_ascii(s): count[b] += 1 # count bytes
# print the most common byte
print(chr(max(range(len(count)), key=count.__getitem__)))

If you want to count letters in a non-ascii Unicode text then you could use .casefold() method (proper caseless comparison) and remove_punctuation() function:

#!/usr/bin/env python3
from collections import Counter
import regex # $ pip install regex

def remove_punctuation(text):
    return regex.sub(r"\p{P}+", "", text)

s = "HHHHello World!!!!!!!!!!"
no_punct = remove_punctuation(s)
characters = (c.casefold() for c in regex.findall(r'\X', no_punct))
print(Counter(characters).most_common(1)[0][0])

r'\X' regex is used to count user-perceived characters instead of mere Unicode codepoints.

Most Common letter in a string

6 Answers6