1

Completing an exercise to find the most common letter in a string, excluding punctuation and the result should be in lowercase. So in the example "HHHHello World!!!!!!!!!!" the result should be "h".

What I have so far is:

text=input('Insert String: ')
def mwl(text):
    import string
    import collections
    for p in text:
        p.lower()
    for l in string.punctuation:
        for x in text:
            if x==l:
                text.replace(x,'')
    collist=collections.Counter(text).most_common(1)
    print(collist[0][0])

mwl(text)

I would appreciate your help to understand why:

  1. The case is not remaining changed to lower in text
  2. The punctuation is not being permanently removed from the text string
dawg
  • 98,345
  • 23
  • 131
  • 206
Dom13
  • 43
  • 7

6 Answers6

1

There are several issues:

  • Strings are immutable. This means that functions like lower() and replace() return the results and leave the original string as is. You need to assign that return value somewhere.

  • lower() can operate on the entire string: text = text.lower().

For some ideas on how to remove punctuation characters from a string, see Best way to strip punctuation from a string in Python

Community
  • 1
  • 1
NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • thanks NPE. will get there. the set(punctuation) solution by user below seems easiest. cheers – Dom13 Dec 29 '14 at 18:57
1

you can try this:

>>> import re
>>> from collections import Counter
>>> my_string = "HHHHello World!!!!!!!!!!"
>>> Counter("".join(re.findall("[a-z]+",my_string.lower()))).most_common(1)
[('h', 4)]
Hackaholic
  • 19,069
  • 5
  • 54
  • 72
0
text = input('Insert String: ')

from string import punctuation
from collections import Counter
def mwl(text):
    st = set(punctuation)
    # remove all punctuation and make every letter lowercase
    filtered = (ch.lower() for ch in text if ch not in st)
    # make counter dict from remaining letters and return the most common
    return Counter(filtered).most_common()[0][0]

Or use str.translate to remove the punctuation :

from string import punctuation
from collections import Counter
def mwl(text):
    text = text.lower().translate(str.maketrans(" "*len(punctuation),punctuation))
    return Counter(text).most_common()[0][0]

Using your own code you need to reassign text to the updated string:

def mwl(text):
    import string
    import collections
    text = text.lower() 
    for l in string.punctuation:
        for x in text:
            if x == l:
                text = text.replace(x,'')
    collist=collections.Counter(text).most_common(1)
    print(collist[0][0])

Also instead of looping over the text in your code you could just use in:

for l in string.punctuation:
     if l in text:
        text = text.replace(l,'')
Padraic Cunningham
  • 176,452
  • 29
  • 245
  • 321
  • thanks @Padraic. I have also added 'alpha=''.join(sorted(filtered)) alpha=alpha.replace("","")' - is this the best? – Dom13 Dec 29 '14 at 18:59
  • @Dom13, if you are using my `filtered` you don't need to replace, there will be no punctuation left,`if ch not in st` removes the punctuation so all you have to do is pass filtered to Counter. – Padraic Cunningham Dec 29 '14 at 19:03
  • should have clarified, was trying to strip all white space. Eventually figured 'text=''.join(text.split())'. thanks again. – Dom13 Dec 29 '14 at 20:00
  • ah ok, I see. No problem – Padraic Cunningham Dec 29 '14 at 20:01
  • your `.translate()`-based solution is incorrect (it returns `'!'` instead of `'h'`). Python 3 has different interface here compared to Python 2. Though your solution fails on Python 2 too. – jfs Jan 11 '15 at 12:12
  • `string.punctuation` doesn't take into account non-ascii punctuation. You should use `.casefold()` instead of `.lower()` for proper caseless comparison. See [my answer](http://stackoverflow.com/a/27886782/4279). `most_common(1)` can be much more efficient than `most_common()[0]` (`min()` vs. `sorted()`). – jfs Jan 11 '15 at 12:14
0

First big issues is you never actually assign anything.

 p.lower() 

just returns a lowercase version of p. It does not set p to the lowercase version. Should be

p = p.lower()

Same with the text.replace(x,''). It should be text = text.replace(x,'')

Jacobr365
  • 846
  • 11
  • 24
0

You could do:

>>> from collections import Counter
>>> from string import ascii_letters
>>> tgt="HHHHello World!!!!!!!!!!" 
>>> Counter(c.lower() for c in tgt if c in ascii_letters).most_common(1)
[('h', 4)]
dawg
  • 98,345
  • 23
  • 131
  • 206
0

If input is ascii-only then you could use bytes.translate() to convert it to lowercase and remove punctuation:

#!/usr/bin/env python3
from string import ascii_uppercase, ascii_lowercase, punctuation

table = b''.maketrans(ascii_uppercase.encode(), ascii_lowercase.encode())
def normalize_ascii(text, todelete=punctuation.encode()):
    return text.encode('ascii', 'strict').translate(table, todelete)

s = "HHHHello World!!!!!!!!!!"

count = [0]*256 # number of all possible bytes
for b in normalize_ascii(s): count[b] += 1 # count bytes
# print the most common byte
print(chr(max(range(len(count)), key=count.__getitem__)))

If you want to count letters in a non-ascii Unicode text then you could use .casefold() method (proper caseless comparison) and remove_punctuation() function:

#!/usr/bin/env python3
from collections import Counter
import regex # $ pip install regex

def remove_punctuation(text):
    return regex.sub(r"\p{P}+", "", text)

s = "HHHHello World!!!!!!!!!!"
no_punct = remove_punctuation(s)
characters = (c.casefold() for c in regex.findall(r'\X', no_punct))
print(Counter(characters).most_common(1)[0][0])

r'\X' regex is used to count user-perceived characters instead of mere Unicode codepoints.

Community
  • 1
  • 1
jfs
  • 399,953
  • 195
  • 994
  • 1,670