0

I'm writing a Python program for an online class that finds the frequency of letters in a file. The problem is that spaces keep getting counted in the final result too. How can I omit them? Here's my code:

import string
name = raw_input('Enter a file name: ')
fhandle = open(name)
counts = dict()
for line in fhandle:
    line = line.strip()
    line = line.translate(None,string.punctuation)
    line = line.lower()
    letters = list(line)
    for letter in letters:
        counts[letter]=counts.get(letter,0)+1
lst = list()
for letter,count in counts.items():
    lst.append((count,letter))
lst.sort(reverse=True)
for count,letter in lst:
    print count,letter
mthe25
  • 1
    There is a good summary of the different ways to remove whitespace, EOL (end-of-line) characters, tabs, etc. here: http://stackoverflow.com/questions/8270092/python-remove-all-whitespace-in-a-string – Tom Jul 02 '16 at 10:09

3 Answers

4

string.punctuation contains !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ but no whitespace characters, which is why the spaces survive your translate() call.

You should change your call to translate() to the following:

line.translate(None,string.punctuation+string.whitespace+string.digits)

Type help(string) in the Python interpreter for more information.
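
For readers on Python 3, where `str.translate()` no longer takes a deletion string as its second argument, the same idea can be sketched with `str.maketrans`. This is only a minimal sketch under that assumption, not part of the original answer; the helper name `strip_non_letters` is hypothetical.

```python
import string

# Characters to delete: punctuation, whitespace, and digits (same set as above).
_DELETE = string.punctuation + string.whitespace + string.digits

# In Python 3, the third argument of str.maketrans lists characters to remove.
_TABLE = str.maketrans('', '', _DELETE)


def strip_non_letters(line):
    """Return `line` lowercased with punctuation, whitespace, and digits removed."""
    return line.translate(_TABLE).lower()


print(strip_non_letters("Hello, World! 123\n"))  # helloworld
```

Because `string.whitespace` includes the newline and tab characters, the separate `strip()` call becomes unnecessary with this approach.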

akhilmd
0

If you only want to skip blank spaces when printing (and don't want to change anything else in your code), you can add an if statement to the last for loop:

for count,letter in lst:
    if letter != ' ':
        print count,letter
Mukherjee
0

An elegant way to do this is to just use isalpha(). See line 11:

import string
name = raw_input('Enter a file name: ')
fhandle = open(name)
counts = dict()
for line in fhandle:
    line = line.strip()
    line = line.translate(None,string.punctuation)
    line = line.lower()
    letters = list(line)
    for letter in letters:
        if letter.isalpha() == True:
            counts[letter]=counts.get(letter,0)+1
lst = list()
for letter,count in counts.items():
    lst.append((count,letter))
lst.sort(reverse=True)
for count,letter in lst:
    print count,letter
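
The same filtering idea might look like the sketch below, assuming Python 3 and the standard library's `collections.Counter` (neither of which the answer uses). The function name `letter_frequencies` and the sample text are made up for illustration.

```python
from collections import Counter


def letter_frequencies(lines):
    """Count alphabetic characters (case-insensitive) across an iterable of lines."""
    counts = Counter()
    for line in lines:
        # isalpha() rejects spaces, digits, and punctuation in one test.
        counts.update(ch for ch in line.lower() if ch.isalpha())
    return counts


sample = ["Hello, World!", "Spaces   and 42 digits."]
# Sort by descending count, then by letter, so ties print in a stable order.
for letter, count in sorted(letter_frequencies(sample).items(),
                            key=lambda item: (-item[1], item[0])):
    print(count, letter)
```

An open file object can be passed in place of `sample`, since iterating over a file yields its lines.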
Jaxian
  • What about unicode? See [Python isalpha() and scandics](http://stackoverflow.com/questions/4286637/python-isalpha-and-scandics) – Peter Wood Jul 02 '16 at 10:23
  • OP said he was looking for "the frequency of letters in a file" and this does precisely that with just one additional line of code. – Jaxian Jul 02 '16 at 10:27
  • If the file is Unicode, `isalpha` can fail to recognise characters. Also, `if letter.isalpha() == True:` can be `if letter.isalpha():`, although it still won't work, as the line needs decoding first. See the linked question. – Peter Wood Jul 02 '16 at 10:33
  • I've tested that code I submitted multiple times with multiple text files and it works fine. – Jaxian Jul 02 '16 at 10:36
  • [*'Testing shows the presence, not the absence of bugs'*](https://en.wikiquote.org/wiki/Edsger_W._Dijkstra#1960s). It would fail with a file containing `äöå`. – Peter Wood Jul 02 '16 at 10:37
  • I understand that, but the OP did not specify looking for characters outside the latin alphabet, so I put together the best solution for that situation. – Jaxian Jul 02 '16 at 10:41
  • 1
    If they didn't specify it, why assume? You can ask questions in comments on the original question, or describe the limitations and assumptions you've made in your answer. – Peter Wood Jul 02 '16 at 10:42
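
The decoding point raised in these comments can be checked with a short sketch. It assumes Python 3 and UTF-8 encoded input, both of which are assumptions; the thread itself uses Python 2, where a plain `str` holds undecoded bytes much like the `bytes` object here.

```python
# The letters "äöå" written with escapes, then encoded as UTF-8 bytes
# (this stands in for raw, undecoded file content).
raw = "\u00e4\u00f6\u00e5".encode("utf-8")

# On undecoded bytes, isalpha() only recognises ASCII letters.
print(raw.isalpha())  # False

# After decoding to text, each character is recognised as a letter.
text = raw.decode("utf-8")
print(text.isalpha())  # True
print(len(raw), len(text))  # 6 3
```

Each of the three letters takes two bytes in UTF-8, which is why counting undecoded bytes would also split one letter into two separate entries.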