
I'm trying to figure out how to open a file, make all the letters in the file lowercase, and then take out all the punctuation. I've tried a few things I've seen online and in my book but I can't seem to figure it out.

import string

def ReadFile(Filename):
    try:
        F = open(Filename)
        F2=F.read()
    except IOError:
        print("Can't open file:",Filename)
        return []
    F3=[]
    for word in F2:
        F3=F2.lower()
    exclude = set(string.punctuation)
    F3= ''.join(ch for ch in F3 if ch not in exclude)
    return F3







Name = input ('Name of file? ')
Words = ReadFile(Name)
print (F3)

For example, a sentence such as

Then he said, "I'm so confused!".

should become

then he said im so confused
DSM
Bob
  • Since `F2` is a string, `for word in F2` is actually iterating over the _characters_, not the _words_. As it turns out, this won't affect your code (except to make it slower), because lowercasing each letter in a word obviously lowercases the word, but it still makes the code misleading and harder to understand. – abarnert Apr 24 '13 at 18:57
  • http://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string-in-python – CppLearner Apr 24 '13 at 18:57
  • I really like using translate for this :P that is by far the fastest method :) and its cool – Joran Beasley Apr 24 '13 at 19:01

2 Answers

The problem with your code is in the very last line:

print (F3)

`F3` is the name of a local variable inside the function. It doesn't exist at the top level of your script, so `print(F3)` there raises a `NameError`.

But you can access the same value that was in that variable, because the function returned it, and you stored it in `Words`.

So, just do this:

print(Words)

And now, your code works.
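
To see the scoping rule in isolation, here is a minimal sketch. The names `make_greeting`, `greeting_text`, and `result` are made up for illustration and are not part of the question's code.

```python
def make_greeting():
    # 'greeting_text' is local: it only exists while the function runs.
    greeting_text = "hello"
    return greeting_text


# The returned value is bound to a new, module-level name.
result = make_greeting()
print(result)

try:
    # 'greeting_text' was never defined at module level, so this fails.
    print(greeting_text)
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__
    print("NameError:", exc)
```

The value survives because it was returned and stored, not because the local name is still around.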


That being said, it can be improved.

Most importantly, look at this part:

F3=[]
for word in F2:
    F3=F2.lower()

The `for word in F2:` actually loops over every character in `F2`, because iterating over a string yields its characters one at a time. If you want to go word by word, you need to do something like `for word in F2.split():`.

Meanwhile, inside the loop, you reassign `F3` each time through and never do anything with the previous value, so the whole thing ends up being a very fancy (and slow) way to do just the last assignment.

Fortunately, that last assignment, `F3=F2.lower()`, lowercases the entire string `F2`, which is exactly what you wanted to do, so it works out anyway. That means you can replace all three of those lines with:

F3=F2.lower()
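
A small sketch of both points, using a short sample string in place of a file's contents (the variable names here are illustrative only):

```python
text = "Then He Said"

# Iterating over a string yields single characters...
chars = [ch for ch in text]
# ...while split() yields whitespace-separated words.
words = [w for w in text.split()]

# The loop from the question: the same assignment repeated once per character.
looped = []
for _ in text:
    looped = text.lower()

# A single call produces the identical result.
direct = text.lower()

print(chars[:4])
print(words)
print(looped == direct)
```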

You should also always close files that you open. Since this can be tricky (e.g., in your function, you have to remember to close it in both the success and error cases), the best way is to have it done automatically, using a `with` statement. Replace these two lines:

F = open(Filename)
F2=F.read()

with:

with open(Filename) as F:
    F2=F.read()
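
As a rough illustration of how the `with` statement fits together with the error handling, here is a self-contained sketch. The helper `read_text` and the temporary file are assumptions made for the demo; returning `None` on failure is just one possible choice.

```python
import os
import tempfile


def read_text(filename):
    """Return the file's contents, or None if it cannot be opened."""
    try:
        # The file is closed when the block exits, even if read() raises.
        with open(filename) as f:
            return f.read()
    except OSError:  # IOError is an alias of OSError in Python 3
        print("Can't open file:", filename)
        return None


# Demo on a throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w") as f:
        f.write("Hello, World!")
    print(f.closed)  # the with block has already closed the file

    contents = read_text(path)
    missing = read_text(os.path.join(tmp, "no_such_file.txt"))
```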

After that, other than the non-PEP-8 style, and possible performance problems if you have huge files, there's really nothing wrong with your code.
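
For reference, one way the function might look with PEP 8 naming, plus a line-at-a-time variant for large files. This is only a sketch: `clean_text`, `clean_lines`, and `read_file` are names chosen here, and returning an empty string on failure (rather than the `[]` in the question) is an assumption made so the function always returns a string.

```python
import string

# Build the lookup set once instead of on every call.
PUNCTUATION = set(string.punctuation)


def clean_text(text):
    """Lowercase text and drop ASCII punctuation characters."""
    return "".join(ch for ch in text.lower() if ch not in PUNCTUATION)


def clean_lines(lines):
    """Yield cleaned lines one at a time, so a huge file need not fit in memory."""
    for line in lines:
        yield clean_text(line)


def read_file(filename):
    """Return the cleaned contents of filename, or '' if it cannot be opened."""
    try:
        with open(filename) as f:
            return clean_text(f.read())
    except OSError:
        print("Can't open file:", filename)
        return ""


sample = 'Then he said, "I\'m so confused!".'
print(clean_text(sample))
```

For a very large file, iterating `clean_lines(f)` inside the `with` block and processing each line as it arrives avoids holding the whole file in memory.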

abarnert

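There are many discussions of this topic. A simple and effective way, written here for Python 3 (where `string.maketrans` no longer exists), is:

import string
s = "Then he said, \"I'm so confused!\"."
# The third argument to str.maketrans lists the characters to delete.
print(s.translate(str.maketrans("", "", string.punctuation)))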
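
Combining the translate approach with lowercasing gives the output asked for in the question. The sketch below assumes Python 3, where the translation table comes from `str.maketrans`; `strip_punctuation` and `_DELETE_PUNCT` are names invented for this example.

```python
import string

# Translation table that maps every ASCII punctuation character to "deleted".
# Built once so repeated calls don't rebuild it.
_DELETE_PUNCT = str.maketrans("", "", string.punctuation)


def strip_punctuation(text):
    """Lowercase text, then delete ASCII punctuation via str.translate."""
    return text.lower().translate(_DELETE_PUNCT)


s = 'Then he said, "I\'m so confused!".'
print(strip_punctuation(s))  # then he said im so confused
```

Note that `string.punctuation` covers ASCII punctuation only, so characters such as curly quotes pass through unchanged.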

Similar discussions can be found here:

Remove punctuation from Unicode formatted strings

Python Regex punctuation recognition

Best way to strip punctuation from a string in Python

Community
Shijing Lv