5

I am trying to read a text file, split it into individual words, and collect them in a list.

The current problem I'm having is getting rid of commas and periods from the text file.

My code is below:

#Process a '*.txt' file.
def Process():
    name = input("What is the name of the file you would like to read from? ")

    file = open( name , "r" )
    text = [word for line in file for word in line.lower().split()]
    word = word.replace(",", "")
    word = word.replace(".", "")

    print(text)

The output I'm currently getting is this:

['this', 'is', 'the', 'first', 'line', 'of', 'the', 'file.', 'this', 'is', 'the', 'second', 'line.']

As you can see, the words "file" and "line" have a period at the end of them.

The text file I'm reading is:

This is the first line of the file.

This is the second line.

Thanks in advance.

Keyfer Mathewson

3 Answers

8

These lines have no effect on the list:

word = word.replace(",", "")
word = word.replace(".", "")

Just change your list comprehension to this:

[word.replace(",", "").replace(".", "") 
 for line in file for word in line.lower().split()]
jamylak
  • Did they use to have an effect? For some reason my teacher told us to look at them? – Keyfer Mathewson Mar 20 '13 at 22:54
  • They don't have an effect in this case because after the list comprehension is done, `word` will always be a reference to the last word in your list of words. @jamylak's version of the code correctly does the replacements on each word in the list as it's being handled. – bgporter Mar 20 '13 at 23:05
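The point made in the comments above can be checked with a small sketch. It uses a hypothetical in-memory list of lines (`lines`) instead of a real file, and it binds `word` to the last element explicitly, because in Python 3 the comprehension variable is not visible after the comprehension at all (that explicit binding is an assumption made only to mimic the behaviour described in the comment).

```python
# Hypothetical sample data standing in for the contents of the text file.
lines = [
    "This is the first line of the file.",
    "This is the second line.",
]

# Same comprehension as in the question, reading from the list above.
text = [word for line in lines for word in line.lower().split()]

# Assumption: mimic `word` referring to the last word after the comprehension.
word = text[-1]

# str.replace returns a NEW string; rebinding `word` does not touch `text`.
word = word.replace(",", "")
word = word.replace(".", "")

# Doing the replacement per word, inside a comprehension, cleans every entry.
cleaned = [w.replace(",", "").replace(".", "") for w in text]

print(text[-1])     # the list entry still carries its period
print(word)         # only the separate name was changed
print(cleaned[-1])  # cleaned copy of the last word
```

Strings are immutable, so the only way to change what the list holds is to build new strings and store them, which is what the revised comprehension does.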
6

`strip` may be more appropriate than `replace` here, since it removes the given characters only from the ends of each word:

def Process():
    name = input("What is the name of the file you would like to read from? ")

    # The with-block closes the file automatically once reading is done.
    with open(name, "r") as file:
        text = [word.strip(",.") for line in file for word in line.lower().split()]
    print(text)
>>> help(str.strip)
Help on method_descriptor:

strip(...)
    S.strip([chars]) -> string or unicode

    Return a copy of the string S with leading and trailing
    whitespace removed.
    If chars is given and not None, remove characters in chars instead.
    If chars is unicode, S will be converted to unicode before stripping
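To make the difference between the two approaches concrete, here is a minimal sketch comparing them on a few hypothetical sample words. The helper names `clean_strip` and `clean_replace` are made up for illustration and do not appear in the original answers.

```python
def clean_strip(word):
    """Remove commas and periods from both ends of the word only."""
    return word.strip(",.")


def clean_replace(word):
    """Remove every comma and period, wherever it appears in the word."""
    return word.replace(",", "").replace(".", "")


# Hypothetical sample words, chosen to show where the two approaches differ.
samples = ["file.", "e.g.", "3.14,", "...wait"]

stripped = [clean_strip(w) for w in samples]
replaced = [clean_replace(w) for w in samples]

for original, s, r in zip(samples, stripped, replaced):
    print(original, "->", s, "(strip)", r, "(replace)")
```

Which one is preferable depends on the text: `strip` keeps interior punctuation such as the decimal point in a number, while `replace` removes it everywhere.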
John La Rooy
0

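Try this (Python 2). Note that `translate` returns a new string, so the result has to be assigned:

 chars = [',', '.']

 word = word.translate(None, ''.join(chars))

For Python 3:

 chars = [',', '.']
 word = word.translate({ord(k): None for k in chars})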
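As a possible variation for Python 3, the translation table can be built once with `str.maketrans` and applied to every word. This is only a sketch: the names `PUNCT_TABLE`, `remove_punctuation`, and the sample sentence are illustrative assumptions, not part of the original answer.

```python
# Build the table once; the third argument lists the characters to delete.
PUNCT_TABLE = str.maketrans("", "", ",.")


def remove_punctuation(word):
    """Return a copy of the word with commas and periods deleted."""
    return word.translate(PUNCT_TABLE)


# Hypothetical sample sentence standing in for one line of the file.
words = "this is, the first line of the file.".split()
result = [remove_punctuation(w) for w in words]

print(result)
```

Like `replace`, this removes the characters anywhere in the word, not just at the ends.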
Tareq