How to remove all commas, new lines, quotes, periods from a file using list comprehensions?

Question

I'm practicing my Python and I want to remove every character that isn't part of a word from a big document, leaving just the words. I want to keep the same approach and not use anything fancy. This is what I currently have:

file = open('declarationOfRights.txt', 'r')
declines = file.readlines()
index=0
while index < len(declines):
    declines[index] = declines[index].strip('\n').strip(',').strip('.').strip(';').strip('\'').lower()
    index+=1
print(declines)

Does this answer your question? [Stripping everything but alphanumeric chars from a string in Python](https://stackoverflow.com/questions/1276764/stripping-everything-but-alphanumeric-chars-from-a-string-in-python) — F1Rumors, Nov 16 '20 at 21:38
So what is your question? Does this code work? Do you think this is using list comprehension, or are you asking how to do this via list comprehension instead? — Random Davis, Nov 16 '20 at 21:38
@RandomDavis I'm asking how to do it via list comprehension because it's not working — youngcoder12122, Nov 16 '20 at 21:54
If it is not working now, turning it into a list comprehension will make a list comprehension which is not working. I would suggest that you try to understand why it currently does not work instead, and try to make it work exactly the way you started, so that you can learn from your mistakes. Then, if you still don't like it when it's working, try something else. ;) — zvone, Nov 16 '20 at 22:28

score 0 · Answer 1 · answered Nov 16 '20 at 21:40

You can use the re module:

import re

declines = "hello, my friend! how; are; you? good# $ok"
[word for word in re.split('[^a-zA-Z]', declines) if word]

--> ['hello', 'my', 'friend', 'how', 'are', 'you', 'good', 'ok']
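
The same split can be applied line by line to a file with a nested list comprehension. This is only a sketch: `fake_file` and its text are made-up stand-ins for `open('declarationOfRights.txt')`, and lowercasing is added because the question's code calls `.lower()`.

```python
import io
import re

# Hypothetical stand-in for open('declarationOfRights.txt'); the text is made up
fake_file = io.StringIO("Hello, my friend!\nHow; are; you?\n")

# Split each line on any non-letter, drop the empty pieces, lowercase the rest
words = [
    word.lower()
    for line in fake_file
    for word in re.split(r"[^a-zA-Z]", line)
    if word
]
print(words)
```

Because the pattern treats every non-letter as a separator, digits and apostrophes also split words; widen the character class if those should be kept.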

Deneb

score 0 · Answer 2 · answered Nov 16 '20 at 21:44

.strip() removes characters only from the beginning and end of a string, not from the middle. Consider using a regular expression instead (I assume you still want to keep spaces):

import re

with open('declarationOfRights.txt') as file:
    # Raw string so \s reaches the regex engine intact; keep only letters, digits, and whitespace
    lines = re.sub(r"[^a-zA-Z0-9\s]", "", file.read())
print(lines)
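
To make the difference concrete, here is a small comparison on one line of text. The `sample` string is invented for illustration, and a single `strip()` call with a set of characters stands in for the chained calls in the question.

```python
import re

# Hypothetical sample line, not taken from the actual file
sample = "'We hold, these truths; to be evident.'\n"

# strip() only trims the listed characters from the two ends of the string
stripped = sample.strip("\n.,;'")

# re.sub() removes every character that is not a letter, digit, or whitespace
cleaned = re.sub(r"[^a-zA-Z0-9\s]", "", sample)

print(repr(stripped))  # 'We hold, these truths; to be evident'
print(repr(cleaned))   # 'We hold these truths to be evident\n'
```

Note that the comma and semicolon in the middle survive `strip()`, while `re.sub()` keeps the trailing newline because `\s` counts it as whitespace.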

Hamza · Answer 3 · 2020-11-16T22:43:10.417

This can be turned into a list comprehension as follows:

with open('declarationOfRights.txt', 'r') as file:
    lines = [line.replace('\n','').replace(',','').replace('.','').replace(';','').replace("'",'').lower() for line in file]
print(lines)

I used replace() instead of strip() because strip() only removes characters from the ends of a string. To remove every instance of these characters, use replace(). If you really want strip(), the list comprehension looks like this:

with open('declarationOfRights.txt', 'r') as file:
    lines = [line.strip('\n').strip(',').strip('.').strip(';').strip('\'').lower() for line in file]
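
If the chain of replace() calls gets long, one possible alternative (not what the answer above uses) is to filter characters inside the comprehension. In this sketch, `UNWANTED` and `fake_file` are hypothetical names, and the sample text stands in for the real file.

```python
import io

# Characters the question wants removed (assumed set; extend as needed)
UNWANTED = set("\n,.;'")

# Hypothetical stand-in for open('declarationOfRights.txt'); the text is made up
fake_file = io.StringIO("When, in the course;\nof human events.\n")

# Keep every character that is not in UNWANTED, then lowercase the line
lines = ["".join(ch for ch in line if ch not in UNWANTED).lower() for line in fake_file]
print(lines)
```

This stays within plain comprehensions and string methods, at the cost of examining each character individually.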
