I'm practicing my Python and I want to remove all non-word characters (punctuation and the like) from a large document, so that only the words remain. I want to keep the same approach and not use anything fancy. This is what I currently have:

file = open('declarationOfRights.txt', 'r')
declines = file.readlines()
index=0
while index < len(declines):
    declines[index] = declines[index].strip('\n').strip(',').strip('.').strip(';').strip('\'').lower()
    index+=1
print(declines)
DYZ
  • Does this answer your question? [Stripping everything but alphanumeric chars from a string in Python](https://stackoverflow.com/questions/1276764/stripping-everything-but-alphanumeric-chars-from-a-string-in-python) – F1Rumors Nov 16 '20 at 21:38
  • So what is your question? Does this code work? Do you think this is using list comprehension, or are you asking how to do this via list comprehension instead? – Random Davis Nov 16 '20 at 21:38
  • @RandomDavis I'm asking how to do it via list comprehension because it's not working – youngcoder12122 Nov 16 '20 at 21:54
  • @F1Rumors Not really – youngcoder12122 Nov 16 '20 at 21:54
  • If it is not working now, turning it into a list comprehension will make a list comprehension which is not working. I would suggest that you try to understand why it currently does not work instead, and try to make it work exactly the way you started, so that you can learn from your mistakes. Then, if you still don't like it when it's working, try something else. ;) – zvone Nov 16 '20 at 22:28

3 Answers

You can use the re module:

import re

declines = "hello, my friend! how; are; you? good# $ok"
[word for word in re.split('[^a-zA-Z]', declines) if word]

--> ['hello', 'my', 'friend', 'how', 'are', 'you', 'good', 'ok']
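
As a possible alternative to splitting on non-letters and filtering out the empty strings, `re.findall` can match the runs of letters directly. This is only a sketch; the variable names `text` and `words` are illustrative, and the `.lower()` call is an assumption based on the lowercasing in the question:

```python
import re

text = "hello, my friend! how; are; you? good# $ok"

# Match every run of one or more ASCII letters; anything else acts as a separator.
words = re.findall(r"[a-zA-Z]+", text.lower())

print(words)
# ['hello', 'my', 'friend', 'how', 'are', 'you', 'good', 'ok']
```

Like the split-based version, this treats digits and apostrophes as separators, so "don't" would come out as two words.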
Deneb

.strip() removes characters only from the two ends of a string, not from the middle. Consider using regular expressions (I assume you still want to keep spaces):

import re
with open('declarationOfRights.txt') as file:
    lines = re.sub(r"[^a-zA-Z0-9\s]", "", file.read())
print(lines)
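
To see the difference on a single line, here is a small sketch that compares the chained strip() calls from the question with the regular-expression substitution. The sample sentence is made up for illustration and is not taken from the actual file:

```python
import re

# Hypothetical sample line, standing in for one line of the document.
line = "We, the people; declare.\n"

# Chained strip() calls only touch the ends, so inner punctuation survives.
stripped = line.strip('\n').strip(',').strip('.').strip(';').lower()

# re.sub removes every character that is not a letter, digit or whitespace.
cleaned = re.sub(r"[^a-zA-Z0-9\s]", "", line).lower()

print(repr(stripped))  # 'we, the people; declare'
print(repr(cleaned))   # 'we the people declare\n'
```

Note that `\s` keeps the newline as well as the spaces; call `.split()` on the result if a flat list of words is wanted.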
DYZ

This can be turned into a list comprehension as follows:

lines = [line.replace('\n','').replace(',','').replace('.','').replace(';','').replace("'",'').lower() for line in open('declarationOfRights.txt', 'r')]
print(lines)

I used replace() instead of strip() because strip() only removes characters from the ends of a string. If you want to remove every instance of these characters, use replace(). If you really want to use strip(), the list comprehension will look like this:

lines = [line.strip('\n').strip(',').strip('.').strip(';').strip('\'').lower() for line in open('declarationOfRights.txt', 'r')]
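
If the long chain of replace() calls gets hard to read, one option is to move it into a small helper and keep the list comprehension short. The sketch below is only an illustration: `clean_line` and `PUNCTUATION` are hypothetical names, and an in-memory `io.StringIO` with made-up text stands in for the file so the example runs on its own:

```python
import io

# Characters to delete, matching the ones handled in the comprehension above.
PUNCTUATION = "\n,.;'"

def clean_line(line):
    """Remove every occurrence of each unwanted character, then lowercase."""
    for ch in PUNCTUATION:
        line = line.replace(ch, '')
    return line.lower()

# Stand-in for open('declarationOfRights.txt'); iterating yields one line at a time.
sample = io.StringIO("When, in the course;\nof human events.\n")

lines = [clean_line(line) for line in sample]
print(lines)
# ['when in the course', 'of human events']
```

With the real file, wrapping the comprehension in `with open('declarationOfRights.txt') as file:` would also make sure the file gets closed.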
Hamza