How do I remove punctuation, digits, and spaces from a file in Python 3?
fname = input("Enter the name of the file: ")
fh = open(fname)
for line in fh:
    line = line.strip()
This prints every character except punctuation, digits, and whitespace:
from string import whitespace, punctuation, digits

fname = input("Enter the name of the file: ")
with open(fname) as f:
    for line in f:
        print(''.join(filter(lambda c: c not in whitespace + digits + punctuation, line)),
              end="")
The same thing with a generator expression instead of filter:
from string import whitespace, punctuation, digits

fname = input("Enter the name of the file: ")
with open(fname) as f:
    for line in f:
        print(
            ''.join(c for c in line if c not in whitespace + punctuation + digits),
            end="")
If you want to replace the file with the new content, here is the code:
from pathlib import Path
from string import whitespace, punctuation, digits

file_name = Path(input("Enter the name of the file: "))
file_name.write_text(''.join(c for c in file_name.read_text() if
                             c not in whitespace + punctuation + digits))
You should look at Path, it's very useful! Also, the string module contains a lot of shortcuts for this kind of manipulation.
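As a further illustration of building on the string constants, here is a minimal sketch that uses str.maketrans and str.translate. This is a different technique from the comprehension above, not something the code above already does. The names strip_unwanted and DELETE_TABLE are made up for this example.

```python
import string

# Characters to drop: all ASCII whitespace, punctuation, and digits.
CHARS_TO_DELETE = string.whitespace + string.punctuation + string.digits

# With three arguments, str.maketrans maps every character in the third
# argument to None, which tells str.translate to delete it.
DELETE_TABLE = str.maketrans("", "", CHARS_TO_DELETE)


def strip_unwanted(text):
    """Return text with whitespace, punctuation, and digits removed."""
    return text.translate(DELETE_TABLE)


print(strip_unwanted("Hello, World! 123\n"))  # prints HelloWorld
```

The table is built once and can be reused for every line or for the whole file content, so it should combine with Path.read_text and Path.write_text in the same way as the comprehension.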
Here is a more detailed version of the code. Each step is broken out and explained:
import pathlib
import string

# Concatenate three constants from the string module. The result is
# ' \t\n\r\x0b\x0c' (whitespace), followed by
# !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ (punctuation) and 0123456789 (digits).
CHARACTER_TO_DELETE = string.whitespace + string.punctuation + string.digits


def character_not_to_delete(character):
    """
    Return True if the character is not in CHARACTER_TO_DELETE.
    """
    return character not in CHARACTER_TO_DELETE


def clean_file():
    """
    Rewrite the file so that it contains no whitespace characters,
    punctuation, or digits.
    """
    file_name = pathlib.Path(input("Enter the name of the file: "))

    # Path.read_text opens the file, returns its content, and closes it
    file_content = file_name.read_text()

    # Use a list comprehension to collect the characters that are not in
    # CHARACTER_TO_DELETE
    create_list_of_allowed_character = [c for c in file_content
                                        if character_not_to_delete(c)]

    # Join the list of characters into a single string
    new_content = ''.join(create_list_of_allowed_character)

    # Path.write_text opens the file, writes the new content, and closes it
    file_name.write_text(new_content)


if __name__ == "__main__":
    clean_file()
We can read the file, use a regex to remove the unnecessary characters, and write the result back:
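import re

fname = input("Enter the name of the file: ")
with open(fname, "r") as f:
    content = f.read()
# Removes digits, spaces, and only the punctuation listed in the brackets
c = re.sub("[0-9,.:;?!\"' ]", "", content)
with open(fname, "w") as f:
    f.write(c)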
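The character class above lists only a handful of punctuation marks and the plain space. If every ASCII punctuation character and every kind of whitespace should go, one possible variation is to build the class from string.punctuation. This is a sketch under that assumption, and the names UNWANTED and remove_unwanted are hypothetical:

```python
import re
import string

# Build a character class from every ASCII punctuation character.
# re.escape protects characters such as ] \ ^ - that are special inside [...].
# With re.ASCII, \s matches only [ \t\n\r\f\v] and \d matches only [0-9],
# which lines up with string.whitespace and string.digits.
UNWANTED = re.compile(r"[\s\d" + re.escape(string.punctuation) + "]", re.ASCII)


def remove_unwanted(text):
    """Return text with ASCII whitespace, digits, and punctuation removed."""
    return UNWANTED.sub("", text)


print(remove_unwanted("Hello, World! 123"))  # prints HelloWorld
```

Without re.ASCII, \d and \s in a str pattern also match non-ASCII digits and whitespace, which may or may not be what you want.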