How do I remove punctuation, digits and spaces from a file in Python 3?

fname = input("Enter the name of the file: ")
fh = open(fname)
for line in fh:
    line = line.strip()

2 Answers


This prints every character except punctuation, digits and whitespace:

from string import whitespace, punctuation, digits

fname = input("Enter the name of the file: ")
with open(fname) as f:
    for line in f:
        print(''.join(filter(lambda c: c not in whitespace + digits + punctuation, line)),
              end="")

With a comprehension:

from string import whitespace, punctuation, digits

fname = input("Enter the name of the file: ")
with open(fname) as f:
    for line in f:
        print(
            ''.join(c for c in line if c not in whitespace + punctuation + digits),
            end="")
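
One consequence worth noting: string.whitespace contains the newline character, so the line breaks are removed too and the whole file comes out as a single run of letters. Here is a minimal sketch of that behaviour, assuming an in-memory io.StringIO in place of the real file; the names UNWANTED and keep_letters are hypothetical and not part of the answer above.

```python
import io
from string import whitespace, punctuation, digits

# Hypothetical name: a frozenset makes each membership test a constant-time lookup.
UNWANTED = frozenset(whitespace + punctuation + digits)


def keep_letters(lines):
    """Join all lines, dropping whitespace, punctuation and digits."""
    return ''.join(c for line in lines for c in line if c not in UNWANTED)


# io.StringIO stands in for an opened text file here.
sample = io.StringIO("Hello, World! 123\nfoo_bar 4\n")
print(keep_letters(sample))  # HelloWorldfoobar
```

The underscore disappears as well, because it is part of string.punctuation.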

If you want to replace the file with the new content, here is the code:

from pathlib import Path
from string import whitespace, punctuation, digits

file_name = Path(input("Enter the name of the file: "))
file_name.write_text(''.join(c for c in file_name.read_text() if
                             c not in whitespace + punctuation + digits))

You should look at Path, it's very useful! Also, the string module provides ready-made constants (whitespace, punctuation, digits) that simplify this kind of manipulation.
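
Since the string constants are plain strings, they can also be fed to str.maketrans and str.translate, which is a different technique from the comprehension used in this answer. A minimal sketch, where DELETE_TABLE and strip_unwanted are hypothetical names chosen for illustration:

```python
from string import whitespace, punctuation, digits

# The third argument of str.maketrans lists the characters to delete.
DELETE_TABLE = str.maketrans('', '', whitespace + punctuation + digits)


def strip_unwanted(text):
    """Return text without whitespace, punctuation or digits."""
    return text.translate(DELETE_TABLE)


print(strip_unwanted("Tel: 555-1234, ext. 9"))  # Telext
```

The table is built once and can be reused for every line or file, which may matter for large inputs.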

Here is a more detailed version of the code. Each step is broken down and commented:

import pathlib
import string

# This concatenates three constants from the string module. The result is
# ' \t\n\r\x0b\x0c' (whitespace), then !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
# (punctuation), then 0123456789 (digits).
CHARACTER_TO_DELETE = string.whitespace + string.punctuation + string.digits


def character_not_to_delete(character):
    """
    This function checks that the character is not in CHARACTER_TO_DELETE.
    """
    return character not in CHARACTER_TO_DELETE


def clean_file():
    """
    This function rewrites the content of the file without any whitespace
    characters, punctuation or digits.
    """
    file_name = pathlib.Path(input("Enter the name of the file: "))

    # This opens the file, returns the content and closes the file, thanks to Path
    file_content = file_name.read_text()

    # Use a list comprehension to create a list of the characters that are not
    # in CHARACTER_TO_DELETE
    create_list_of_allowed_character = [c for c in file_content
                                        if character_not_to_delete(c)]

    # Concatenate the list of characters to create a string
    new_content = ''.join(create_list_of_allowed_character)

    # This opens the file, writes the new content and closes the file
    file_name.write_text(new_content)

if __name__ == "__main__":
    clean_file()
Dorian Turba
  • Can you tell me what the code is doing, statement by statement? I understand that it removes all the special characters and digits (I have tried it), but what matters to me is understanding the code. Can you help me, or just provide a link? It would be really helpful... – chaghtai Trkan Dec 24 '21 at 13:00
  • I won't detail how Path works, but I will explain the logic if it isn't clear :) – Dorian Turba Dec 24 '21 at 15:15
  • @chaghtaiTrkan what do you think of the edit? – Dorian Turba Dec 24 '21 at 15:29

We will read the file, use a regex to remove the unnecessary characters, and write the result back:

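import re

fname = input("Enter the name of the file: ")
with open(fname, "r") as f:
    content = f.read()
    # Strips digits, spaces and only the punctuation marks listed in the class
    c = re.sub("[0-9,.:;?!\"' ]", "", content)

with open(fname, "w") as f:
    f.write(c)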
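
The character class above only covers the characters it lists, so tabs, newlines and marks such as parentheses or hyphens survive. As a hedged sketch (not the answerer's code), the class can instead be built from the string module constants, with re.escape protecting characters like ], ^, - and \ inside the brackets. UNWANTED_RE and clean_text are hypothetical names.

```python
import re
from string import whitespace, punctuation, digits

# re.escape keeps characters such as ], ^, - and \ literal inside the class.
UNWANTED_RE = re.compile("[" + re.escape(whitespace + punctuation + digits) + "]")


def clean_text(text):
    """Remove every whitespace, punctuation and digit character from text."""
    return UNWANTED_RE.sub("", text)


print(clean_text("a-b_c (42)\t[ok]\n"))  # abcok
```

Reading the file, passing its content through clean_text and writing it back would then follow the same two with blocks as above.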
kosciej16