I'm trying to write a script to automate a simple task: removing characters from txt files and saving each file under the same name, but without those chars. I have multiple txt files (e.g. 1.txt, 2.txt ... 200.txt) stored in a directory (Documents). I also have a txt file with the characters I want to remove. At first I thought of comparing my chars_to_remove.txt against all my different files (1.txt, 2.txt...), but I couldn't find a way to do so. Instead, I created a string with all the chars I want to remove.
Let's say the file 1.txt contains the following string:
Mean concentrations α, maximum value ratio β and reductions in NO2 due to the lockdown Δ, March 2020, 2019 and 2018 in Madrid and Barcelona (Spain).
I want to remove the α, β, and Δ chars from the string. This is my code so far:
import glob
import os

chars_to_remove = '‘’“”|n.d.…•∈αβδΔεθϑφΣμτσχ€$∞http:www.←→≥≤<>▷×°±*⁃'
file_location = os.path.join('Desktop', 'Documents', '*.txt')
file_names = glob.glob(file_location)
print(file_names)

for f in file_names:
    outfile = open(f, 'r', encoding='latin-1')
    data = outfile.read()
    if chars_to_remove in data:
        data.replace(chars_to_remove, '')
    outfile.close()
In each iteration, the variable data stores all the content of the current txt file. I want to check whether any of the chars in chars_to_remove appear in the string and remove them with the replace() function. I tried different approaches suggested here and here without success.
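To make the symptom concrete, here is a minimal sketch run on the sample sentence. It assumes a shortened chars_to_remove containing only α, β and Δ, and the variable names are illustrative only. It shows that `in` tests for the whole string as one contiguous substring, and that replace() returns a new string rather than modifying data in place:

```python
# Sample sentence from 1.txt; chars_to_remove is shortened for illustration (assumption).
text = "Mean concentrations α, maximum value ratio β and reductions in NO2 due to the lockdown Δ"
chars_to_remove = "αβΔ"

# `in` searches for the whole string "αβΔ" as a single contiguous substring,
# which never occurs in the text, so the check fails.
whole_string_found = chars_to_remove in text

# str.replace returns a new string; without reassignment the original is unchanged.
before = text
text.replace("α", "")
still_same = text == before

# Handling one character at a time and keeping the returned string.
cleaned = text
for ch in chars_to_remove:
    cleaned = cleaned.replace(ch, "")
```

With this per-character loop, cleaned no longer contains α, β or Δ, while text itself is untouched.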
Also, I tried to compare it as a list:
chars_to_remove = ['‘','’','“','”','|','n.d.','…','•','∈','α','β','δ','Δ','ε','θ','ϑ','φ','Σ','μ','τ','σ','χ','€','$','∞','http:','www.','←','→','≥','≤','<','>','▷','×','°','±','*','⁃']
but got datatype errors when comparing.
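For reference, the behavior I am after with the list version could be sketched roughly as follows. This is only a sketch under assumptions: strip_tokens and clean_files are hypothetical helper names, and the files are assumed to be UTF-8 encoded (latin-1 cannot represent characters such as α or Δ, so they would never match when read that way). Testing a list with `in` against a str raises a TypeError, so the sketch handles one token at a time:

```python
import glob
import os

# The list from the question; multi-character entries such as 'n.d.' are
# removed as whole substrings.
TOKENS_TO_REMOVE = ['‘', '’', '“', '”', '|', 'n.d.', '…', '•', '∈', 'α', 'β',
                    'δ', 'Δ', 'ε', 'θ', 'ϑ', 'φ', 'Σ', 'μ', 'τ', 'σ', 'χ', '€',
                    '$', '∞', 'http:', 'www.', '←', '→', '≥', '≤', '<', '>',
                    '▷', '×', '°', '±', '*', '⁃']


def strip_tokens(text, tokens):
    """Return text with every occurrence of each token removed (hypothetical helper)."""
    for token in tokens:
        # Each token is a str, so replace() works; keep the returned string.
        text = text.replace(token, '')
    return text


def clean_files(pattern, tokens, encoding='utf-8'):
    """Rewrite each file matching pattern under the same name, without the tokens.

    The encoding default is an assumption; adjust it to the real files.
    """
    for path in glob.glob(pattern):
        with open(path, 'r', encoding=encoding) as infile:
            data = infile.read()
        with open(path, 'w', encoding=encoding) as outfile:
            outfile.write(strip_tokens(data, tokens))
```

A call such as clean_files(os.path.join('Desktop', 'Documents', '*.txt'), TOKENS_TO_REMOVE) would then overwrite each matching file in place, so it is worth trying on a copy of the directory first.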
Any further ideas would be appreciated!