I have a set of regular expressions for substitution in a file (sed.clean), as follows:
#!/bin/sed -f
s/https\?:\/\/[^ ]*//g
s/\.//g
s/\"//g
s/\,//g
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
and some more lines like those. I want to use this file to 'clean' a set of text files. To do this in bash, I'd do something like this:
for file in rootDirectory/*
do
    sed -f sed.clean "$file" > OUTPUT_FILE
done
How could I do something similar in Python?
What I mean is: is it possible to leverage the n regular expressions I have in the sed.clean file (or rewrite them in the proper Python format) so that I avoid building a nested loop that applies each RE to each file, and instead apply the whole sed.clean to each file, as I do in bash? Something like this:
files = [f for f in listdir(dirPath) if isfile(join(dirPath, f))]
for file in files:
    newTextFile = re.sub(sed.clean, file)
    saveTextFile(newTextFile, outputPath)
instead of this:
REs = ['s/https\?:\/\/[^ ]*//g', 's/\.//g', ..., 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/']
files = [f for f in listdir(dirPath) if isfile(join(dirPath, f))]
for file in files:
    for pattern in REs:
        newTextFile = re.sub(pattern, '', file)
        saveTextFile(newTextFile, outputPath)
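To make the question concrete, here is roughly the shape I imagine a solution could take: a small helper that translates the s/.../.../g lines of sed.clean into compiled Python patterns once, plus a single cleaning function applied to each file's contents. This is only a sketch, not working code; the names load_rules and clean_text are placeholders I made up, dirPath/outputPath are just the directories from my snippets above, and the y/.../.../ transliteration line would still need separate handling (e.g. str.translate or simply .lower()):

import re
from os import listdir
from os.path import isfile, join

def load_rules(sed_path):
    # Translate each 's/pattern/replacement/g' line into a
    # (compiled pattern, replacement) pair. Naive '/' splitting:
    # assumes any '/' inside a pattern is escaped as \/ (as in my sed.clean).
    rules = []
    with open(sed_path) as f:
        for line in f:
            line = line.strip()
            if not line.startswith('s/'):
                continue  # skip the shebang and the 'y/.../.../' line
            parts = re.split(r'(?<!\\)/', line)
            # Note: sed's BRE syntax is not identical to Python's re syntax,
            # e.g. the \? in the URL rule would need to become ? here.
            pattern = parts[1].replace('\\/', '/')
            rules.append((re.compile(pattern), parts[2]))
    return rules

def clean_text(text, rules):
    # Apply every substitution in order, as sed would.
    for pattern, repl in rules:
        text = pattern.sub(repl, text)
    return text

dirPath = 'rootDirectory'      # placeholders matching the snippets above
outputPath = 'outputDirectory'

rules = load_rules('sed.clean')
files = [f for f in listdir(dirPath) if isfile(join(dirPath, f))]
for name in files:
    with open(join(dirPath, name)) as src:
        # .lower() stands in for the y/.../.../ transliteration rule
        cleaned = clean_text(src.read(), rules).lower()
    with open(join(outputPath, name), 'w') as dst:
        dst.write(cleaned)

Is something along these lines the idiomatic approach, or is there a more direct way to reuse the sed file itself from Python?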
Thanks!