First, using str.replace in a loop is inefficient. Since strings are immutable, you would be creating a new string on each iteration. You can use str.translate to remove the unwanted characters in a single pass.
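To make the difference concrete, here is a small sketch comparing the two approaches. The names `unwanted` and `text` are made up for illustration and are not from your code.

```python
unwanted = "$%!"
text = "some $string with% noise!"

# Loop version: every replace call builds a brand-new string.
looped = text
for ch in unwanted:
    looped = looped.replace(ch, "")

# Single-pass version: build the deletion table once, then translate.
table = str.maketrans("", "", unwanted)
translated = text.translate(table)

print(looped == translated)  # True
print(translated)            # some string with noise
```

Both give the same result, but the translate version walks the string once regardless of how many characters you want to drop.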
As for removing a dash only when it sits at the boundary of a word (and keeping it when it is inside one), this is exactly what str.strip does.
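A quick illustration of that behaviour, using a few sample words I made up:

```python
words = ["-This", "ill-desired", "--both--", "-"]

# strip('-') only removes dashes from the two ends of each word;
# dashes in the middle are left alone.
stripped = [w.strip("-") for w in words]

print(stripped)  # ['This', 'ill-desired', 'both', '']
```

Note that a word made only of dashes becomes an empty string, which you may want to filter out depending on your data.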
It also seems the characters you want to remove correspond to string.punctuation, with a special case for '-'.
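from string import punctuation

def remove_special_character(s):
    # Translation table that deletes every punctuation character except '-'
    translation = str.maketrans('', '', punctuation.replace('-', ''))
    # Strip boundary dashes from each word, then delete the remaining punctuation
    return ' '.join([w.strip('-') for w in s.split()]).translate(translation)
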
polluted_string = '-This $string contain%s ill-desired characters!'
clean_string = remove_special_character(polluted_string)
print(clean_string)
# prints: 'This string contains ill-desired characters'
If you want to apply this to multiple lines, you can do it with a list comprehension.
lines = [remove_special_character(line) for line in lines]
Finally, to read a file you should be using a with statement, which closes the file for you even if an error occurs.
with open(file, "r") as f:
    lines = [remove_special_character(line) for line in f]
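Putting it all together, here is a self-contained sketch you can run as-is. The temporary file and its contents are assumptions for the sake of the demo; in your case you would open your own file instead.

```python
import os
import tempfile
from string import punctuation


def remove_special_character(s):
    # Delete all punctuation except '-', after stripping boundary dashes.
    translation = str.maketrans('', '', punctuation.replace('-', ''))
    return ' '.join([w.strip('-') for w in s.split()]).translate(translation)


# Write a small sample file so the example does not depend on your data.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("-First $line!\nSecond ill-desired% line\n")
    path = tmp.name

try:
    # Iterating over the file object yields one line at a time.
    with open(path, "r") as f:
        lines = [remove_special_character(line) for line in f]
finally:
    os.remove(path)

print(lines)  # ['First line', 'Second ill-desired line']
```

Since the function splits on whitespace and rejoins with single spaces, the trailing newline of each line is dropped as a side effect.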