I'm trying to remove every character that is neither a letter nor whitespace by replacing it with ''. The code below worked in a number of test cases, but it fails on escaped special characters such as \n.
import re
def process_text(text):
    text = text.lower()
    text = re.sub(pattern=r'[^A-z ^\s]', repl='', string=text).split(' ')
    return [word for word in text if word != '']
process_text('abc 123')
>>>> ['abc'] # this is what I wanted.
process_text('abc 123 \n')
>>>> ['abc', '\n'] # I don't want the new line character.
The cheatsheet linked below told me that \s matches any whitespace. https://www.debuggex.com/cheatsheet/regex/python
The official documentation spells out what that means for \s: "Matches any whitespace character; this is equivalent to [ \t\n\r\f\v]." https://docs.python.org/3/howto/regex.html
So I now see that my pattern effectively says: find anything that is neither a letter nor one of the whitespace characters listed above, and replace it with ''. Since \n is in that set, the newline is kept.
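To check that reading, here is a small sketch of what the character class actually keeps. The test strings are made up for illustration. As far as I can tell, only the first ^ negates the class, the second one is a literal caret, and the A-z range also covers the punctuation that sits between Z and a in ASCII.

```python
import re

pattern = r'[^A-z ^\s]'  # the character class used in process_text

# Only the leading ^ negates the class; the second ^ is a literal caret.
# \s already covers space, \t, \n, \r, \f and \v, so none of these are removed.
kept = re.sub(pattern, '', 'a b\tc\nd^e')
print(repr(kept))  # 'a b\tc\nd^e'

# A-z spans ASCII 65..122, which also includes [ \ ] ^ _ ` between Z and a.
print(re.sub(pattern, '', 'x_y[z]1!'))  # x_y[z]
```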
Is there a way to keep ordinary spaces but remove the other whitespace characters, such as \n?
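One direction that might work, sketched under the assumptions that only ASCII letters count as letters and that any run of whitespace should simply separate words. The name process_text_sketch is hypothetical and only used for illustration.

```python
import re

def process_text_sketch(text):
    # Hypothetical variant of process_text; the name is illustrative only.
    text = text.lower()
    # Drop everything except lowercase ASCII letters and whitespace.
    text = re.sub(r'[^a-z\s]', '', text)
    # split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines) and never yields empty strings.
    return text.split()

print(process_text_sketch('abc 123'))     # ['abc']
print(process_text_sketch('abc 123 \n'))  # ['abc']
```

If the newline should be deleted outright rather than treated as a separator, the class [^a-z ] (a literal space instead of \s) would presumably do that, at the cost of gluing together words that were separated only by a newline.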