I want to parse text and keep only letters, digits, forward slashes (/) and backslashes (\), replacing everything else with ''.
Is it possible to do this with a single regex pattern instead of several patterns applied in a loop? I can't get the patterns below to leave the backslash and forward slash in place.
import re

line = "1/R~e`p!l@@a#c$e%% ^A&l*l( S)-p_e+c=ial C{har}act[er]s ;E xce|pt Forw:ard\" $An>d B,?a..ck Sl'as<he#s\\2"
line2 = line  # untouched copy for the second approach
# Two patterns applied one after another: drop non-word characters, then underscores
RGX_PATTERN = r"[^\w]", "_"
for pattern in RGX_PATTERN:
    line = re.sub(pattern, '', line)
print("replace1: " + line)
#Prints: replace1: 1ReplaceAllSpecialCharactersExceptForwardAndBackSlashes2
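A single negated character class can do this in one `re.sub` call, with no loop: list the characters to keep and put `^` in front so that everything else is matched and removed. This is a minimal sketch; `DISALLOWED` and `clean` are illustrative names, not part of the original code. Spelling out `A-Za-z0-9` instead of `\w` also drops the underscore, which `\w` would otherwise keep.

```python
import re

# Matches any character that is NOT an ASCII letter, an ASCII digit, "/" or "\".
# In the raw string, "\\" inside the class is a literal backslash for the regex engine.
DISALLOWED = re.compile(r"[^A-Za-z0-9/\\]")

def clean(text: str) -> str:
    """Remove every character that is not an ASCII letter, digit, / or \\."""
    return DISALLOWED.sub("", text)

line = "1/R~e`p!l@@a#c$e%% ^A&l*l( S)-p_e+c=ial C{har}act[er]s ;E xce|pt Forw:ard\" $An>d B,?a..ck Sl'as<he#s\\2"
print(clean(line))  # prints: 1/ReplaceAllSpecialCharactersExceptForwardAndBackSlashes\2
```

Note that `Forw:ard\"` in the sample is an escaped double quote, not a backslash, so the only backslash in the input is the one from `\\2`.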
The code below, taken from another Stack Overflow answer, has been benchmarked as faster than regex, but it removes every special character, including the / and \ I want to keep. Is there a way to adapt it to my use case while keeping its speed advantage over regex?
line2 = ''.join(e for e in line2 if e.isalnum())
print("replace2: " + line2)
#Prints: replace2: 1ReplaceAllSpecialCharactersExceptForwardAndBackSlashes2
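The `isalnum()` generator can be extended with a second membership test for the two slashes. This is a sketch; `EXTRA_KEEP` and `clean_isalnum` are illustrative names. Whether it stays faster than a single compiled regex is not guaranteed, because every non-alphanumeric character now goes through an extra check, so it is worth timing both with `timeit` on representative input.

```python
# Characters to keep in addition to alphanumerics; a set gives fast membership tests.
EXTRA_KEEP = {"/", "\\"}

def clean_isalnum(text: str) -> str:
    """Keep alphanumerics plus / and \\; drop everything else.

    str.isalnum() is Unicode-aware, so non-ASCII letters such as 'é' are kept.
    """
    return "".join(c for c in text if c.isalnum() or c in EXTRA_KEEP)

line2 = "1/R~e`p!l@@a#c$e%% ^A&l*l( S)-p_e+c=ial C{har}act[er]s ;E xce|pt Forw:ard\" $An>d B,?a..ck Sl'as<he#s\\2"
print(clean_isalnum(line2))  # prints: 1/ReplaceAllSpecialCharactersExceptForwardAndBackSlashes\2
```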
As an extra hurdle, the output should be plain ASCII, so, if possible, any non-ASCII characters should also be replaced with ''.
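In Python 3, both `\w` and `str.isalnum()` operate on Unicode, so they keep characters such as 'é'. One way to guarantee ASCII-only output without regex is an explicit whitelist built from the `string` module constants. This is a sketch; `ALLOWED`, `clean_ascii` and the sample text are illustrative, not from the original question.

```python
import string

# Whitelist: ASCII letters, ASCII digits, "/" and "\".
# Everything else (punctuation, whitespace, any non-ASCII character) is dropped.
ALLOWED = frozenset(string.ascii_letters + string.digits + "/\\")

def clean_ascii(text: str) -> str:
    """Return text with every character outside ALLOWED removed."""
    return "".join(c for c in text if c in ALLOWED)

# '\u00e9' (é) and '\u00b2' (superscript two) both count as alphanumeric
# for str.isalnum(), but neither is ASCII, so the whitelist removes them.
sample = "Caf\u00e9 /path\\to\u2013file \u00b2"
print(clean_ascii(sample))  # prints: Caf/path\tofile
```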