I want to apply a regex to a Latin-script text, and I followed the solution in this question: How to account for accent characters for regex in Python?, where they suggest adding a # character before the regex.
def clean_str(string):
    string = re.sub(r"#(@[a-zA-Z_0-9]+)", " ", string, re.UNICODE)
    string = re.sub(r'#([^a-zA-Z0-9#])', r' \1 ', string, re.UNICODE)
    string = re.sub(r'#([^a-zA-Z0-9#])', r' ', string, re.UNICODE)
    string = re.sub(r'(\s{2,})', ' ', string, re.UNICODE)
    return string.lower().strip()
My problem is that the regex seems to cope with the accented Latin characters, but none of the substitutions is actually applied to the text.
Example: if I have a text like "@aaa bbb các. ddd",
it should become "bbb các . ddd", with a space before the dot and the tag "@aaa" removed.
But it returns the input text unchanged: "@aaa bbb các. ddd"
Did I miss something?
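Two things may be worth checking here; the following is only a sketch, assuming Python 3 and a precomposed "á" in the input. Without re.VERBOSE, a # inside a pattern is a literal character, so the patterns above only match text that actually contains a #. Also, the fourth positional argument of re.sub is count, not flags, so re.UNICODE is being passed as a substitution limit. The variable names and the [^\w\s] character class below are illustrative choices, not the approach from the linked question.

```python
import re

# "các" written with a precomposed á (U+00E1); a decomposed form
# (a + combining accent) would need normalizing first.
text = "@aaa bbb c\u00e1c. ddd"

# '#' is literal here, so nothing matches and the text is unchanged.
with_hash = re.sub(r"#(@[a-zA-Z_0-9]+)", " ", text)

# Dropping the '#' lets the tag match and be replaced by a space.
without_hash = re.sub(r"(@[a-zA-Z_0-9]+)", " ", text)

# For str patterns in Python 3, \w already matches accented letters,
# so [^\w\s] selects punctuation without splitting "các".
padded = re.sub(r"([^\w\s])", r" \1 ", without_hash)

# Flags are passed by keyword; positionally this slot would be count.
cleaned = re.sub(r"\s{2,}", " ", padded, flags=re.UNICODE).lower().strip()

print(repr(with_hash))
print(repr(cleaned))
```

With this input, with_hash is identical to the original text, while cleaned comes out as "bbb các . ddd".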