I'm trying to preprocess a Persian text file, but some of the digits in it are Arabic digits rather than Persian ones. I want to fix this with a regex. Here is my code:
import re

def preprocessing(content):
    for d in range(10):
        arabic_digit = rf"\u066{d}"
        persian_digit = rf"\u06F{d}"
        content = re.sub(arabic_digit, persian_digit, content)
    return content
but it gives this error message:
error: bad escape \u at position 0
How should I use variables inside regex patterns? The odd thing is that the problem seems to be with the second string (persian_digit): when I change it to a static string, there is no error. Thanks for your time.
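For reference, here is a minimal sketch of the behavior I am after, written without regex. It assumes the error comes from `re.sub` parsing the replacement string for escapes: the `re` module accepts `\uXXXX` in a pattern but rejects it in a replacement, which would match the observation that only `persian_digit` triggers it. The function name `arabic_to_persian_digits` and the table `ARABIC_TO_PERSIAN` are hypothetical names used only for illustration.

```python
# Translation table: Arabic-Indic digits U+0660..U+0669
# map to Persian digits U+06F0..U+06F9.
# chr() builds the real character, so no escape sequence is ever parsed.
ARABIC_TO_PERSIAN = {0x0660 + d: chr(0x06F0 + d) for d in range(10)}

def arabic_to_persian_digits(content: str) -> str:
    """Return content with every Arabic-Indic digit replaced by its Persian counterpart."""
    return content.translate(ARABIC_TO_PERSIAN)
```

If the regex loop has to stay, building both strings with `chr(0x0660 + d)` and `chr(0x06F0 + d)` instead of raw f-strings should avoid the escape parsing in the same way.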