I have a simple problem that is driving me crazy, and it seems to be due to how Python handles Unicode
characters.
I have a LaTeX table stored on my disk (very similar to http://www.jwe.cc/downloads/table.tex), and I want to apply some regexes to it so that the hyphen-like minus signs (\u2212) are replaced by en-dashes – (alt 0150 or \u2013).
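To be explicit about which characters are involved, here is a small check using the standard unicodedata module. It is only a sketch and assumes Python 3 (the script in this post is Python 2); the helper name describe and the labels in the dictionary are made up for illustration.

```python
import unicodedata

# Three characters that look alike in a LaTeX table but are distinct code points.
candidates = {
    "ascii hyphen": "\u002d",
    "minus sign": "\u2212",
    "en dash": "\u2013",
}

def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return "U+{:04X} {}".format(ord(ch), unicodedata.name(ch))

for label, ch in candidates.items():
    print(label, "->", describe(ch))
```

Note that \u2212 is formally the MINUS SIGN, while the hyphen typed on a keyboard is \u002d, so it is worth confirming which of the two the file actually contains.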
I am using the following function, which performs two different regex substitutions:
    import re
    import glob
    def mychanger(fileName):
        with open(fileName,'r') as file:
            str = file.read()
            str = str.decode("utf-8")
            str = re.sub(r"((?:^|[^{])\d+)\u2212(\d+[^}])","\\1\u2013\\2", str).encode("utf-8")
            str = re.sub(r"(^|[^0-9])\u2212(\d+)","\\1\u2013\\2", str).encode("utf-8")
        with open(fileName,'wb') as file:
            file.write(str)
    myfile = glob.glob("C://*.tex")
    for file in myfile: mychanger(file)
Unfortunately, this does not change anything.
It does work, though, if I use a non-Unicode character like $ instead of \u2013, which suggests that the regex itself is correct.
I am lost here. I also tried using re.sub(ur"((?:^|[^{])\d+)\u2212(\d+[^}])","\\1\u2013\\2", str).encode("utf-8"),
but it still does not change anything.
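For comparison, a minimal sketch of the intended result on an in-memory string rather than a file. It assumes Python 3, where str is already Unicode and "\u2212" in an ordinary (non-raw) string literal is the real character. The function name minus_to_en_dash and the sample row are hypothetical, and the pattern is deliberately simpler than the two above: it ignores the brace conditions and only looks for a minus sign directly followed by a digit.

```python
import re

MINUS = "\u2212"    # MINUS SIGN
EN_DASH = "\u2013"  # EN DASH

def minus_to_en_dash(text):
    """Replace a minus sign that directly precedes a digit with an en dash."""
    # MINUS is the actual character here, not a backslash escape, so the
    # compiled pattern contains the code point U+2212 itself.
    pattern = re.compile(MINUS + r"(?=\d)")
    return pattern.sub(EN_DASH, text)

# Hypothetical table row containing two minus signs.
row = "x & \u22120.12 & 3\u22125 \\\\"
print(minus_to_en_dash(row))
```

This is the behaviour the file-based function is meant to reproduce for every .tex file.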
What is wrong here? Thanks!