I am using a small function to loop over files so that any hyphen (-) gets replaced by an en-dash (–, Alt+0150).
The function adds a regular expression to a solution from a related question (How to replace a character INSIDE the text content of many files automatically?):
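    import re

    def mychanger(fileName):
        # Read the whole file as UTF-8 text.
        with open(fileName, 'r', encoding='utf-8') as file:
            text = file.read()
        # Replace hyphens with en-dashes (regex exactly as in my attempt).
        text = re.sub(r"[^{]{1,4}(-)", "–", text)
        # Write the result back as UTF-8.
        with open(fileName, 'w', encoding='utf-8') as file:
            file.write(text)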
I used the regular expression [^{]{1,4}(-) because the search is actually performed on LaTeX regression tables and I only want to replace the hyphens that occur around numbers.
To be clear: I want to replace all hyphens EXCEPT where we have genuine LaTeX code such as \cmidrule(lr){2-4}. In this case there is a { close to the hyphen (within 3-4 characters at most) and to the left of it. Of course, this hyphen should not be changed into an en-dash, otherwise the LaTeX code will break.

I think the left-side condition of the exclusion is important for writing the correct exception in the regex. Indeed, in a regression table you can have things like
    -0.062\sym{***}
(that is, a { close to the hyphen but on its right), and in that case I do want to replace the hyphen.
A typical line in my table is:
    variable & -2.061\sym{***}& 4.032\sym{**} & 1.236 \\
    & (-2.32) & (-2.02) & (-0.14)
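To make the requirement concrete, the desired behavior can be written down as input/expected pairs taken from the examples above. This is only a sketch: the names EN_DASH, CASES and failures are made up for illustration, and failures accepts any candidate replacement function rather than a specific regex.

```python
EN_DASH = "\u2013"  # the en-dash character

# (input, expected output) pairs built from the examples in this question.
CASES = [
    # A { to the RIGHT of the hyphen: the hyphen should be replaced.
    (r"-0.062\sym{***}", EN_DASH + r"0.062\sym{***}"),
    # Hyphen inside parentheses: replaced, parentheses kept.
    ("(-2.32)", "(" + EN_DASH + "2.32)"),
    # A { shortly to the LEFT of the hyphen: must stay untouched.
    (r"\cmidrule(lr){2-4}", r"\cmidrule(lr){2-4}"),
]

def failures(replace):
    """Return (input, expected, actual) for every case that replace() gets wrong."""
    results = []
    for source, expected in CASES:
        actual = replace(source)
        if actual != expected:
            results.append((source, expected, actual))
    return results
```

For example, a blanket replacement such as failures(lambda s: s.replace("-", EN_DASH)) reports only the \cmidrule case, which is exactly the one the exclusion is meant to protect.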
However, my regex does not appear to be correct. For instance, (-1.2) is replaced by –1.2), dropping the opening parenthesis.
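A minimal reproduction, independent of any file handling, may help narrow this down. It applies the same pattern to two short strings; the names PATTERN, EN_DASH and replace_as_in_question are illustrative only. Running it also suggests that the { exclusion does not protect \cmidrule(lr){2-4}: a single character (the 2) already satisfies [^{]{1,4}, so that hyphen is matched as well.

```python
import re

PATTERN = r"[^{]{1,4}(-)"  # regex from the question, unchanged
EN_DASH = "\u2013"

def replace_as_in_question(text):
    # re.sub replaces the ENTIRE match, i.e. the 1-4 characters matched by
    # [^{]{1,4} together with the hyphen, not only the captured group (-).
    return re.sub(PATTERN, EN_DASH, text)

print(replace_as_in_question("(-1.2)"))               # –1.2)
print(replace_as_in_question(r"\cmidrule(lr){2-4}"))  # \cmidrule(lr){–4}
```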
What is the problem here? Thanks!