
I have a folder /myfolder containing many LaTeX tables.

I need to replace a character in each of them, namely every minus sign - with an en dash –.

Just to be sure: we are replacing hyphens INSIDE all of the .tex files in that folder. I don't care about the .tex file names.

Doing that manually would be a nightmare (too many files, too many minuses). Is there a way to loop over the files automatically and do the replacement? A solution in Python/R would be great.

Thanks!

ℕʘʘḆḽḘ
  • I'm not a big fan of bash, but for this problem you should really consider the `sed` command. There is an example here: https://www.cyberciti.biz/faq/unix-linux-replace-string-words-in-many-files/. If you don't like bash scripts, you can write an R loop that calls `sed` through `system()` :') – F. Privé Jul 09 '17 at 13:25
  • Thanks! @F.Privé, would you mind posting a solution? I am on either Windows or Linux. – ℕʘʘḆḽḘ Jul 09 '17 at 13:27
  • Following @user2722968's answer, `system("sed -i -e 's/-/–/g' /myfolder/*")` should work in R. Using `*.tex` instead of `*` would probably be better. – F. Privé Jul 09 '17 at 13:30

5 Answers


`sed -i -e 's/-/–/g' /myfolder/*` should work.

The expression searches globally and replaces every - inside the files that the shell expands from /myfolder/* with –. sed makes the change in place, that is, it overwrites the original file (on macOS, BSD sed requires an explicit backup suffix after -i, e.g. `-i ''` for no backup or `-i .bak` to keep one).

Absolutely no care is taken about whether the - is a verbatim hyphen or part of the LaTeX syntax. Be aware of that.
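For anyone who prefers Python, here is a minimal sketch of the same in-place replacement with a backup, similar in spirit to GNU `sed -i.bak`. It assumes UTF-8 encoded `.tex` files directly inside the given folder; `replace_in_place` is a hypothetical helper name, and, like the `sed` command, it makes no distinction between a verbatim hyphen and LaTeX syntax.

```python
import shutil
from pathlib import Path

EN_DASH = "\u2013"  # the en dash character, –


def replace_in_place(folder, backup_suffix=".bak"):
    """Replace every ASCII hyphen-minus in folder/*.tex with an en dash.

    Assumption: files are UTF-8. Like `sed -i.bak`, each original is first
    copied to <name><suffix>, e.g. table.tex -> table.tex.bak.
    Returns the names of the files that were processed.
    """
    changed = []
    # sorted() materialises the list before any .bak files are created
    for path in sorted(Path(folder).glob("*.tex")):
        shutil.copy2(path, str(path) + backup_suffix)  # keep an untouched copy
        text = path.read_text(encoding="utf-8")
        path.write_text(text.replace("-", EN_DASH), encoding="utf-8")
        changed.append(path.name)
    return changed
```

Calling `replace_in_place("/myfolder")` would then rewrite every `.tex` file in that folder and leave a `.tex.bak` copy next to each one.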

user2722968
  • Thanks, just to be sure: we are replacing hyphens INSIDE the .tex files, correct? Not the names of the .tex files? – ℕʘʘḆḽḘ Jul 09 '17 at 13:29
  • Expanded the answer to reflect that. – user2722968 Jul 09 '17 at 13:36
  • Thanks, that is really helpful. Do you think it is possible to use a regex to find only the cases where the - is a proper minus (and not TeX code, such as in `\cmidrule(lr){2-4}`)? That is, can we specify a regex using `sed`? – ℕʘʘḆḽḘ Jul 10 '17 at 00:56
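One possible approach to the regex question, sketched in Python rather than `sed` (GNU `sed` regular expressions have no lookbehind or lookahead). The heuristic is an assumption about how the tables are written: a hyphen is treated as a minus sign only when it is followed by a digit (optionally after a decimal point) and not preceded by a letter, digit, underscore, backslash or another hyphen. That leaves column ranges such as `\cmidrule(lr){2-4}` and LaTeX `--` dashes alone and converts numbers such as `-0.25` or `$-1$`, but it would also convert a negative length like `\vspace{-2pt}` and would skip a minus that is not followed by a digit, such as `-\infty`, so check the output. `replace_minus` is a hypothetical name.

```python
import re

# A hyphen counts as a "proper minus" here only if it is
#   - not preceded by a letter, digit, underscore, backslash or another hyphen, and
#   - followed by a digit, optionally after a decimal point (e.g. -3, -0.25, -.5).
MINUS = re.compile(r"(?<![\w\\-])-(?=\.?\d)")


def replace_minus(text):
    """Turn numeric minus signs into en dashes; leave ranges like {2-4} alone."""
    return MINUS.sub("\u2013", text)


line = r"\cmidrule(lr){2-4} & -0.25 & $-1$ & 1--2 \\"
print(replace_minus(line))
```

To apply it to a whole folder, each `.tex` file would be read, passed through `replace_minus`, and written back; Perl's `perl -pi -e` is another tool whose regexes support lookarounds.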

Try with sed

find /home/milenko/pr -type f -exec \
sed -i 's/-/–/g' {} +

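from the command line (if you are using Linux).

The -type f test restricts the matches to regular files.

In find's -exec clause, {} stands for the matched files; ending the clause with + makes find pass many files to a single sed invocation.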
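A rough Python counterpart of this `find ... -type f -exec sed -i ... {} +` pipeline, offered as a sketch: it walks the directory tree recursively, skips anything that is not a file, and rewrites each file as UTF-8 text. The `suffix` filter is an addition that the command above does not have (it touches every file); pass `suffix=None` to process everything, which assumes every file is UTF-8 text. `replace_in_tree` is a hypothetical name.

```python
from pathlib import Path


def replace_in_tree(root, suffix=".tex", old="-", new="\u2013"):
    """Recursively replace `old` with `new` in files under root.

    suffix=None processes all files, like the original find command;
    the default restricts the change to .tex files.
    Returns the paths of the files that were actually modified.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        # Roughly like find's -type f (unlike find, is_file() follows symlinks)
        if not path.is_file():
            continue
        if suffix is not None and path.suffix != suffix:
            continue
        text = path.read_text(encoding="utf-8")
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path)
    return changed
```

For the question's setup this would be `replace_in_tree("/myfolder")`.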

MishaVacic

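To rename the files, use

rename 's/-/–/g' *

This turns every hyphen in the file names into an en dash (this is the Perl-based rename; the util-linux rename shipped by some distributions uses a different syntax).

To replace the hyphens inside the files with en dashes, use

sed -i 's/-/–/g' *.tex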
licitdev

Python Solution

import os

directory = os.getcwd()
for filename in os.listdir(directory):
    if "-" in filename:
        # Rename the file, turning each hyphen in its name into an en dash (U+2013)
        os.rename(os.path.join(directory, filename),
                  os.path.join(directory, filename.replace("-", "\u2013")))

New solution to replace characters inside a file

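U+2212 is the Unicode character for the minus sign and U+2013 is the en dash (U+2014 is the em dash). Note that the hyphens typed in a .tex file are usually the plain ASCII hyphen-minus, U+002D.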
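To double-check which code point is which, Python's standard `unicodedata` module reports the official Unicode name of each character; this is just a quick illustration.

```python
import unicodedata

# Characters that are easy to confuse in LaTeX tables
for ch in ["-", "\u2212", "\u2013", "\u2014"]:
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
```

It prints HYPHEN-MINUS for U+002D, MINUS SIGN for U+2212, EN DASH for U+2013 and EM DASH for U+2014, so an en dash is written `"\u2013"` in Python.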

import fnmatch
import os

directory = os.getcwd()  # or a specific folder, e.g. "/myfolder"

def _changefiletext(fileName):
    # Read the whole file as UTF-8 text (Python 3)
    with open(fileName, 'r', encoding='utf-8') as file:
        text = file.read()
    # Replace the ASCII hyphen-minus and the Unicode minus sign (U+2212)
    # with an en dash (U+2013)
    text = text.replace(u"-", u"\u2013").replace(u"\u2212", u"\u2013")
    with open(fileName, 'w', encoding='utf-8') as file:
        file.write(text)

# Filter the files on which you want to run the replace code (*.tex in this case)

matches = []
for root, dirnames, filenames in os.walk(directory):
    for filename in fnmatch.filter(filenames, '*.tex'):
        matches.append(os.path.join(root, filename))

for filename in matches:
    print("Converting file %s" % filename)
    _changefiletext(filename)
sarbjit
  • Thanks! Is it possible to use `glob` instead? How can I specify a particular folder here? – ℕʘʘḆḽḘ Jul 09 '17 at 13:49
  • Also, I don't think the `if "-" in filename` check is correct. I want to replace the hyphens in all the .tex files, regardless of their names. – ℕʘʘḆḽḘ Jul 09 '17 at 13:51
  • `import glob` followed by `x = glob.glob("mydirectory/*.tex")` should work. – aschultz Jul 09 '17 at 13:51
  • Thanks @aschultz, maybe you can post a solution with that? – ℕʘʘḆḽḘ Jul 09 '17 at 13:55
  • Also, this solution does not seem to work at all: here we are just renaming the files, which unfortunately I don't care about! How can it be changed? Thanks! – ℕʘʘḆḽḘ Jul 09 '17 at 13:57
  • I'm trying to write a solution, but unfortunately I don't know how to convert from ASCII to UTF-8 in Python. Other than that, it works, so that is the last piece of the puzzle. – aschultz Jul 09 '17 at 14:00
  • I'm sorry, your initial question seemed to ask for replacing the file names only. Please see the updated solution. I tested it by writing a *.txt file containing a minus sign (the Unicode character) and then running the code to replace it. – sarbjit Jul 09 '17 at 14:35
  • `glob` in Python 2 does not work recursively, so `os.walk` plus `fnmatch` is the better solution. – sarbjit Jul 09 '17 at 15:00
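In Python 3.5 and later, `glob.glob` does support recursion: with `recursive=True`, the pattern `**` matches zero or more directories. A sketch answering the `glob` question under that assumption, with the target folder passed as an argument and UTF-8 files assumed; `replace_with_glob` is a hypothetical name.

```python
import glob
import os


def replace_with_glob(folder, old="-", new="\u2013"):
    """Replace `old` with `new` in all .tex files under folder, recursively.

    Requires Python 3.5+ for recursive=True; assumes UTF-8 files.
    Returns the list of files that were rewritten.
    """
    # folder/**/*.tex matches folder/a.tex as well as folder/sub/b.tex
    pattern = os.path.join(folder, "**", "*.tex")
    files = sorted(glob.glob(pattern, recursive=True))
    for name in files:
        with open(name, encoding="utf-8") as f:
            text = f.read()
        with open(name, "w", encoding="utf-8") as f:
            f.write(text.replace(old, new))
    return files
```

For example, `replace_with_glob("/myfolder")`. Note that `*` does not match names starting with a dot, so hidden files are skipped.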

First, back up all your files before removing the ".bak" in the code. I don't want you to lose anything, and if my script misfires, I'd like you to be able to recreate what you have.

Second, this is probably not very good Python code, because I am not an expert. But it works if you are editing in UTF-8. Because the en dash is not an ASCII character, a straight replace doesn't work unless the text is read and written with a UTF-8 encoding. I confess I'm not quite sure what's going on here, so bigger Python experts may be able to sort out where I can do better.

# -*- coding: utf-8 -*-

import glob
import io
import re

def replace_file(file):
    print("Replacing " + file)
    # Read explicitly as UTF-8; without an encoding on the input, Python 2
    # tries to decode the en dash's bytes as ASCII and fails
    with io.open(file, "r", encoding="utf-8") as f:
        text = f.read()
    text = re.sub("-", u"\u2013", text)  # u"\u2013" is the en dash
    # Output goes to <file>.bak; remove the ".bak" to overwrite the original
    with io.open(file + ".bak", "w", encoding="utf-8") as out:
        out.write(text)

x = glob.glob("*.tex")

for y in x:
    replace_file(y)
aschultz
  • Thanks! But I am getting `UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)` – ℕʘʘḆḽḘ Jul 10 '17 at 00:48
  • Hm, sorry I dropped this... It works on my computer, and I wasn't able to find any reason it shouldn't work for you. I don't know enough about the internals of Python to help you. I was getting the error you mentioned with earlier versions of my code, and I think the top comment (the coding line) needs to be in your code, too. Sorry I couldn't do more for you, but your question helped me learn a few things, so thanks for that. – aschultz Jul 14 '17 at 02:12
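The error reported in the comments is consistent with the en dash being decoded as ASCII: in UTF-8 the en dash is the three bytes E2 80 93, and 0xe2 is exactly the byte the error complains about. The Python 3 snippet below, an illustration rather than a diagnosis of the asker's exact setup, reproduces that error and shows that a plain `replace()` works once the bytes are decoded as UTF-8. In Python 2, the usual remedy is to open the input with an explicit encoding, e.g. `codecs.open(file, encoding="utf-8")`.

```python
en_dash = "\u2013"

# In UTF-8 the en dash is three bytes, the first of which is 0xe2
raw = en_dash.encode("utf-8")
print(raw)  # b'\xe2\x80\x93'

# Decoding those bytes as ASCII fails, as in the reported error
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    print(err)  # 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

# Bytes decoded as UTF-8 (what open(..., encoding="utf-8") does) give a normal str,
# and a plain replace() works
data = "a-b & -1".encode("utf-8")  # stands in for the bytes of a .tex file
text = data.decode("utf-8")
fixed = text.replace("-", en_dash)
print(fixed)
```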