I have certain text within a command \grk{} that looks like this:

\grk{s`u e@i `o qrist`os <o u<i`ws to~u jeo`u `ao~u z~wntos} 

I need to find every instance of a whitespace character followed by ` and replace the ` with the word XLFY, keeping the whitespace.

The result from the above should be:

\grk{s`u e@i XLFYo qrist`os <o u<i`ws to~u jeo`u XLFYao~u z~wntos} 

All other instances of whitespace followed by ` outside \grk{} should be ignored.

I got this far:

(?<=grk\{)(.*?)(?=\})

This finds and selects all the text within \grk{}.

Any idea how I can select just the whitespace followed by ` inside it and replace it?

stx932
  • Which regex engine? Must you avoid performing the same replacement in other contexts (i.e. outside a `\grk{}` block)? – John Bollinger Feb 04 '16 at 15:54
  • Is there a possibility of other `{}` blocks nested inside the `\grk{}` block? – John Bollinger Feb 04 '16 at 15:57
  • Is there any text outside of `\grk{}`? If there is none and you only need this inside `\grk{}`, you can simply use **/(?<= )'/g** and replace with `XLFY`, **[like in this demo](https://regex101.com/r/uF3yJ9/1)**. Is this what you are looking for? –  Feb 04 '16 at 16:06
  • Conversely, if you need to restrict the replacement to occur only within the contents of `\grk{}` blocks within a larger document then regex probably is not enough by itself. – John Bollinger Feb 04 '16 at 18:02
  • I use TextMate. How do I find out which engine it uses? There is no possibility of any other {} blocks nested inside a \grk{} block. I just need a way to select all occurrences of a space followed by an accent. Can't believe regex is not advanced enough to do that! – stx932 Feb 04 '16 at 21:47
  • You might want to have a look at PowerGREP – Denham Coote Feb 05 '16 at 11:50

2 Answers

You can do this fairly easily with the help of a programming language. Here is some PHP code to show the concept (other languages would work as well); it also reads and rewrites the files themselves:

<?php
// process every .txt file in the current directory
foreach (glob("*.txt") as $filename) {
    // load the file content
    $content = file_get_contents($filename);
    // match each \grk{...} block (assumes no nested braces)
    $regex = '#\\\\grk\{[^}]+\}#';

    $newContent = preg_replace_callback(
        $regex,
        function ($matches) {
            // inside the block only: horizontal whitespace followed by a backtick
            return preg_replace('#\h`#', ' XLFY', $matches[0]);
        },
        $content);

    // write it back to the original file
    file_put_contents($filename, $newContent);
}
?>

The idea is to grab each \grk{...} block in a first step, then replace every occurrence of a whitespace character followed by "`" inside it.
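For anyone who would rather not install PHP, the same two-step idea can probably be sketched with awk from a shell. This is only a sketch under assumptions: the function name grk_replace_awk is made up for illustration, each \grk{...} block is assumed to sit on a single line without nested braces, and a tab before the backtick is normalized to a space (as in the PHP version).

```shell
#!/bin/bash
# Hypothetical helper (name invented for this sketch): reads text on stdin,
# writes the rewritten text to stdout.
grk_replace_awk() {
    awk '{
        out = ""; rest = $0
        # Step 1: locate each \grk{...} block on the current line.
        while (match(rest, /\\grk[{][^}]*[}]/)) {
            block = substr(rest, RSTART, RLENGTH)
            # Step 2: replace space-or-tab + backtick inside that block only.
            gsub(/[ \t]`/, " XLFY", block)
            out = out substr(rest, 1, RSTART - 1) block
            rest = substr(rest, RSTART + RLENGTH)
        }
        # Text outside the blocks is copied through unchanged.
        print out rest
    }'
}
```

Because it only reads stdin and writes stdout, something like `grk_replace_awk < chapter.tex > chapter.new` (file names hypothetical) leaves the original file untouched until the result has been checked.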

Jan
  • I don't have a problem replacing all those instances because Textmate provides that functionality. I just need a Regex formula that finds them. I have like 300 files with this all within one directory. Would I have to run this on each of them? I will try first thing tomorrow what you suggested (writing this from a tablet) – stx932 Feb 04 '16 at 21:51
  • With the solution you offered I would have to put all instances of \grk into one file, which would replace them, and then put them back in the text. Is there any chance to apply this to all the files inside one directory, or to the \grk instances of the file that is opened? I am sorry, but I don't know much about PHP (I am working in LaTeX), so I can't change what you made on my own. – stx932 Feb 05 '16 at 10:03
  • @eklisiarh: See my updated answer - the code takes all `*.txt` files from the current directory (incl. all subdirectories), analyzes the content and writes it back to the **original** file (so please make a backup before ;-)). – Jan Feb 05 '16 at 11:38

If you have a file with many \grk{} sections (and other content), probably the fastest way to achieve the goal is what @Jan suggested. @noob's regex is fine for a single \grk{}.

The problem with (?<=grk\{)(.*?)(?=\}) is that most regex engines do not support variable-length lookbehind, so you can't skip an arbitrary amount of text between grk{ and the " `" you want to match. Take a look at this post.

You can also use a bash script:

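#!/bin/bash
# Usage: ./replace.sh FILE  (writes FILE_replaced)
# Requires GNU grep (-P) and GNU sed (-r, -i).
file=$1
newFile="${file}_replaced"
regex='\\grk\{(.*?)\}'

cp "$file" "$newFile"

# extract every \grk{...} block, then patch each one in the copy
grep -oP "$regex" "$file" | while IFS= read -r line; do
    replacement=$(printf '%s\n' "$line" | sed -r 's/(\s)`/\1XLFY/g')
    # escape characters that are special to sed before reusing the strings
    pattern=$(printf '%s\n' "$line" | sed 's|[][\.*^$/]|\\&|g')
    replacement=$(printf '%s\n' "$replacement" | sed 's|[\&/]|\\&|g')
    sed -i "s/$pattern/$replacement/g" "$newFile"
done

cat "$newFile"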

which takes a file as an argument and creates file_replaced with the replacements applied.

EDIT: To run the script for each file in a directory:

for file in *; do ./replace.sh "$file"; done

Before that, change the script so that it overwrites the existing file:

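#!/bin/bash
# Requires GNU grep (-P) and GNU sed (-r, -i). Overwrites the input file.
file=$1
regex='\\grk\{(.*?)\}'

grep -oP "$regex" "$file" | while IFS= read -r line; do
    replacement=$(printf '%s\n' "$line" | sed -r 's/(\s)`/\1XLFY/g')
    # escape characters that are special to sed before reusing the strings
    pattern=$(printf '%s\n' "$line" | sed 's|[][\.*^$/]|\\&|g')
    replacement=$(printf '%s\n' "$replacement" | sed 's|[\&/]|\\&|g')
    sed -i "s/$pattern/$replacement/g" "$file"
done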

But if you don't use any VCS, please make a backup of your files!

EDIT2: a debug version that prints each intermediate value:

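#!/bin/bash
# Debug version: prints every intermediate value.
# Requires GNU grep (-P) and GNU sed (-r, -i).
file=$1
echo '--- file ---'
cat "$file"
regex='\\grk\{(.*?)\}'
echo "regex: $regex"
grep -oP "$regex" "$file" | while IFS= read -r line; do
    echo "LINE:        $line"
    replacement=$(printf '%s\n' "$line" | sed -r 's/(\s)`/\1XLFY/g')
    echo "REPLACEMENT: $replacement"
    # escape characters that are special to sed before reusing the strings
    pattern=$(printf '%s\n' "$line" | sed 's|[][\.*^$/]|\\&|g')
    replacement=$(printf '%s\n' "$replacement" | sed 's|[\&/]|\\&|g')
    sed -i "s/$pattern/$replacement/g" "$file"
done
echo '--- file after ---'
cat "$file"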
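
Since grep -P is not available in every grep (the usage message quoted in the comments suggests it is missing on the asker's Mac), a sed-only loop may be a simpler alternative to the grep pipeline above. A minimal sketch, assuming each \grk{...} block sits on one line and contains no nested braces; the function name grk_replace is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper (name invented for this sketch): reads text on stdin,
# writes the rewritten text to stdout.
grk_replace() {
    # Each pass rewrites one whitespace+backtick that lies between "\grk{"
    # and the next "}" on the same line; "t again" repeats the substitution
    # until nothing inside any block matches any more.
    sed -E \
        -e ':again' \
        -e 's/(\\grk\{[^}]*[[:space:]])`([^}]*\})/\1XLFY\2/' \
        -e 't again'
}
```

Since [^}]* cannot cross a closing brace, a backtick outside a \grk{} block should never be reached. To try it on one file without overwriting it, something like `grk_replace < input.tex > output.tex` (file names hypothetical) would do.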
kolejnik
  • Any chance you could modify this solution so that it looks in the file which is just opened and replaces all instances? I am total noob when it comes to bash and php sorry :( – stx932 Feb 05 '16 at 10:04
  • What operating system do you use? On a Unix-like OS you can save this script as replace.sh, add execute permission (chmod +x replace.sh) and then call `./replace.sh any_file`. – kolejnik Feb 05 '16 at 10:09
  • I tried doing this via Terminal in Mac and it created a replaced file but in that file the whitespace ` was not replaced by whitespace XLFY – stx932 Feb 05 '16 at 11:03
  • The fact that it created a replaced file means that I executed the file correct right? Any idea why it didn't replace it? – stx932 Feb 05 '16 at 11:18
  • Could you run "debug" script I added and paste the output? – kolejnik Feb 05 '16 at 11:23
  • Is this what you need: regex: \\grk\{(.*?)\} usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]] [-e pattern] [-f file] [--binary-files=value] [--color=when] [--context[=num]] [--directories=action] [--label] [--line-buffered] [--null] [pattern] [file ...] – stx932 Feb 05 '16 at 11:27
  • I also created a sample of my text where you can try it: http://pastebin.com/nz8wXeN5 – stx932 Feb 05 '16 at 11:28
  • We have different versions of grep, so you can't use the `-P` flag. Please try replacing the line `grep -oP $regex $file | while read -r line; do` with `perl -nle'print $& if m{'$regex'}' $file | while read -r line; do`. You need to have perl installed, though. – kolejnik Feb 05 '16 at 11:40
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/102699/discussion-between-eklisiarh-and-kolejnik). – stx932 Feb 05 '16 at 11:51