
I have this input.txt file:

Dog walks in the park
Man runs in the park
Man walks in the park
Dog runs in the park
Dog stays still
They run in the park
Woman runs in the park

I want to search for matches of the runs? regular expression and output them to a file, while highlighting matches with two asterisks on both sides of the match. So my desired output is this:

Man **runs** in the park
Dog **runs** in the park
They **run** in the park
Woman **runs** in the park

I want to write a function that wraps this Perl one-liner (and does a few other things), and then invoke it with a regular expression as its parameter. I wrote the following script:

#!/bin/bash

function reg {
    perl -ne 's/($1)/**\1**/&&print' input.txt > regfunctionoutput.txt
}

function rega {
    regex="$1"
    perl -ne 's/($regex)/**\1**/&&print' input.txt > regafunctionoutput.txt
}

perl -ne 's/(runs?)/**\1**/&&print' input.txt > regularoutput.txt
reg 'runs?'
rega 'runs?'

The output of the first Perl one-liner is what I want. But when I wrap it in the reg function and pass the expression as a parameter, instead of the desired output I get:

****Dog walks in the park
****Man runs in the park
****Man walks in the park
****Dog runs in the park
****Dog stays still
****They run in the park
****Woman runs in the park

I thought the issue was a conflict between $1 as a function parameter and $1 as the first capturing group in the Perl one-liner. So I created a second function, rega, which first assigns the expression to a different variable and only then passes it to Perl. But the output is the same as with the previous function.

So, how can I pass a regular expression to a Perl one-liner inside a function? What am I doing wrong?

brian d foy
Rafal
  • What happens when you put double quotes inside your function? (i.e. write `perl -ne "s/($1)/**\1**/&&print"`) – Aserre Jul 22 '15 at 08:27
  • You could do the same thing more efficiently with `sed`. See simbabque's answer for how to quote it. – Peter Cordes Jul 22 '15 at 08:57
  • @Ploutox Using double quotes solves the issue. In my earlier testing I assumed I needed double quotes for variable expansion, but they caused some unexpected results. Now all is fine. I will need to do more testing to find out what the issue was earlier. – Rafal Jul 22 '15 at 15:37
  • @PeterCordes I cannot use sed, because I am using Perl regular expressions and some of them won't work directly with sed. Since I am also doing some manual work with them in a text editor, porting the expressions to sed is an additional step that can be skipped. But thanks for the suggestion. – Rafal Jul 22 '15 at 15:38

2 Answers


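You need to use double quotes " because the shell does not interpolate variables inside single quotes '. With single quotes, Perl receives the literal text $1 (or $regex), which is an empty Perl variable at that point, so the pattern matches the empty string at the start of every line. That is why every line is printed with a leading ****. This is also nicely explained in this answer.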

function reg {
    perl -ne "s/($1)/**\$1**/g&&print" input.txt > regfunctionoutput.txt
}
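To see what each quoting style actually hands to perl, the program text can simply be printed instead of executed. This is only an illustrative sketch; the function name show_program is made up for this example and is not part of the original script.

```shell
#!/bin/bash
# show_program is a hypothetical helper: it prints the program text that
# perl would receive under each quoting style, without running perl.
show_program() {
    # Single quotes: the shell passes the text through untouched.
    printf '%s\n' 's/($1)/**$1**/'
    # Double quotes: the shell replaces every unescaped $1.
    printf '%s\n' "s/($1)/**$1**/"
    # Double quotes with \$1: the pattern is expanded by the shell,
    # while the replacement keeps Perl's own capture variable $1.
    printf '%s\n' "s/($1)/**\$1**/"
}

show_program 'runs?'
```

Only the third form gives perl both the regex from the function argument and a working reference to the capture group.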

Furthermore, in Perl the regex capture groups end up in $1, $2 and so on, not in \1. If you turn on warnings (with -w in your one-liner) you will get a "\1 better written as $1" warning. It is explained in perldiag:

\%d better written as $%d

(W syntax) Outside of patterns, backreferences live on as variables. The use of backslashes is grandfathered on the right-hand side of a substitution, but stylistically it's better to use the variable form because other Perl programmers will expect it, and it works better if there are more than 9 backreferences.

The (W syntax) means that you can turn this warning off with no warnings 'syntax';

simbabque
  • When I tested it with this, it worked. You definitely need the double quotes though. – simbabque Jul 22 '15 at 15:29
  • Before writing my comment I tested it, but I checked a different output file, which came from another function, and that led me to the incorrect conclusion that your solution doesn't work. I discovered that after writing my first comment, so I deleted it. **Your solution works**, which is great. But as far as I can see **it is because of the double quotes** (which expand the variable), not because of using ``$1`` instead of ``\1`` in the replacement. – Rafal Jul 22 '15 at 15:46
  • Probably. Though @Сухой27 is also right about the `$1` vs `\1` which gives a warning. – simbabque Jul 22 '15 at 15:50
  • I added that to the answer. – simbabque Jul 22 '15 at 15:51
  • I think the information about double quotes should be added at the top of the answer, as this is really the source of the issue. The information about ``\1`` vs ``$1``, with an explanation of why the latter is better, could be added as a side note. As is, it suggests that ``\1`` is the problem here, which it isn't. So I am reluctant to accept an answer that solves the problem but suggests that its source lies elsewhere than it really does. – Rafal Jul 22 '15 at 16:03
  • I accepted the answer, even though I still don't know why ``$1`` is better. But that is probably a matter for another question, not for comments. – Rafal Jul 22 '15 at 16:06
  • @Rafal I added an explanation for that – simbabque Jul 22 '15 at 16:13
  • Thanks a lot, that explains it. – Rafal Jul 22 '15 at 16:14

You can pass the regex (the shell function's $1) to perl as a command-line argument and compile it with qr//. That way the Perl code can stay in single quotes, which the shell does not interpolate:

perl -ne '
  BEGIN{ ($re) = map qr/$_/, shift @ARGV }
  s/($re)/**\1**/ && print
' "$1" input.txt > regfunctionoutput.txt

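Or pass it through the environment and read it from %ENV. The shell's positional parameter $1 is not itself part of the environment, so assign it to a named variable for the perl process (REGEX here is an arbitrary name):

REGEX="$1" perl -ne '
  BEGIN{ ($re) = map qr/$_/, $ENV{REGEX} }
  s/($re)/**\1**/ && print
' input.txt > regfunctionoutput.txt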

And as a side note, if you enable warnings with -w it will tell you that \1 is better written as $1 for the substitution part of s///.

mpapec
  • Does your answer have any advantage over using double quotes instead of single ones (which also solves the issue)? – Rafal Jul 22 '15 at 16:15
  • @Rafal Yes, double quotes suck, as the shell then also interpolates variables that you don't want it to. – mpapec Jul 22 '15 at 18:19