I'm trying to change strings like this:

<a href='../Example/case23.html'><img src='Blablabla.jpg'

To this:

<a href='../Example/case23.html'><img src='<?php imgname('case23'); ?>'

And I've got this monster of a regular expression:

find . -type f | xargs perl -pi -e \
  's/<a href=\'(.\.\.\/Example\/)(case\d\d)(.\.html\'><img src=\')*\'/\1\2\3<\?php imgname\(\'\2\'); \?>\'/'

But it isn't working. In fact, I think it's a problem with Bash, which could probably be pointed out rather quickly.

r: line 4: syntax error near unexpected token `('
r: line 4: `  's/<a href=\'(.\.\.\/Example\/)(case\d\d)(.\.html\'><img src=\')*\'/\1\2\3<\?php imgname\(\'\2\'); \?>\'/''

But if you want to help me with the regular expression that'd be cool, too!

nnyby
  • Don't parse HTML with regular expressions. Use something like HTML::Parser, HTML::TreeBuilder, or HTML::TreeBuilder::XPath – xenoterracide Jul 28 '10 at 20:42
  • [Friends don't let friends parse HTML with regular expressions.](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Ether Jul 28 '10 at 20:45
  • Also check out Bash heredocs (http://tldp.org/LDP/abs/html/here-docs.html). Handy for long Bash commands that contain quotes and other shell metacharacters. – dwarring Jul 28 '10 at 21:18

5 Answers

Teaching you how to fish:

s/…/…/

Use a separator other than / for the s operator because / already occurs in the expression.

s{…}{…}

Cut down on backslash quoting: prefer [.] over \. because we'll shell-quote the program later. Keep backslashes only where they are necessary, which here means the \d digit character class.

s{<a href='[.][.]/Example/case(\d\d)[.]html'>…

Capture only the variable part. There is no need to reassemble the string from captures when most of it is static.

s{<a href='[.][.]/Example/case(\d\d)[.]html'><img src='[^']*'}{<a href='../Example/case$1.html'><img src='<?php imgname('case$1'); ?>'}

In the replacement, use $1 instead of \1 to refer to the captured group. [^']* means everything up to the next '.

To serve as the argument of Perl's -e option, this program needs to be shell-quoted. Use the following helper program; an alias or shell function works just as well:

> cat `which shellquote`
#!/usr/bin/env perl
use String::ShellQuote qw(shell_quote); undef $/; print shell_quote <>

Run it, paste the program body, and terminate the input with Ctrl+D. You receive:

's{<a href='\''[.][.]/Example/case(\d\d)[.]html'\''><img src='\''[^'\'']*'\''}{<a href='\''../Example/case$1.html'\''><img src='\''<?php imgname('\''case$1'\''); ?>'\''}'
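If String::ShellQuote isn't installed, Bash's builtin printf with %q can do the same job. This is only a sketch of an alternative, not the helper used above: the output looks different (backslash escapes instead of '\'' sequences) but Bash reads it back as the same single word. The variable names are made up for the example.

```shell
#!/usr/bin/env bash
# Read the unquoted Perl program through a quoted heredoc ('EOF'), so the
# shell does not touch its quotes, backslashes, or $1.
IFS= read -r program <<'EOF'
s{<a href='[.][.]/Example/case(\d\d)[.]html'><img src='[^']*'}{<a href='../Example/case$1.html'><img src='<?php imgname('case$1'); ?>'}
EOF

# %q re-quotes the argument so that Bash would read it back as one word.
quoted=$(printf '%q' "$program")
printf 'perl -pi -e %s\n' "$quoted"

# Round-trip check: evaluating the quoted form yields the original text.
eval "restored=$quoted"
[ "$restored" = "$program" ] && echo "round trip OK"
```

The printed line can be pasted after find . -type f | xargs in place of the '\''-style version.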

Put this together with the shell pipeline:

find . -type f | xargs perl -pi -e 's{<a href='\''[.][.]/Example/case(\d\d)[.]html'\''><img src='\''[^'\'']*'\''}{<a href='\''../Example/case$1.html'\''><img src='\''<?php imgname('\''case$1'\''); ?>'\''}'
daxim
  • That's not single quotes permitting escapes. That's "open-quote, close-quote, escape-quote (unquoted/outside of quotes), open-quote, ..." – Dennis Williamson Jul 28 '10 at 22:39
  • +1 for taking the time to explain in so much detail how to solve this problem step-by-step. Great answer! – kander Jul 29 '10 at 17:31

Bash single-quotes do not permit any escapes.

Try this at a bash prompt and you'll see what I mean:

FOO='\'foo'

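Bash will show a continuation prompt while it waits for a fourth single quote, because the backslash did not escape the second one. If you supply it, you'll find that FOO's value (apart from the newline you typed) is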

\foo

You'll need to use double quotes around your expression, escaping any $ that Perl rather than the shell should see. Although in truth, your HTML should be using double quotes in the first place.
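A small sketch of both points, using made-up variable names and a toy substitution rather than the one from the question: the backslash inside single quotes stays in the string, and a double-quoted Perl program needs \$ wherever Perl should see a variable.

```shell
#!/usr/bin/env bash
# Inside single quotes a backslash is an ordinary character, not an escape.
literal='a\b'
printf '%s has %d characters\n' "$literal" "${#literal}"

# Inside double quotes the shell would expand $1 itself, so every Perl
# variable needs a backslash to reach Perl intact.
escaped="s/(x)/\$1\$1/"
plain='s/(x)/$1$1/'
[ "$escaped" = "$plain" ] && echo "same program text: $escaped"
```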

  • Also, as others have said, regular expressions are not the best way to parse HTML. But if your case really is limited to a pattern as relatively simple as this, you can probably get away with it. – Michael Scott Shappe Jul 28 '10 at 21:13
  • Downvote: This is wrong, single quotes *do* permit escapes (see my answer for proof), and therefore double quotes are *not* needed. Double quotes are very inconvenient anyway, many Perl variables such as `$_` or `$1` are going to be interpreted as shell variables. To keep one's sanity, always `-e''`, never `-e""`. – daxim Jul 28 '10 at 21:49
  • @daxim: That's wrong, they don't. See my comment to your answer. – Dennis Williamson Jul 28 '10 at 22:40

Single quotes within single quotes in Bash:

set -xv
echo ''"'"''
echo $'\''
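To spell out what those two lines do, here is a sketch with three ways of getting a literal single quote into a string. The variable names are invented for the example, and the last form relies on Bash's ANSI-C quoting, so it assumes Bash rather than a plain POSIX sh.

```shell
#!/usr/bin/env bash
a='it'\''s'     # close the quote, add a backslash-escaped quote, reopen
b='it'"'"'s'    # close the quote, add a double-quoted quote, reopen
c=$'it\'s'      # ANSI-C quoting: here \' really is an escape sequence

# All three variables hold the same four characters.
printf '%s\n' "$a" "$b" "$c"
```

In the first two forms the quote character is never inside the single-quoted part; adjacent quoted pieces are simply concatenated into one word.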
karl

I wouldn't use a one-liner. Put your Perl code in a script, which makes it much easier to get the regex right without wondering about escaping quotes and such.

I'd use a script like this:

#!/usr/bin/perl -pi

use strict;
use warnings;

s{
    ( <a \b [^>]* \b href=['"] [^'"]*/case(\d+)\.html ['"] [^>]* > \s*
      <img \b [^>]* \b src=['"] ) [^'"<] [^'"]*
}{$1<?php imgname('case$2'); ?>}gix;

and then, with the script saved as an executable named fiximgs somewhere on your PATH, do something like:

find . -type f | xargs fiximgs

– Michael

mscha

If you install the mysql package, it comes with a command called replace.

With the replace command you can do this:

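while read -r line
do
 # Strip the fixed text around the case name, then keep the first word.
 X=$(echo "$line" | replace "<a href='../Example/" "" | replace ".html'><" " " | awk '{print $1}')
 echo "<a href='../Example/$X.html'><img src='<?php imgname('$X'); ?>'"
# Redirect once for the whole loop so that NewFile keeps every line.
done < myfile > NewFile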

The same can be done with sed, e.g. sed 's/my string/replacement string/g'; replace is just easier to work with when the strings contain special characters.
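For comparison, a sed version of the substitution from the question might look like the sketch below. It assumes a sed that understands -E (extended regular expressions) and uses made-up variable names. Double quotes around the sed program are convenient here because it contains single quotes but no $.

```shell
#!/usr/bin/env bash
# Sample input line, copied from the question.
sample="<a href='../Example/case23.html'><img src='Blablabla.jpg'"

# '#' is the s delimiter, so the slashes in the path need no escaping.
# \1 is everything up to and including src=', \2 is the case name.
converted=$(printf '%s\n' "$sample" |
  sed -E "s#(<a href='\.\./Example/(case[0-9]+)\.html'><img src=')[^']*'#\1<?php imgname('\2'); ?>'#")

printf '%s\n' "$converted"
```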

Adam Outler