
I have a regular expression that I am trying to match against strings containing:

<script type="text/javascript">
   var debug = new Debugger();
</script>

I have determined that matching on the word "debug" is enough.

If I execute the command:

find . -name 'test.html' -exec perl -ne '/<script type="text\/javascript">[\S\s]*?(debug)[\S\s]*?<\/script>/ && print' '{}' \;

I would expect the regex to match, because the pattern

 <script type="text\/javascript">[\S\s]*?(debug)[\S\s]*?<\/script>

matches in Sublime Text.

I have had trouble using [\S\s] with Perl. Is there something I am missing here?

Thanks

bneigher
  • Good luck with this kind of question; http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags is not far off – Gilles Quénot Nov 12 '14 at 20:48
  • There may be differences in line break characters. Did you try adding \r and \n inside the brackets? It would look like: [\S\s\r\n] – Shahar Nov 12 '14 at 20:50
  • It might help you see the cause of your issue if you replace your Perl command with something like this: `perl -nle 'print "^$_\$"'`. This will print a `^` at the beginning of each line and a `$` at the end of each line. For each line, think: "Is ` – ThisSuitIsBlackNot Nov 12 '14 at 20:51
  • Yes, I have tried. I believe [\S\s] already includes \n and \r – bneigher Nov 12 '14 at 20:52

2 Answers


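You want to use perl's -0 switch when calling it. With no digits after it, -0 sets the input record separator to the null character, so a text file without NUL bytes is read as a single record and the regex can span lines. (Strictly speaking, paragraph mode is -00 and whole-file slurp mode is -0777.) Using this, your regex will work: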

find . -name 'test.html' -exec perl -n0e '/<script type="text\/javascript">[\S\s]*?(debug)[\S\s]*?<\/script>/ && print' '{}' \;

(Not?) surprisingly @sputnick gets the gold medal for this answer here ;)

brandonscript

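Edit: I failed to see that there is also a file-slurping problem. With -n, Perl
reads one line at a time, so no single line contains the opening tag, debug and
the closing tag together. That just makes two problems now. Consider running a
Perl script instead and localizing the record separator in a scope, like
{ local $/; $data = <$file>; ... } or similar.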


You know the warning about parsing HTML with regex.

One point: [\S\s] is equivalent to (?s:.), a dot with the inline s modifier,
so it won't be a problem in Perl.

The non-greedy quantifiers won't help. The regex wants to find debug inside a
script tag, but it will start at the first opening tag and run through any other
script blocks until it finds debug, then look for a closing tag.

That's the only other problem that could arise. To prevent it, you have to check
the contents of the script more carefully, so the match cannot cross a <script or </script tag:

 #  /(?s)<script\s+type="text\/javascript">(?:(?!<\/?script).)*?(debug)(?:(?!<\/?script).)*?<\/script>/

 (?sx)
 <script \s+ type="text/javascript">
 (?:
      (?! </?script )
      . 
 )*?
 ( debug )
 (?:
      (?! </?script )
      . 
 )*?
 </script>