
I'm trying to find all occurrences of some code unless that code is preceded by a comment.

Here's an example of what I want to find:

$page_content .= '<meta http-equiv="refresh"

or

$page_content .= 'Some other text here</p><meta http-equiv="refresh"

With or without preceding whitespace. Here's what I want to ignore:

//$page_content .= '<meta http-equiv="refresh"

Again, with or without preceding whitespace.

That way I can be sure that my code base never contains this code unless it's in a comment, or I can set up an automatic alert for when it is found, without getting false alerts when it's commented out (ignore multi-line comments for now).

I've tried using a look-behind:

(?<!\/\/).*<meta http-equiv="refresh"

but I've not had much luck as this still matches every occurrence, commented or not.

One more thing: it would be great if this were a single regex rather than a loop of code, so that I can search in Notepad++ or another editor that supports regex searches. (It's amazing how differently one question can be read and understood. I thought I'd been pretty clear, but from the variety of completely valid answers it's clear that I could have included a lot more detail :-)

Curious User
  • Consider that you are using the greedy operator `.*` – John Doe Sep 02 '16 at 14:56
  • `#^[^/]{2}.*#m` can be a solution – JustOnUnderMillions Sep 02 '16 at 14:58
  • `/^(?!\/\/)/` Have you tried a negative look-ahead? This will ignore every line that starts with `//` – John Doe Sep 02 '16 at 14:59
  • What about multi-line comments? If you don't care about that, I think you could use the inverse of this answer: http://stackoverflow.com/questions/32462878/how-do-i-remove-only-javascript-comments-that-start-with/32467204#32467204 ... `^\h*//.*$(*SKIP)(*FAIL)|.*` – chris85 Sep 02 '16 at 15:14

2 Answers


Just remove the comment before checking for the string:

while ( <$fh> ) {
    s|//.*||;    # strip everything from the first // to the end of the line

    if ( /<meta http-equiv="refresh"/ ) {
        ...;     # an uncommented occurrence was found
    }
}
Borodin
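
A self-contained sketch of the strip-then-search idea above, for trying it out without a file on disk. The sub name `find_uncommented` and the sample lines are made up for illustration. Like the snippet above, it treats everything after the first `//` as a comment, so a `//` inside a string literal (a URL, for instance) that appears before the tag would hide a real occurrence.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines that still contain the tag
# once everything from the first "//" onwards has been removed.
sub find_uncommented {
    my @lines = @_;
    my @hits;
    for my $line (@lines) {
        (my $code = $line) =~ s|//.*||s;    # drop the single-line comment part
        push @hits, $line if $code =~ /<meta http-equiv="refresh"/;
    }
    return @hits;
}

# Sample input (illustrative only)
my @sample = (
    q{$page_content .= '<meta http-equiv="refresh"},
    q{    //$page_content .= '<meta http-equiv="refresh"},
    q{$page_content .= 'Some other text here</p><meta http-equiv="refresh"},
);

print "$_\n" for find_uncommented(@sample);
```

Working on a copy of each line keeps the original text intact, so the reported hit is the line as it appears in the source file.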

For the input specified in the question:

//$page_content .= '<meta http-equiv="refresh"

This will do the thing:

use strict;
use warnings;

open my $fh, "<", "my_path\\data.txt" or die "Cannot open file: $!";

while ( my $line = <$fh> ) {
    # Skip lines that begin with //, then look for the tag
    if ( $line =~ /^(?!\/\/).*?<meta http-equiv="refresh"/ ) {
        print $line;
    }
}

If the line has leading spaces or other indentation, use a look-behind instead:

use strict;
use warnings;

open my $fh, "<", "c:\\users\\uidp7702\\desktop\\data.txt" or die "Cannot open file: $!";

while ( my $line = <$fh> ) {
    # Only rejects lines where // sits immediately before $page_content
    if ( $line =~ /(?<!\/\/)\$page_content\s*\.=\s*'.*?<meta http-equiv="refresh"/ ) {
        print $line;
    }
}
John Doe
  • Thanks. This appears to work on lines with no indent or preceding whitespace. However, if the line is indented then it matches everything, both commented and uncommented. So I removed the start-of-line `^` and it still matches everything. Then I re-added the `^` followed by whitespace, as `^\s*` and `^ *`, but it still matches everything. – Curious User Sep 02 '16 at 15:36
  • Then you should have specified in your question all the variants of your input – John Doe Sep 05 '16 at 05:13
  • I've updated the question to reflect this. My thinking was that, as this is a regex, it should match anywhere in the line: with whitespace before or after, or with other code before or after, rather than being an exact match for the example line I'd given. – Curious User Sep 05 '16 at 08:42
  • You cannot use negative look-behind like `.* (negative look-behind) .* ` – John Doe Sep 05 '16 at 09:46
  • So either make it a two-step regex or use @Borodin's answer – John Doe Sep 05 '16 at 09:46
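
To illustrate the point about `.*` and look-behind in the comments above, here is a small sketch that runs the pattern from the question next to an anchored negative look-ahead. It rests on the assumption stated in the question that a comment is a `//` at the start of the line, after optional spaces or tabs; the variable names and sample lines are invented for the demonstration, and this is one possible single-regex approach, not the only one.

```perl
use strict;
use warnings;

my $tag = qr{<meta http-equiv="refresh"};

# The attempt from the question: the match may start at column 0, where
# nothing precedes it, so the negative look-behind always succeeds.
my $lookbehind = qr{(?<!//).*$tag};

# Sketch of an alternative: anchor at the start of the line and refuse
# lines whose first non-blank characters are "//".
my $anchored = qr{^(?![ \t]*//).*$tag}m;

# Sample input (illustrative only)
my @sample = (
    q{$page_content .= '<meta http-equiv="refresh"},
    q{    $page_content .= 'Some other text here</p><meta http-equiv="refresh"},
    q{//$page_content .= '<meta http-equiv="refresh"},
    q{    //$page_content .= '<meta http-equiv="refresh"},
);

# Print whether each pattern matches each line
for my $line (@sample) {
    printf "lookbehind=%-3s anchored=%-3s %s\n",
        ( $line =~ $lookbehind ? 'yes' : 'no' ),
        ( $line =~ $anchored   ? 'yes' : 'no' ),
        $line;
}
```

The anchored pattern, written out as `^(?![ \t]*//).*<meta http-equiv="refresh"`, should also be usable in an editor whose regex search supports look-ahead. It does not catch a comment that starts after some code on the same line, which is a limitation of the start-of-line assumption.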