
[NOTE: I rewrote my post to describe my question better, with thanks to mario and l'L'l, whose questions I answered previously]

I want to match these patterns (and also allow any amount of whitespace in between):

Connection variable = new DBConnection
variable = new DBConnection

but NOT match:

//Connection variable = new DBConnection
//variable = new DBConnection
//    Connection variable = new DBConnection
//    variable = new DBConnection

and, lastly, capture the variable name.

This is my regex:

#(?<!//)(?:\s*Connection\s+)+(.+?)\s*=\s*new\s+DBConnection#

but the last two lines in the not-match list above still match. How can I fix my regex? Is it because a negative lookbehind can only check a fixed-length string immediately before the match position?

Scott Chu
  • Can you provide an example of what it should match? – l'L'l Jun 08 '18 at 03:40
    You made the match on `(Connection)?` optional. – mario Jun 08 '18 at 03:43
  • @l'L'l: I've modified my post to reply to your question – Scott Chu Jun 08 '18 at 04:09
  • @mario: I did it on purpose. Please see my reply to l'L'l in my post – Scott Chu Jun 08 '18 at 04:09
  • The `(.+)` is going to match **whatever** precedes the equal sign. – mario Jun 08 '18 at 04:27
  • Your regex does not match `variable = new DBConnection` because you require `Connection\s+` to be before a required `\s*=\s*new\s+DBConnection`, what is the rule to match it? Sorry, but matching something not preceded with something of unknown length in PHP regex is solved with a bit of common programming logic that depends on whether you are extracting or replacing. What are you doing? (Also, that requires a pattern that matches what you need, and your regex does not seem to work). – Wiktor Stribiżew Jun 11 '18 at 07:20
  • @wiktor: the trailing + in (?:\s*Connection\s+)+ means optional – Scott Chu Jun 12 '18 at 16:23
  • You may make anything optional using `?` or `*` quantifiers, not `+`. Probably you want [`^(\s*//)?(?:\s*Connection\s+)?(.+?)\s*=\s*new\s+DBConnection`](https://regex101.com/r/QU9SLs/1) and fail all matches where Group 1 is not empty. Try it like [this code on your side](https://ideone.com/XvqxWv). – Wiktor Stribiżew Jun 12 '18 at 17:02
  • @Wiktor: OMG! You are right. Maybe it's the problem. I'll try it. – Scott Chu Jun 13 '18 at 02:26
  • @Wiktor: Your regex is smart. I would never have come up with the idea of going the other way round. Please post it as an answer so I can accept it. Thanks! (Though I still don't know how to use negative lookbehind) – Scott Chu Jun 13 '18 at 02:48
  • @Wiktor: There's a problem. My program always reads an entire source file into a string and then runs preg_match on it. I removed the start-of-line ^ from the regex. It seems to work fine now. – Scott Chu Jun 13 '18 at 04:11
  • Ok, let me post an answer with explanations since now it is clear. – Wiktor Stribiżew Jun 13 '18 at 06:55

1 Answer


You may use one of the two approaches.

Approach 1: SKIP-FAIL regex

You may match all lines that start with // and skip them, and only match your substrings in other contexts.

'~^(\s*//.*)(*SKIP)(*F)|^(?:\s*Connection\s+)?(.+?)\s*=\s*new\s+DBConnection~m'

See the regex demo

PHP demo:

$re = '~^(\s*//.*)(*SKIP)(*F)|^(?:\s*Connection\s+)?(.+?)\s*=\s*new\s+DBConnection~m';
$str = "Connection variable = new DBConnection\n    variable = new DBConnection\n    //\n    //Connection variable = new DBConnection\n    //variable = new DBConnection\n    //    Connection variable = new DBConnection\n    //    variable = new DBConnection";
if (preg_match_all($re, $str, $matches)) {
    print_r($matches[0]);
}

Output:

Array
(
    [0] => Connection variable = new DBConnection
    [1] =>     variable = new DBConnection
)
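Since the question also asks for the variable name, here is a minimal sketch of how it could be pulled out of the SKIP-FAIL matches. It reuses the pattern above unchanged; the sample input and the variable names `conn`, `db`, `old` and `legacy` are made up for illustration. With this pattern the name lands in Group 2 (Group 1 belongs to the skipped comment branch), and it may carry leading whitespace, hence the `trim`.

```php
<?php
// Pattern from Approach 1, unchanged.
$re = '~^(\s*//.*)(*SKIP)(*F)|^(?:\s*Connection\s+)?(.+?)\s*=\s*new\s+DBConnection~m';

// Hypothetical input: two live assignments and two commented-out ones.
$str = "Connection conn = new DBConnection\n"
     . "    db = new DBConnection\n"
     . "//Connection old = new DBConnection\n"
     . "//    legacy = new DBConnection";

$names = [];
if (preg_match_all($re, $str, $matches)) {
    // $matches[2] holds Group 2 of every match; strip the indentation
    // that (.+?) picks up when the "Connection" prefix is absent.
    $names = array_map('trim', $matches[2]);
}
print_r($names);
```

Commented lines never reach the result because the first alternative consumes them and `(*SKIP)(*F)` discards that attempt.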

Approach 2: Optional capturing group and a bit of post-processing

In PHP PCRE regex patterns, you cannot use variable-width lookbehinds, meaning the patterns inside cannot be quantified with the *, +, *?, +?, ?, ??, {1,4}, {3,} quantifiers. Moreover, alternatives of different lengths are only allowed at the top level of the lookbehind, not nested inside a group.
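To make the restriction concrete, here is a small sketch (the test line comes from the question; the variable names are mine). It illustrates two things: the question's `(?<!//)` compiles, but it is only checked at the position where the match starts, so the engine simply starts the match one space later; and the tempting fix `(?<!//\s*)` is expected to be rejected when the pattern is compiled, so `preg_match` returns `false`.

```php
<?php
$line = '//    Connection variable = new DBConnection';

// Fixed-width lookbehind from the question: compiles fine.
$fixed = '#(?<!//)(?:\s*Connection\s+)+(.+?)\s*=\s*new\s+DBConnection#';
$found = preg_match($fixed, $line, $m, PREG_OFFSET_CAPTURE);

// The match does not start right after "//" (offset 2), where the
// lookbehind would fail, but at offset 3, where the two preceding
// characters are "/ " rather than "//".
echo $found, ' match at offset ', $m[0][1], ', name: ', $m[1][0], "\n";

// Unbounded lookbehind: assumed to fail compilation; the warning is
// suppressed with @ so only the return value is inspected.
$compiled = @preg_match('#(?<!//\s*)Connection#', $line);
var_dump($compiled);
```

This is why the last two lines of the not-match list slip through: the whitespace between `//` and `Connection` gives the engine a start position that the lookbehind accepts.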

A usual workaround is to use an optional capturing group and check its value after a match is found. If the group value is not empty, the match should be treated as failed and discarded; otherwise, grab the capture you need.

Here is an example regex:

'~^(\s*//)?(?:\s*Connection\s+)?(.+?)\s*=\s*new\s+DBConnection~m'

See the regex demo (the screenshot is not preserved here). In the demo, the green highlighted substrings are Group 1 matches. We can check them in the code like this:

$result = "";                    // Result is empty
if (preg_match($rx, $s, $m)) {   // Is there a match?
    if (empty($m[1])) {          // Is the match group #1 empty?
        $result = $m[0];         // If yes, we found a result
    }
}                                // Else, result will stay empty

See the PHP demo:

$strs = ['Connection variable = new DBConnection', 'variable = new DBConnection', '//Connection variable = new DBConnection', '//variable = new DBConnection'];
$rx = '~^(\s*//)?(?:\s*Connection\s+)?(.+?)\s*=\s*new\s+DBConnection~m';
foreach ($strs as $s) {
    echo "$s:\n";
    if (preg_match($rx, $s, $m)) {
        if (empty($m[1])) {
            echo "FOUND:" . $m[0] . "\n--------------\n";
        }
    } else {
        echo "NOT FOUND\n--------------\n";
    }
}

Output:

Connection variable = new DBConnection:
FOUND:Connection variable = new DBConnection
--------------
variable = new DBConnection:
FOUND:variable = new DBConnection
--------------
//Connection variable = new DBConnection:
//variable = new DBConnection:
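The comments mention reading a whole source file into one string. A minimal sketch of that case, assuming the `m` flag is kept so `^` anchors at every line start: `preg_match_all` with `PREG_SET_ORDER` returns one array per match, Group 1 is checked for each, and the variable name is taken from Group 2. The sample source text and the names in it are hypothetical.

```php
<?php
// Hypothetical file contents held in a single string.
$source = "Connection conn = new DBConnection\n"
        . "    db = new DBConnection\n"
        . "//Connection old = new DBConnection\n"
        . "//    legacy = new DBConnection\n";

// Pattern from Approach 2, unchanged.
$rx = '~^(\s*//)?(?:\s*Connection\s+)?(.+?)\s*=\s*new\s+DBConnection~m';

$names = [];
if (preg_match_all($rx, $source, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        if (empty($m[1])) {            // Group 1 empty: not a comment line
            $names[] = trim($m[2]);    // Group 2 is the variable name
        }
    }
}
print_r($names);
```

Dropping `^` altogether, as tried in the comments, lets the match start in the middle of a commented line, which brings back the original problem.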

The same technique can be used with preg_replace_callback if you need to replace.
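A rough sketch of the replacing variant, under the assumption that the goal is to swap the class name on live lines only. The replacement class `PdoConnection` and the sample input are invented for illustration. The callback returns the match untouched when Group 1 is not empty.

```php
<?php
// Hypothetical input: one live assignment, one commented-out assignment.
$source = "Connection conn = new DBConnection;\n"
        . "//Connection old = new DBConnection;\n";

// Pattern from Approach 2, unchanged.
$rx = '~^(\s*//)?(?:\s*Connection\s+)?(.+?)\s*=\s*new\s+DBConnection~m';

$out = preg_replace_callback($rx, function (array $m): string {
    if (!empty($m[1])) {
        return $m[0];                  // commented line: keep as is
    }
    // Every match ends with "DBConnection"; replace just that suffix.
    return substr($m[0], 0, -strlen('DBConnection')) . 'PdoConnection';
}, $source);

echo $out;
```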

Wiktor Stribiżew
  • My original intention was to use a negative lookbehind to exclude commented-out statements, but I couldn't get it right. Still, an answer that solves the problem is a good answer. Thanks to Wiktor for giving me an answer based on a different idea. – Scott Chu Jun 13 '18 at 08:39
  • @ScottChu Ok, on second thought, you can do something different. I will update with another solution. – Wiktor Stribiżew Jun 13 '18 at 08:49