I was answering this question; here is a direct link to my answer.

You will notice that I used the pattern:

(\\?)?&?(TXT\\{[^}]++})(&)?

In the following code (added some more debugging related to my issue):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class name is arbitrary; the snippet only needs a wrapper to compile.
public class PossessiveGroupTest {
    public static void main(final String[] args) throws Exception {
        final String[] loginURLs = {
            "http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}",
            "http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&TXT{UE-IP,UE-Username,UE-Password}",
            "http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}&username=abcd&location={LOCATION}&password={PASS}",
            "http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}",
            "http://ip:port/path?username=abcd&password={PASS}"};
        // Group 1: optional leading '?', group 2: the TXT{...} block, group 3: optional trailing '&'.
        final Pattern patt = Pattern.compile("(\\?)?&?(TXT\\{[^}]++})(&)?");
        for (final String loginURL : loginURLs) {
            System.out.printf("%1$-10s %2$s%n", "Processing", loginURL);
            final StringBuffer sb = new StringBuffer();
            final Matcher matcher = patt.matcher(loginURL);
            while (matcher.find()) {
                final String found = matcher.group(2);
                System.out.printf("%1$-10s 1:%2$s,3:%3$s%n", "Groups", matcher.group(1), matcher.group(3));
                System.out.printf("%1$-10s %2$s%n", "Found", found);
                // Keep the '?' only when other parameters follow; otherwise keep the trailing '&', if any.
                if (matcher.group(1) != null && matcher.group(3) != null) {
                    matcher.appendReplacement(sb, "$1");
                } else {
                    matcher.appendReplacement(sb, "$3");
                }
            }
            matcher.appendTail(sb);
            System.out.printf("%1$-10s %2$s%n%n", "Processed", sb.toString());
        }
    }
}

Of which the output is:

Processing http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}
Groups     1:null,3:&
Found      TXT{UE-IP,UE-Username,UE-Password}
Processed  http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}

Processing http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}&TXT{UE-IP,UE-Username,UE-Password}
Groups     1:null,3:null
Found      TXT{UE-IP,UE-Username,UE-Password}
Processed  http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}

Processing http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}&username=abcd&location={LOCATION}&password={PASS}
Groups     1:?,3:&
Found      TXT{UE-IP,UE-Username,UE-Password}
Processed  http://ip:port/path?username=abcd&location={LOCATION}&password={PASS}

Processing http://ip:port/path?TXT{UE-IP,UE-Username,UE-Password}
Groups     1:?,3:null
Found      TXT{UE-IP,UE-Username,UE-Password}
Processed  http://ip:port/path

Processing http://ip:port/path?username=abcd&password={PASS}
Processed  http://ip:port/path?username=abcd&password={PASS}

Which is perfect.

Now, my issue

When I change the first match group, (\\?)?, to use a possessive quantifier, i.e. (\\?)?+, the output for the first item becomes:

Processing http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}
Groups     1:?,3:&
Found      TXT{UE-IP,UE-Username,UE-Password}
Processed  http://ip:port/path?username=abcd&location={LOCATION}?password={PASS}

I cannot for the life of me work out where the question mark in the first match group comes from.

I don't see a way for the pattern to correctly match the required string and grab a question mark in the first group.

Am I just missing something obvious?

If it matters I am running OS X Mavericks with:

java version "1.8.0"
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)
Boris the Spider

1 Answer

I think this comes down to how possessive quantifiers work. At first they behave like greedy quantifiers: they try to match as much as they can. Unlike a greedy quantifier, though, a possessive quantifier never gives back what it has matched when the engine backtracks.
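
The difference is easiest to see on a tiny example. The following is only an illustrative sketch (the class and method names are made up, and the patterns are not taken from your code): the greedy a* hands one character back so the trailing a can match, while the possessive a*+ keeps everything and the overall match fails.

```java
import java.util.regex.Pattern;

public class PossessiveDemo {
    // True only when the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Greedy: a* first takes all three a's, then gives one back for the trailing a.
        System.out.println("greedy:     " + matchesWhole("a*a", "aaa"));   // prints "greedy:     true"
        // Possessive: a*+ takes all three a's and never gives any back, so the trailing a cannot match.
        System.out.println("possessive: " + matchesWhole("a*+a", "aaa"));  // prints "possessive: false"
    }
}
```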

So, taking your regex:

"(\\?)?+&?(TXT\\{[^}]++})(&)?"

On the first attempt, the engine finds the ? before username, matches it, and stores it in group 1. The optional & then matches nothing, and the T of TXT fails against the u of username. The engine backtracks, but because the ? was matched possessively it is not given up, so this attempt fails as a whole and the ? is left behind in group 1.

The engine then retries from later positions in the input. At this point, group 1 still contains the ?. Eventually it matches the part:

&TXT{UE-IP,UE-Username,UE-Password}&

Here group 1 is optional and matches nothing, but the engine does not clear what was already stored in it.

That means, you're getting the ? from the group 1 that was matched first time.
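
To look at this in isolation, a reduced version of the pattern can be run against a short input. This is a sketch under assumptions: the class name, the shortened pattern, and the input "?u&TXT" are mine, and I have not checked which Java versions show the stale value, so no particular output is promised. A correct engine should report null for group 1, since the successful match starts at the & and contains no ?.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StaleGroupRepro {
    // Reduced form of the pattern: possessive optional group, optional '&', then a literal.
    private static final Pattern REDUCED = Pattern.compile("(\\?)?+&?(TXT)");

    // Returns whatever the engine reports for group 1 on the first successful match.
    static String groupOne(String input) {
        Matcher m = REDUCED.matcher(input);
        return m.find() ? m.group(1) : "<no match>";
    }

    public static void main(String[] args) {
        // The attempt starting at '?' fails at 'u'; the successful match starts at '&'.
        // Expected on a correct engine: null. An affected build may print "?" instead.
        System.out.println("group 1 = " + groupOne("?u&TXT"));
    }
}
```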


This looks like a bug in the Java regex engine: in Perl, the same group comes back as undefined. Here's the fiddle.
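
If you want to keep the possessive quantifier anyway, one possible workaround is to stop reading groups 1 and 3 and inspect the text of the overall match instead, since matcher.group() only covers the characters that were actually consumed. A minimal sketch, assuming the same replacement rules as in your loop (the class and method names here are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripTxt {
    // Pattern from the question with the possessive group; groups 1 and 3 are never read.
    private static final Pattern TXT = Pattern.compile("(\\?)?+&?(TXT\\{[^}]++})(&)?");

    static String strip(String url) {
        StringBuffer sb = new StringBuffer();
        Matcher m = TXT.matcher(url);
        while (m.find()) {
            String whole = m.group();                    // text of this match only
            boolean leadingQ = whole.startsWith("?");    // a '?' was consumed by this match
            boolean trailingAmp = whole.endsWith("&");   // a trailing '&' was consumed
            String replacement;
            if (leadingQ && trailingAmp) {
                replacement = "?";   // TXT was the first of several parameters
            } else if (trailingAmp) {
                replacement = "&";   // TXT sat between two parameters
            } else {
                replacement = "";    // TXT was the last or the only parameter
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip(
            "http://ip:port/path?username=abcd&location={LOCATION}&TXT{UE-IP,UE-Username,UE-Password}&password={PASS}"));
    }
}
```

Because the decision depends only on the first and last character of the matched text, a stale value in group 1 no longer affects the result.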

Rohit Jain
  • Is that the desired behaviour of a possessive quantifier - as you point out that `?` is from an entirely different, failed, match. Shouldn't the state of the engine be cleared after a failed match attempt? – Boris the Spider Mar 21 '14 at 12:05
  • I agree with @BoristheSpider, this is a bug; since there isn't a match, to begin with, the capturing group should not have retained the text. – fge Mar 21 '14 at 12:06
  • @BoristheSpider Well, just tested this on Perl, and it worked as expected. Certainly a bug with Java regex engine. – Rohit Jain Mar 21 '14 at 12:32