Consider the following program (it can be compiled and run online, e.g. at Javatpoint):

public class Simple
{
    public static void main(String args[])
    {
        System.out.println("A:B:C:D".replaceFirst("(?<=[^:]*:[^:]*:).", "X"));
        System.out.println("A:B:C:D".replaceFirst("(?<=(?:[^:]*:){2}).", "X"));
    }
}

The second invocation of replaceFirst throws

java.util.regex.PatternSyntaxException: Look-behind group does not have an obvious maximum length near index 16
(?<=(?:[^:]*:){2}).
                ^

That is understandable. But the first invocation silently returns the string unchanged. Shouldn't the subexpressions `[^:]*:[^:]*:` and `(?:[^:]*:){2}` be equivalent? What is the reason for the differing behavior?
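To compare the two patterns without the exception aborting the program, a small probe like the following may help. This is only a sketch: the class name `LookbehindProbe` and the helper `tryReplace` are hypothetical and not part of the original program. The outcome for each pattern may depend on the Java version, so no expected output is shown.

```java
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, not part of the original program.
final class LookbehindProbe {

    // Applies replaceFirst and returns either the result or a note
    // saying that the pattern was rejected at compile time.
    static String tryReplace(String input, String regex, String replacement) {
        try {
            return input.replaceFirst(regex, replacement);
        } catch (PatternSyntaxException e) {
            return "PatternSyntaxException: " + e.getDescription();
        }
    }

    public static void main(String[] args) {
        String[] patterns = {
            "(?<=[^:]*:[^:]*:).",   // spelled-out repetition
            "(?<=(?:[^:]*:){2})."   // quantified non-capturing group
        };
        for (String p : patterns) {
            System.out.println(p + " -> " + tryReplace("A:B:C:D", p, "X"));
        }
    }
}
```

Running it on several JDK releases shows, for each pattern, whether it is rejected, accepted with a replacement, or accepted without any effect.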

Armali
  • Java doesn't support variable length lookbehinds. – Michał Turczyn Mar 06 '20 at 07:31
  • The first case looks like a bug: you should get a similar exception there too, since you are not respecting the limitation of Java's regex engine, which forbids `*` in a look-behind because it has no obvious maximum length. If you know the maximum possible length of a `[^:]` run, say 100000 characters, use `replaceFirst("(?<=[^:]{0,100000}:[^:]{0,100000}:).", "X")` instead. The second example, while logically equivalent, correctly reports the problem (although `replaceFirst("(?<=(?:[^:]{0,100000}:){2}).", "X")` also fails, which seems like a bug to me). – Pshemo Mar 06 '20 at 07:56
  • @Wiktor Stribiżew - I wish you wouldn't be so trigger-happy with your question-closing. The alleged duplicate does not explain the behavioral difference of the given expressions. – Armali Mar 06 '20 at 08:28
  • @Pshemo - Interestingly even `"(?<=(.?:){2})."` throws the _PatternSyntaxException_ (tested with Java 8). – Armali Mar 06 '20 at 10:02
  • It gets better: `(?<=(.?:){1})` also throws PatternSyntaxException. It looks like whenever a group can vary in length at all (because of `?`, `+`, `*` or `{m,n}`), we can't repeat that group (even with `{1}`) inside a look-behind. I was hoping the regex engine could calculate the maximum possible length of a group in cases like `(a{1,2}bc{1,2}){3}` -> `(2+1+2)*3=5*3=15`, but such analysis probably isn't as easy for more complex cases, which may be why the regex devs abandoned it. I would like to know more about it... – Pshemo Mar 06 '20 at 10:35
  • Also, we can't use group references within a look-behind, like `(?<=(.)\1)`, even though `\1` will always have the same length as the match from the group, so its maximum length is the same as that of the referenced group (unless that isn't always the case, but I am not aware of such a scenario). – Pshemo Mar 06 '20 at 10:40
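The bounded-quantifier workaround suggested in the comments can be sketched as below. The class name `BoundedLookbehind`, the method names, and the bound `MAX_FIELD = 100` are assumptions made for illustration (the comment used 100000). The second method is an alternative that is not from the thread: it avoids the look-behind entirely by capturing the first two fields and re-inserting them with `$1`.

```java
// Hypothetical helper class illustrating two ways around the limitation.
final class BoundedLookbehind {

    // Assumed upper bound on the length of one field; adjust to your data.
    static final int MAX_FIELD = 100;

    // Look-behind with an explicit maximum length, spelled out twice
    // instead of using a {2} quantifier on a group.
    static String withBoundedLookbehind(String input) {
        String field = "[^:]{0," + MAX_FIELD + "}:";
        return input.replaceFirst("(?<=" + field + field + ").", "X");
    }

    // Alternative without look-behind: capture the first two fields
    // and put them back in front of the replacement.
    static String withCaptureGroup(String input) {
        return input.replaceFirst("^((?:[^:]*:){2}).", "$1X");
    }

    public static void main(String[] args) {
        System.out.println(withBoundedLookbehind("A:B:C:D")); // A:B:XD
        System.out.println(withCaptureGroup("A:B:C:D"));      // A:B:XD
    }
}
```

The capture-group variant needs no length bound, but it only suits cases like this one, where the text before the match can simply be copied into the replacement.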

0 Answers