I'm using a commercial closed-source Java application that, among other things, lets me filter text fields by providing a regex pattern string. I use that filter functionality quite extensively.

The issue I'm having is that I often find myself repeating the exact same subpatterns in the regex. For example:

^(
    ( # pattern foo
        foo_([^_]+)_(windows|linux|osx)
    )
    |
    ( # pattern bar
        ([^_]+)_bar_(windows|linux|osx)_foo_(windows|linux|osx)
    )
)$

The ([^_]+) and (windows|linux|osx) parts repeat quite often.

That's just a made-up example. The original regex is more complex, about 20 times larger, and has many different repeats. It is getting harder to read because the repeated subpatterns keep growing in both size and number, and whenever I modify a repeated subpattern I have to modify every copy of it too.

So, I played with regex101 and came up with this

^(
    ( # a dummy option, defines some frequently used capture groups
        (?!x)x # always false, so nothing matches this and the following groups ever
        (?'name'[^_]+) # group "name"
        (?'os'windows|linux|osx) # group "os"
    )
    |
    ( # pattern foo
        foo_\g'name'_\g'os'
    )
    |
    ( # pattern bar
        \g'name'_bar_\g'os'_foo_\g'os'
    )
)$

Now all of the subpatterns are named, and wherever I reference a name, the regex behaves as if the subpattern had been written there (i.e. \g'os' acts like (windows|linux|osx)). The names are a lot shorter than the corresponding subpatterns, they are also clear, and a subpattern has to be modified only once for the change to apply everywhere in the regex.

The issue with this improved version is that while it's a valid PCRE (PHP) regex, it's not a valid Java regex. Comments and broken lines in the regex aside, Java doesn't support \g, as stated in the "Comparison to Perl 5" section of the java.util.regex.Pattern documentation.

Is there any way I can "factor out" the repeated regex patterns like that in Java regex? Keep in mind that all I can do is provide a pattern string; I have no access to the code.

Cookie Cat
  • 1
    http://stackoverflow.com/a/415635/460557 – Jorge Campos Aug 14 '15 at 01:55
  • It doesn't answer my question in the slightest. It says that naming groups and using `\k` is supported, but `\g`, which is what I need, is still unsupported. – Cookie Cat Aug 14 '15 at 02:17
  • @RobbyCornelissen: Please retract your close vote. This question has nothing to do with named groups. In fact, Java has no support for subroutine calls. – nhahtdh Aug 14 '15 at 06:21
  • 1
    @CookieCat: What you want to do can be achieved by string concatenation in Java. An example: http://stackoverflow.com/questions/26507391/java-regular-expression-for-detecting-class-interface-etc-declaration/26513446#26513446 (scroll down to bottom) – nhahtdh Aug 14 '15 at 06:23
  • 1
    @nhahtdh that is correct, except that I mentioned at the very beginning of the question that I'm a user of a commercial closed-source Java application, and restated at the very end that I don't have access to its source code. I need everything to be done entirely in Java's regex. Other flavors of regex, such as Perl's, Python's, JavaScript's, PHP's and many others, support the `\g` escape sequence for referencing named groups, which is what would solve my issue, but Java doesn't support it. My question was whether what I want is possible in Java's regex. – Cookie Cat Aug 14 '15 at 07:47
  • @CookieCat: Then it's impossible to do what you want, since Java doesn't support such a feature in the pattern itself. – nhahtdh Aug 14 '15 at 07:49
  • @CookieCat you are totally right, sorry, no excuses – m.cekiera Aug 14 '15 at 07:53
  • 1
    @nhahtdh I see. I hoped there might be some clever workaround. I would have much preferred to keep it regex-only, but since there is no way around it, I will have to resort to writing a program that prints the regex I want to stdout, using variables for the substitution. – Cookie Cat Aug 14 '15 at 07:57
  • `^(?:foo_${name}|${name}_bar_${os}_foo)_${os}$` where `${name}="[^_]+", ${os}="(?:windows|(?:linu|os)x)"` – sln Apr 07 '21 at 21:37

3 Answers

As of Java 8, a pure regular-expression solution doesn't exist: java.util.regex has no subroutine-call syntax such as \g. It may be supported in a future version.
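To see the limitation concretely, here is a minimal sketch (the class name SubroutineCheck and the helper compiles are made up for illustration). It shows that a PCRE-style \g<os> call is rejected by Pattern.compile, while the supported \k<os> is only a backreference: it repeats the text that was captured, not the subpattern.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of any library.
class SubroutineCheck {
    /** Returns true if java.util.regex accepts the given pattern string. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // PCRE-style subroutine call: rejected by java.util.regex.
        System.out.println(compiles("(?<os>windows|linux|osx)_\\g<os>")); // false

        // Named backreference: accepted, but it must match the same text
        // that group "os" captured, so it cannot stand in for the subpattern.
        Pattern backref = Pattern.compile("(?<os>windows|linux|osx)_\\k<os>");
        System.out.println(backref.matcher("linux_linux").matches()); // true
        System.out.println(backref.matcher("linux_osx").matches());   // false
    }
}
```

So \k cannot replace \g'os' in the bar pattern, where the two OS names are allowed to differ.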

As already mentioned, the only solution is the string concatenation technique. However, it is not an option in your case.
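For readers who can at least run a small program to generate the pattern string and then paste it into the application, the concatenation technique might look like the sketch below. The class and constant names are made up, and the fragments are written as non-capturing groups, which assumes the filter does not rely on the capture groups of the original regex.

```java
import java.util.regex.Pattern;

// Hypothetical generator class; names are illustrative only.
class ConcatenatedPattern {
    // Reusable fragments, defined once.
    static final String NAME = "[^_]+";
    static final String OS = "(?:windows|linux|osx)";

    // The two alternatives from the question, assembled from the fragments.
    static final String REGEX =
            "^(?:"
            + "foo_" + NAME + "_" + OS
            + "|"
            + NAME + "_bar_" + OS + "_foo_" + OS
            + ")$";

    public static void main(String[] args) {
        // The printed string is what would go into the filter field.
        System.out.println(REGEX);

        Pattern p = Pattern.compile(REGEX);
        System.out.println(p.matcher("foo_abc_linux").matches());           // true
        System.out.println(p.matcher("abc_bar_osx_foo_windows").matches()); // true
        System.out.println(p.matcher("foo_abc_solaris").matches());         // false
    }
}
```

Changing OS in one place then updates every use of it in the generated pattern.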

If you tell us the name of the commercial closed-source Java application, maybe we can help you more.

Stephan

If you can run some of your own Java code before submitting the pattern, you could use StrSubstitutor from Apache Commons Lang:

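import java.util.HashMap;
import java.util.Map;
import org.apache.commons.lang3.text.StrSubstitutor; // assuming Commons Lang 3

// Subpatterns, keyed by the placeholder name used in the template.
Map<String, String> valuesMap = new HashMap<>();
valuesMap.put("os", "(windows|linux|osx)");
valuesMap.put("name", "([^_]+)");
StrSubstitutor sub = new StrSubstitutor(valuesMap);

// The template contains comments and line breaks, so the resulting regex
// needs free-spacing mode: Pattern.COMMENTS or a leading (?x).
String template ="^(\n"+
        "    ( # pattern foo\n"+
        "        foo_${name}_${os}\n"+
        "    )\n"+
        "    |\n"+
        "    ( # pattern bar\n"+
        "        ${name}_bar_${os}_foo_${os}\n"+
        "    )\n"+
        ")$";
// Each ${key} is replaced with its value from valuesMap.
String regex = sub.replace(template);
System.out.println(regex);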
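
If adding a library is not possible, the same ${...} substitution can be sketched with only java.util.regex. This is a rough equivalent under stated assumptions, not the Commons implementation: the names TemplateExpander and expand are made up, placeholders are assumed to be word characters only, and the fragments are written as non-capturing groups.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of any library.
class TemplateExpander {
    // Matches ${key} where key consists of word characters.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Replaces each ${key} in the template with its value; unknown keys raise an error. */
    static String expand(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for " + m.group(1));
            }
            // quoteReplacement keeps '\' and '$' inside the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("name", "[^_]+");
        values.put("os", "(?:windows|linux|osx)");
        String template = "^(?:foo_${name}_${os}|${name}_bar_${os}_foo_${os})$";
        // Prints the expanded single-line regex, ready to paste into the filter.
        System.out.println(expand(template, values));
    }
}
```

The trailing $ anchor in the template is left alone because it is not followed by {.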
Eero Aaltonen

Your regex reduces to ^(?:foo_[^_]+|[^_]+_bar_(?:windows|(?:linu|os)x)_foo)_(?:windows|(?:linu|os)x)$

^ 
(?:
  foo_ [^_]+ 
| [^_]+ _bar_
  (?:
    windows
  | (?: linu | os )
    x
  )
  _foo
)
_
(?:
  windows
| (?: linu | os )
  x
)
$
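
A quick way to gain some confidence in the reduction is to run both patterns over a few inputs and compare the verdicts. The sketch below does that; the class name, the agree helper and the sample strings are made up, and a handful of samples is a spot check rather than a proof of equivalence. Note that the reduced form drops the capturing groups of the original.

```java
import java.util.regex.Pattern;

// Hypothetical spot-check class; names and samples are illustrative.
class ReducedRegexCheck {
    // The regex from the question, written on one line.
    static final Pattern ORIGINAL = Pattern.compile(
            "^((foo_([^_]+)_(windows|linux|osx))"
            + "|(([^_]+)_bar_(windows|linux|osx)_foo_(windows|linux|osx)))$");

    // The reduced regex from this answer.
    static final Pattern REDUCED = Pattern.compile(
            "^(?:foo_[^_]+|[^_]+_bar_(?:windows|(?:linu|os)x)_foo)_(?:windows|(?:linu|os)x)$");

    /** True when both patterns give the same match/no-match verdict for the input. */
    static boolean agree(String input) {
        return ORIGINAL.matcher(input).matches() == REDUCED.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "foo_abc_linux", "abc_bar_osx_foo_windows", "foo_abc_solaris",
            "abc_bar_linux_foo", "foo__osx", ""
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" agree: " + agree(s));
        }
    }
}
```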
sln