55

I have a little program that lets users type in regular expressions. Afterwards, I would like to check whether the input is a valid regex.

I'm wondering if there is a built-in method for this in Java, but I have not found one yet.

Can you give me some advice?

ROMANIA_engineer
Philipp Andre
  • 3
    Why not just instantiate in a try/catch? – Oskar Kjellin Apr 24 '10 at 14:50
  • This may be a very ignorant question, I only know as much regex as I've needed this far, but isn't it pretty hard to create an invalid regex? I'm not talking about an incorrect one, but one that actually throws an error? If anyone has an example I'd love some enlightenment here – Nick Craver Apr 24 '10 at 14:55
  • For example, (?<!ABCD|ABC)@[a-z]+ is invalid for the Java regex engine, because it ONLY allows a fixed length in lookups. ABCD and ABC differ in length, so this is an invalid regex. – Philipp Andre Apr 24 '10 at 15:02
  • 4
    @Nick Craver: `")"`, `"]"`, `"}"`, `"?"`, `"*"`, `"+"`, all of those are obviously invalid (unmatched and dangling metacharacters). There's also things like `"x{5,-3}"`. Plenty of patterns are invalid. – polygenelubricants Apr 24 '10 at 15:19
  • 1
    @polygenelubricants - Ah that makes sense, thank you! @Philipp - Is that an invalid regex, or just won't find anything useful, but is technically correct? In testing here it seems valid, even if not particularly useful, am I missing something? – Nick Craver Apr 24 '10 at 15:30
  • Yes, sorry, I missed the point with my own example. Of course this is a valid regex. As discussed below, Java can handle different but finite lengths in lookforwards and lookbehinds, but not the plus and star notations, which allow infinite length. A proper example would have been simply "***". – Philipp Andre Apr 24 '10 at 18:52
  • 1
    "lookahead" would have been the correct term ;) – Philipp Andre Apr 24 '10 at 19:07

8 Answers

80

Here is an example.

import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexTester {
    public static void main(String[] arguments) {
        String userInputPattern = arguments[0];
        try {
            Pattern.compile(userInputPattern);
        } catch (PatternSyntaxException exception) {
            System.err.println(exception.getDescription());
            System.exit(1);
        }
        System.out.println("Syntax is ok.");
    }
}

java RegexTester "(capture" then outputs "Unclosed group", for example.
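
If you want to show the user where the pattern went wrong, PatternSyntaxException also exposes getIndex() and getPattern() next to getDescription(). Below is a minimal sketch of that idea; the class name RegexDiagnostics and the helper describeProblem are made up for illustration and are not part of the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, not part of the JDK.
final class RegexDiagnostics {

    /** Returns null if the pattern compiles, otherwise a one-line error report. */
    static String describeProblem(String userInputPattern) {
        try {
            Pattern.compile(userInputPattern);
            return null;
        } catch (PatternSyntaxException exception) {
            // getIndex() is the approximate error position, or -1 if it is unknown.
            return exception.getDescription()
                    + " (index " + exception.getIndex()
                    + " in \"" + exception.getPattern() + "\")";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeProblem("(capture")); // a report starting with "Unclosed group"
        System.out.println(describeProblem("[a-z]+"));   // null, because the pattern compiles
    }
}
```

Returning null for "no problem" keeps the sketch short; in real code an Optional<String> may read better.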

Joe Corneli
laz
16

You can just Pattern.compile the regex string and see if it throws PatternSyntaxException.

    String regex = "***";
    PatternSyntaxException exc = null;
    try {
        Pattern.compile(regex);
    } catch (PatternSyntaxException e) {
        exc = e;
    }
    if (exc != null) {
        exc.printStackTrace();
    } else {
        System.out.println("Regex ok!");
    }

This one in particular produces the following output:

java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
***
^

Regarding lookbehinds

Here's a quote from the old trusty regular-expressions.info:

Important Notes About Lookbehind

Java takes things a step further by allowing finite repetition. You still cannot use the star or plus, but you can use the question mark and the curly braces with the max parameter specified. Java recognizes the fact that finite repetition can be rewritten as an alternation of strings with different, but fixed lengths.

I think the phrase contains a typo, and should probably say "different, but finite lengths". In any case, Java does seem to allow alternation of different lengths in lookbehind.

    System.out.println(
        java.util.Arrays.toString(
            "abracadabra".split("(?<=a|ab)")
        )
    ); // prints "[a, b, ra, ca, da, b, ra]"

There's also a bug in which you can actually have an infinite length lookbehind and have it work, but I wouldn't rely on such behaviors.

    System.out.println(
        "1234".replaceAll(".(?<=(^.*))", "$1!")
    ); // prints "1!12!123!1234!"
polygenelubricants
  • Yes, exactly what I'm looking for. Thank you! I'm surprised that my example with differing lookup lengths passes this test. Is the Java regex engine now able to handle that?! – Philipp Andre Apr 24 '10 at 15:23
  • @Philipp: added things about lookbehinds. Check out the second example =) – polygenelubricants Apr 24 '10 at 15:43
  • I was looking for that on regular-expressions.info too. Good find! – laz Apr 24 '10 at 15:46
  • 2
    No, that's not a typo. He's saying *each* (theoretical) alternative has a fixed length. If they weren't fixed, you wouldn't be able to determine which one was longest. – Alan Moore Apr 24 '10 at 18:13
  • Yes, I was wrong with my example in the comment somewhere at the top of this discussion. polygenelubricants definitely got the point! Java allows differing lengths, but not star or plus notation, in lookahead/lookbehind. His code example works for me as well! – Philipp Andre Apr 24 '10 at 18:49
5

Try this:

import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        // The first input line holds the number of patterns that follow.
        int testCases = Integer.parseInt(in.nextLine());
        while (testCases > 0) {
            String pattern = in.nextLine();
            // Blank lines are skipped without printing anything.
            if (pattern != null && !pattern.equals("")) {
                try {
                    Pattern.compile(pattern);
                    System.out.println("Valid");
                } catch (PatternSyntaxException e) {
                    System.out.println("Invalid");
                }
            }
            testCases--;
        }
    }
}

Use this input to test:
3
([A-Z])(.+)
[AZa-z
batcatpat(nat

maddy
3

The most obvious approach is to call the compile method of java.util.regex.Pattern and catch PatternSyntaxException.

String myRegEx;
...
...
Pattern p = Pattern.compile(myRegEx);

This will throw a PatternSyntaxException if myRegEx is invalid.
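
Wrapped into a reusable check, this could look roughly like the sketch below. The class name RegexValidator and the method isValidRegex are hypothetical, and treating null as invalid is my assumption (Pattern.compile(null) would otherwise throw a NullPointerException).

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, not part of the JDK.
final class RegexValidator {

    /** Returns true if the candidate compiles as a java.util.regex pattern. */
    static boolean isValidRegex(String candidate) {
        if (candidate == null) {
            return false; // assumption: a missing pattern counts as invalid
        }
        try {
            Pattern.compile(candidate);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("([A-Z])(.+)")); // true
        System.out.println(isValidRegex("[AZa-z"));      // false: unclosed character class
        System.out.println(isValidRegex("***"));         // false: dangling meta character
    }
}
```

If you need the compiled Pattern afterwards, return it (or an Optional<Pattern>) instead of a boolean so the regex is not compiled twice.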

ring bearer
3
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class Solution {
    public static void main(String[] args){
        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine());
        while(testCases>0){
            String pattern = in.nextLine();
            try{
                Pattern.compile(pattern);
                System.out.println("Valid");
            }catch(PatternSyntaxException exception){
                System.out.println("Invalid");
            }
            testCases--;
        }
    }
}
1

Can you give me some advice?

There is an elephant in the room that nobody has mentioned. Mere syntactic correctness of the regex is probably not sufficient, so validation using Pattern.compile is not enough.

  1. Syntax checking doesn't check that the regex does the right thing.
  2. Syntax checking doesn't check that the regex is not harmful.

It is not difficult to accidentally or deliberately create a regex that will take an (effectively) infinite time to execute, especially if the data being searched is pathological for the regex. Allowing users to enter regexes therefore provides a vector for "denial of service" attacks.

If you want more information on this problem:


In short, if you let users plug regexes into your application, consider the potential consequences. Especially if the users could do stupid or malicious things.
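
As far as I know, java.util.regex has no built-in timeout. One common workaround is to hand the matcher a CharSequence wrapper that checks a deadline each time the engine reads a character. The sketch below illustrates that idea under stated assumptions: DeadlineCharSequence, GuardedRegex and findWithin are names I made up, and IllegalStateException is an arbitrary choice for signalling the timeout. Which patterns actually blow up depends on the JDK version, so treat this as a safety net rather than a complete defence.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper: delegates to the real input but aborts once the deadline has passed.
final class DeadlineCharSequence implements CharSequence {
    private final CharSequence inner;
    private final long deadlineNanos;

    DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
        this.inner = inner;
        this.deadlineNanos = deadlineNanos;
    }

    @Override
    public char charAt(int index) {
        // The regex engine reads its input through charAt, so a runaway match ends up here often.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new IllegalStateException("regex evaluation exceeded its time budget");
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}

// Hypothetical entry point for running a user-supplied pattern with a time budget.
final class GuardedRegex {

    /** Runs find() and throws IllegalStateException if the budget is exceeded. */
    static boolean findWithin(Pattern pattern, CharSequence input, long budgetMillis) {
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
        return pattern.matcher(new DeadlineCharSequence(input, deadline)).find();
    }

    public static void main(String[] args) {
        Pattern userPattern = Pattern.compile(args.length > 0 ? args[0] : "a+b");
        try {
            System.out.println(findWithin(userPattern, "xxaaab", 500));
        } catch (IllegalStateException timeout) {
            System.err.println("Pattern rejected: " + timeout.getMessage());
        }
    }
}
```

The deadline check adds a System.nanoTime() call per character read, which costs some speed. Running the match on a separate thread and abandoning it after a timeout is another option, with its own trade-offs.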

Stephen C
  • Is there a way (e.g. a library) to sanitize the regex provided by the user? Virtually all advanced text editors provide the ability to search by regex, so it seems to be possible to provide this kind of functionality to the users. Thanks. – Robert Hume Mar 20 '23 at 07:59
  • Possibly yes. Search for "redos checker" – Stephen C Mar 20 '23 at 08:14
-1
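import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine());
        while (testCases > 0) {
            String pattern = in.nextLine();
            try {
                Pattern.compile(pattern);
                // Only reached when compilation succeeded.
                System.out.println("Valid");
            } catch (PatternSyntaxException e) {
                System.out.println("Invalid");
            }
            // Count down so the loop ends after the announced number of patterns.
            testCases--;
        }
    }
}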
saigopi.me
-1

new String().matches(regEx) can be used directly with try-catch to determine whether regEx is valid.

boolean isValidRegEx = true;
try {
    new String().matches(regEx);
} catch(PatternSyntaxException e) {
    isValidRegEx = false;
}
  • While this does accomplish the end result, Pattern.compile(regEx) is simpler (and is exactly what will end up happening anyway) and doesn't have any additional complexity. – Charles Hasegawa Aug 16 '21 at 18:26