I want to make a regular expression that can help me get rid of the following piece of code -

public class Test {
    public static void main(String[] args) {
        String test = "1026";
        int testToInt = 0;
        if(checkIfInteger(test))
            testToInt = Integer.parseInt(test);
        if(testToInt >= 1024 && testToInt <= 65535)
            System.out.println("Validity is perfect");
        else
            System.out.println("Validity is WRONG");
    }

    public static boolean checkIfInteger(String givenString) {
        boolean check = false;
        for(int i = 0; i < givenString.length(); i++) {
            if(givenString.charAt(i) >= '0' && givenString.charAt(i) >= '9')
                check = true;
            else {
                check = false;
                break;
            }
        }
        return check;
    }
}

Basically, it checks that a String contains only numeric digits and that its value is between 1024 and 65535 (inclusive).

For this purpose, I created the following regex -

"\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[01][0-9]|6552[0-5])\b"

But there are a lot of values for which it fails. Can someone give me a smarter / correct way to do it?

Here's a test file if you want to test your regex -

public class Test {
    public static void main(String[] args) {

        for (int i = 0; i < 1024; i++) {
            if (String
                    .valueOf(i)
                    .matches(
                            "\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[01][0-9]|6552[0-5])\b"))
                System.out.println("Hum " + i);
        }


        for (int i = 1025; i < (int) Math.pow(2, 16); i++) {
            if (!String
                    .valueOf(i)
                    .matches(
                            "\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[01][0-9]|6552[0-5])\b"))
                System.out.println("Hum " + i);
        }

        for (int i = 0; i < 100; i++) {
            if (String
                    .valueOf((int)Math.pow(2, 16) + i)
                    .matches(
                            "\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[01][0-9]|6552[0-5])\b"))
                System.out.println("Hum " + i);
        }

    }
}
NatureDevil
  • Just wondering - what's wrong with the piece of code you have right now? – Aify Apr 07 '15 at 16:53
  • Smarter way: Don't use a regular expression...? – Alex A. Apr 07 '15 at 16:53
  • What's wrong with using `Integer.parseInt()` method directly, without testing first? – Rohit Jain Apr 07 '15 at 16:54
  • (And catch the `NumberFormatException` that might be thrown, of course...) – Jon Skeet Apr 07 '15 at 16:55
  • Well this always works but I was just curious if I could find a regex to replace everything haha – NatureDevil Apr 07 '15 at 16:59
  • There is nothing wrong with my current piece of code – NatureDevil Apr 07 '15 at 17:00
  • possible duplicate of [How to check if a String is a numeric type in Java](http://stackoverflow.com/questions/1102891/how-to-check-if-a-string-is-a-numeric-type-in-java) – Om. Apr 07 '15 at 17:00
  • In addition to the above comments I'd like to add that in the line `if(givenString.charAt(i) >= '0' && givenString.charAt(i) >= '9')` your condition is wrong, you want `givenString.charAt(i) >= '0' && givenString.charAt(i) <= '9'`, `<= '9'` instead of `>= '9'` – halex Apr 07 '15 at 17:07

5 Answers

Change your code

from:

testToInt = Integer.parseInt(test);
if(testToInt >= 1024 && testToInt <= 65535)
    System.out.println("Validity is perfect");
else
    System.out.println("Validity is WRONG");

To:

try {
    testToInt = Integer.parseInt(test);
    if(testToInt >= 1024 && testToInt <= 65535)
        System.out.println("Validity is perfect");
    else
        System.out.println("Validity is WRONG");
} catch(NumberFormatException nfe) {
    System.out.println("Validity is WRONG");
}
Om.

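In a Java string literal the backslash must itself be escaped, so the word boundary `\b` has to be written as `\\b` (a bare `\b` is the backspace character). After fixing this, your regex string looks like: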

String pattern = "\\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[01][0-9]|6552[0-5])\\b";

This already fixes a lot; I only get these "Hum"s:

Hum 65526
Hum 65527
Hum 65528
Hum 65529
Hum 65530
Hum 65531
Hum 65532
Hum 65533
Hum 65534
Hum 65535

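Now, changing 655[01][0-9] to 655[012][0-9] and adding |6553[0-5], I get a fully working regex (the 6552[0-5] alternative is now redundant, but harmless):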

String pattern = "\\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[012][0-9]|6552[0-5]|6553[0-5])\\b";

The example program based on your testing code is available here.
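
As a sanity check, the corrected pattern can be compared against the plain numeric test for every value in a range. The following is a minimal sketch, not the linked example program: the class name `RangeRegexCheck`, the constant `PORT`, and the helper `countMismatches` are made up for illustration. It also shows why the doubled backslash matters, since `"\b"` in a Java string literal is a single backspace character.

```java
import java.util.regex.Pattern;

class RangeRegexCheck {
    // The corrected pattern from above; note the doubled backslashes.
    static final Pattern PORT = Pattern.compile(
        "\\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}"
        + "|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[012][0-9]|6552[0-5]|6553[0-5])\\b");

    // Counts the values in [0, limit) where the regex disagrees
    // with the numeric check 1024 <= i <= 65535.
    static int countMismatches(int limit) {
        int mismatches = 0;
        for (int i = 0; i < limit; i++) {
            boolean expected = i >= 1024 && i <= 65535;
            boolean actual = PORT.matcher(String.valueOf(i)).matches();
            if (expected != actual) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // "\b" is one char (backspace); "\\b" is two chars (backslash + 'b'),
        // which the regex engine then reads as a word boundary.
        System.out.println("\b".length());
        System.out.println("\\b".length());
        System.out.println("Mismatches: " + countMismatches(70000));
    }
}
```

If the pattern is right, `countMismatches(70000)` should report no disagreements. Since `matches()` already requires the whole input to match, the `\\b` boundaries are not strictly needed for this particular check.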

Wiktor Stribiżew

Throwing an Exception here would IMO be a better strategy than returning a boolean.

Something like:

public int parseAndCheck(String val, int low, int high) throws IllegalArgumentException {
  try {
    int num = Integer.parseInt(val);
    if (num < low || num > high) throw new IllegalArgumentException(val);
    return num;
  }
  catch (NumberFormatException ex) {
    throw new IllegalArgumentException(ex);
  }
}
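
A caller could use this roughly as follows. This is only a sketch: the class name `PortParser` and the helper `isValidPort` are hypothetical, and the method is copied in as `static` so the example is self-contained. Note that `NumberFormatException` is itself a subclass of `IllegalArgumentException`, so the wrapping in the `catch` block is a matter of taste rather than a necessity.

```java
class PortParser {
    // Same logic as parseAndCheck above, made static for this example.
    static int parseAndCheck(String val, int low, int high) {
        try {
            int num = Integer.parseInt(val);
            if (num < low || num > high) {
                throw new IllegalArgumentException(val);
            }
            return num;
        } catch (NumberFormatException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    // Hypothetical helper: turns the exception back into a yes/no answer
    // for callers that only need to know whether the input is acceptable.
    static boolean isValidPort(String val) {
        try {
            parseAndCheck(val, 1024, 65535);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAndCheck("1026", 1024, 65535)); // 1026
        System.out.println(isValidPort("80"));                  // false: below the range
        System.out.println(isValidPort("abc"));                 // false: not a number
    }
}
```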
ControlAltDel

^(?:102[4-9]|10[3-9]\d|1[1-9]\d{2}|[2-9]\d{3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5])$

You can try this regex. In a Java string literal, each \d must be written as \\d. See the demo:

https://regex101.com/r/sJ9gM7/70
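
To show how the `^...$` anchors differ from the `\b` boundaries used in the question, here is a small sketch in Java. The names `AnchorDemo`, `RANGE`, `ANCHORED`, and `BOUNDED` are made up for this illustration. With `matches()` both forms behave the same, because the whole input has to match anyway; the difference only shows up when searching inside a longer string with `find()`.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // The alternation from the regex above, with \d escaped for a Java literal.
    static final String RANGE =
        "102[4-9]|10[3-9]\\d|1[1-9]\\d{2}|[2-9]\\d{3}|[1-5]\\d{4}"
        + "|6[0-4]\\d{3}|65[0-4]\\d{2}|655[0-2]\\d|6553[0-5]";

    // Anchored to the start and end of the input.
    static final Pattern ANCHORED = Pattern.compile("^(?:" + RANGE + ")$");
    // Delimited by word boundaries only.
    static final Pattern BOUNDED = Pattern.compile("\\b(?:" + RANGE + ")\\b");

    public static void main(String[] args) {
        String text = "port 8080 open";
        System.out.println(ANCHORED.matcher("8080").matches()); // true
        System.out.println(BOUNDED.matcher("8080").matches());  // true
        System.out.println(ANCHORED.matcher(text).find());      // false: input is not just the number
        System.out.println(BOUNDED.matcher(text).find());       // true: finds 8080 inside the text
    }
}
```

For validating a single value, the anchored form (or simply `matches()`) is the stricter choice.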

vks

Just because you can do this with regular expressions doesn't mean you should. Not only is it error-prone and the code pretty much unreadable, but it's quite slow.

Given code like:

var intStrings = IntStream.range(0, 70000).mapToObj(Integer::toString).toArray(String[]::new);
var badStrings = IntStream.range(0, 70000).mapToObj(x -> "not an int " + x).toArray(String[]::new);

and using the regexp from Wiktor's answer:

var re = Pattern.compile("\\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[012][0-9]|6552[0-5]|6553[0-5])\\b");

var matchCount = 0;
for (int i = 0, len = intStrings.length; i < len; i++) { 
  matchCount = re.matcher(intStrings[i]).matches() ? 1 + matchCount : matchCount;
  matchCount = re.matcher(badStrings[i]).matches() ? 1 + matchCount : matchCount;
} 

is going to take about twelve times longer than the same number of iterations of the character-checking version:

boolean valid(String s) {
  var len = s.length();
  if (len == 0 || len > 5) { // empty, or longer than the 5 digits of 65535
    return false;
  }
  for (int i = 0; i < len; i++) {
    var c = s.charAt(i);
    if (c < '0' || c > '9') {
      return false;
    }
  }
  try {
    var intVal = Integer.parseInt(s);
    return intVal >= 1024 && intVal <= 65535;
  } catch (NumberFormatException e) {
    throw new IllegalStateException(e); // should never happen: s is 1 to 5 ASCII digits
  }
}

The try/catch version, while much simpler --

boolean valid(String s) {
  try {
    var intVal = Integer.parseInt(s);
    return intVal >= 1024 && intVal <= 65535;
  }
  catch (NumberFormatException e) {
    return false;
  }
}

-- is about 450 times slower than the character-checking version, and 35 times slower than the regexp version.

That said, if you expect nearly all inputs to be valid, or if the code is not going to be called very often, try/catch is the best choice, because it's easy to read and the intent is very clear.
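
Speed aside, the three approaches do not accept exactly the same inputs, which may matter when choosing between them. The sketch below is an illustration only: the class `EdgeCases` and the method names `byRegex`, `byParse`, and `byChars` are made up here, and `byChars` includes an explicit empty-string guard as an assumption of this sketch. It compares the three checks on a few edge cases such as a leading zero or a leading plus sign.

```java
import java.util.regex.Pattern;

class EdgeCases {
    static final Pattern RE = Pattern.compile(
        "\\b(102[4-9]|10[3-9][0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-5][0-9]{4}"
        + "|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[012][0-9]|6552[0-5]|6553[0-5])\\b");

    // Regex version: the whole string must match the pattern.
    static boolean byRegex(String s) {
        return RE.matcher(s).matches();
    }

    // try/catch version: accepts whatever Integer.parseInt accepts.
    static boolean byParse(String s) {
        try {
            int v = Integer.parseInt(s);
            return v >= 1024 && v <= 65535;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Character-checking version: 1 to 5 ASCII digits, then a range check.
    static boolean byChars(String s) {
        int len = s.length();
        if (len == 0 || len > 5) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        int v = Integer.parseInt(s);
        return v >= 1024 && v <= 65535;
    }

    public static void main(String[] args) {
        // "01024" has a leading zero, "+1024" a sign that parseInt accepts.
        for (String s : new String[] {"1024", "01024", "+1024", ""}) {
            System.out.println("\"" + s + "\" regex=" + byRegex(s)
                + " parse=" + byParse(s) + " chars=" + byChars(s));
        }
    }
}
```

In this sketch, `"01024"` is rejected by the regex but accepted by the other two, and `"+1024"` is accepted only by the try/catch version, so the choice also depends on how strict the validation should be.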

David Moles