7

I'm pretty sure this hasn't actually been answered yet on this site. For once and for all, what is the smallest regex that matches a numeric string that is in the range of a 32-bit signed integer, in the range -2147483648 to 2147483647.

I must use regex for validation - that is the only option available to me.

I have tried

\d{1,10}

but I can't figure out how to restrict it to the valid number range.


To aid developing in regex, it should match:

-2147483648
-2099999999
-999999999
-1
0
1
999999999
2099999999
2147483647

It should not match:

-2147483649
-2200000000
-11111111111
2147483648
2200000000
11111111111

I have set up an on-line live demo (on rubular) that has my attempt and the test cases above.


Note: The shortest regex that works will be accepted. Efficiency of regex will not be considered (unless there's a tie for shortest length).

Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • can anyone guied me for the 16 bit regex for javascipt? What I tried doesnt work. `^-?([0-9]{1,9}|[0-1][0-9]{4}|3[0-1][0-9]{3}|32[0-6][0-9]{2}|327[0-5]|3276[0-7])$|^(−32768)$` – Shachi Jun 20 '19 at 12:13
  • @shachi Here's the 16-bit version: `^-?(\d{1,4}|[12]\d{4}|3([01]\d{3}|2([0-6]\d{2}|7([0-5]\d|6[0-7]))))$|^-32768$`. See [live demo](https://rubular.com/r/F5zKjIJNc83Kci). – Bohemian Jun 21 '19 at 00:53
  • Thanku @Bohemian You are great!! Learning how to make regex has been added on my todo list.. – Shachi Jun 21 '19 at 05:58

4 Answers4

7

I really hope it is just puzzler and no one will use regex for this problem in real world. Proper solution would be converting number from string to numeric type like BigInteger. This should allow us to check its range using proper methods or operators, like compareTo, >, <.


To make life easier you can use this page (dead link) to generate regex for ranges. So regex for range 0 - 2147483647 can look like

\b([0-9]{1,9}|1[0-9]{9}|2(0[0-9]{8}|1([0-3][0-9]{7}|4([0-6][0-9]{6}|7([0-3][0-9]{5}|4([0-7][0-9]{4}|8([0-2][0-9]{3}|3([0-5][0-9]{2}|6([0-3][0-9]|4[0-7])))))))))\b

(friendlier way)

\b(
 [0-9]{1,9}|
1[0-9]{9}|
2(0[0-9]{8}|
  1([0-3][0-9]{7}|
       4([0-6][0-9]{6}|
            7([0-3][0-9]{5}|
                 4([0-7][0-9]{4}|
                      8([0-2][0-9]{3}|
                           3([0-5][0-9]{2}|
                                6([0-3][0-9]|
                                     4[0-7]
)))))))))\b

and range 0 - 2147483648

\b([0-9]{1,9}|1[0-9]{9}|2(0[0-9]{8}|1([0-3][0-9]{7}|4([0-6][0-9]{6}|7([0-3][0-9]{5}|4([0-7][0-9]{4}|8([0-2][0-9]{3}|3([0-5][0-9]{2}|6([0-3][0-9]|4[0-8])))))))))\b

So we can just combine these ranges and write it as

range of 0-2147483647 OR "-" range of 0-2147483648

which will give us

\b([0-9]{1,9}|1[0-9]{9}|2(0[0-9]{8}|1([0-3][0-9]{7}|4([0-6][0-9]{6}|7([0-3][0-9]{5}|4([0-7][0-9]{4}|8([0-2][0-9]{3}|3([0-5][0-9]{2}|6([0-3][0-9]|4[0-7])))))))))\b|-\b([0-9]{1,9}|1[0-9]{9}|2(0[0-9]{8}|1([0-3][0-9]{7}|4([0-6][0-9]{6}|7([0-3][0-9]{5}|4([0-7][0-9]{4}|8([0-2][0-9]{3}|3([0-5][0-9]{2}|6([0-3][0-9]|4[0-8])))))))))\b.

[edit]

Since Bohemian noticed in his comment final regex can be in form -?regex1|-2147483648 so here is little shorter version (also changed [0-9] to \d)

^-?(\d{1,9}|1\d{9}|2(0\d{8}|1([0-3]\d{7}|4([0-6]\d{6}|7([0-3]\d{5}|4([0-7]\d{4}|8([0-2]\d{3}|3([0-5]\d{2}|6([0-3]\d|4[0-7])))))))))$|^-2147483648$

If you will use it in Java String#matches(regex) method on each line you can also skip ^ and $ parts since they will be added automatically to make sure entire string matches regex.

I know this regex is very ugly, but just shows why regex is not good tool for range validation.

Pshemo
  • 122,468
  • 25
  • 185
  • 269
  • I was expecting ugly (but not that ugly!), but well done. Are you sure it needs to be that long (real question, not a hint)? Also you can assume it's the *whole* input being validated (you can leave off `\b` etc). And merry Xmas too! – Bohemian Dec 27 '13 at 02:44
  • @Bohemian I didn't thought of making it shorter but it seems that using `|` in groups is only way to optimize it (and it is done now). You can do some cosmetic changes like replacing `[0-9]` with `\d` but that will not make it lot better. Maybe instead of using `regex1|-regex2` you can change last part of `regex1` to something like `[0-7]|(?<=-\d{9})8` to let it accept `8` if number starts with `-`. This way you can make your regex shorter by removing `|-regex2`, but on the same time it will be slower. – Pshemo Dec 27 '13 at 02:51
  • Because it is symmetric except for one case, could you not just do `-?regex1|-2147483648`? Also not interested in efficiency, only shortest regex. – Bohemian Dec 27 '13 at 03:07
  • @Bohemian Yes. That should work. Like I said, I wasn't trying to make it nicer because I don't want to encourage anyone to go with regex with this problem or support only regex validation. – Pshemo Dec 27 '13 at 03:21
  • @Pshemo That's why it's bad to generate regex rather than do it your self. there is alot of duplicates and parenthesis that you can remove. you don't have to split the range (-2147483647 to 2147483647) as it could be a `-` signe or not, you only seperate the number -2147483648 – rullof Dec 27 '13 at 12:59
  • @rullof parenthesis makes this regex shorter because I don't have to add all prefix numbers in each case, but just one. Take a look at friendlier version of my regex. Its last case looks like `4[0-7]` instead of `214748364[0-7]` so it saved me few characters, and that is just one case. – Pshemo Dec 27 '13 at 13:17
  • Why is this improper for real-world use? Is it too inefficient? I just want it to parse user input and check if it's valid. – Aykhan Hagverdili Apr 28 '21 at 15:11
  • @AyxanHaqverdili Imagine now someone wants to change the accepted range. Can you easily modify that regex? Probably no. That is why it is very impractical. It is much better to parse text to some numeric type and use received value like `if (val>= MIN && val<=MAX)`. – Pshemo Apr 28 '21 at 15:14
  • @Pshemo That's a valid point. I ended up limiting the number of digits to 9 instead of 10, thus users can't overflow an int. Also, it happens that numbers in billions doesn't make sense in my particular case. – Aykhan Hagverdili Apr 29 '21 at 08:31
1

Edit:

This is the shortest regex you can get and the best way to do it:

We check every digit starting from the left, if it reaches it's limit and all the previous did, we put control on the next one.

for the range (-2147483647 to 2147483647) it could be a - signe or not. for -2147483648 it must be a - signe.

So finaly we get this:

^-?([0-9]{1,9}|[0-1][0-9]{9}|20[0-9]{8}|21[0-3][0-9]{7}|214[0-6][0-9]{6}|2147[0-3][0-9]{5}|21474[0-7][0-9]{4}|214748[0-2][0-9]{3}|2147483[0-5][0-9]{2}|21474836[0-3][0-9]|214748364[0-7])$|^(-2147483648)$

And this is a Live Demo

rullof
  • 7,124
  • 6
  • 27
  • 36
0
 ^(429496729[0-6]|42949672[0-8]\d|4294967[01]\d{2}|429496[0-6]\d{3}|42949[0-5]\d{4}|4294[0-8]\d{5}|429[0-3]\d{6}|42[0-8]\d{7}|4[01]\d{8}|[1-3]\d{9}|[1-9]\d{8}|[1-9]\d{7}|[1-9]\d{6}|[1-9]\d{5}|[1-9]\d{4}|[1-9]\d{3}|[1-9]\d{2}|[1-9]\d|\d)$

Kindly try this i tested randomly not thoroughly.

0

only for the numbers above zero. add '-' and adjust last number pattern for negative numbers.

(^\d{1,9}$|^1\d{9}$|^20\d{8}$|^21[0-3]\d{7}$|^214[0-6]\d{6}$|^2147[0-3]\d{5}$|^21474[0-7]\d{4}$|^214748[0-2]\d{3}$|^2147483[0-5]\d{2}$|^21474836[0-3]\d$|^214748364[0-7]$)

one should never use regex for this type of work.

oyss
  • 662
  • 1
  • 8
  • 20