9

I know I can match numbers with Pattern.compile("\\d*");

But that doesn't account for the minimum and maximum values of a long.

Because of the performance cost of exceptions, I do not want to try to parse the string as a long unless it really is a long.

if ( LONG_PATTERN.matcher(timestampStr).matches() ) {
    long timeStamp = Long.parseLong(timestampStr);
    return new Date(timeStamp);
} else {
    LOGGER.error("Can't convert " + timestampStr + " to a Date because it is not a timestamp! -> ");
    return null;
}

I mean I do not want any try/catch block, and I do not want exceptions raised for a value like "564654954654464654654567879865132154778", which is outside the range of a regular Java long.

Does anyone have a pattern to handle this kind of need for the primitive Java types? Does the JDK provide something to handle it automatically? Is there a fail-safe parsing mechanism in Java?

Thanks


Edit: Please assume that a "bad long string" is not an exceptional case. I'm not asking for a benchmark; I'm here for a regex representing a long and nothing more. I'm aware of the additional time required by the regex check, but at least the cost of my long parsing will be constant and will never depend on the percentage of "bad long strings".

I can't find the link again, but there is a nice parsing benchmark on StackOverflow which clearly shows that reusing the same compiled regex is really fast, a LOT faster than throwing an exception. Thus even a small proportion of exceptions would make the system slower than it would be with the additional regex check.

Sebastien Lorber
  • Note that `"\\d*"` matches empty strings as well. – Bart Kiers Jun 28 '12 at 11:06
  • Possibly [your question has already been asked](http://stackoverflow.com/questions/2563608/check-whether-a-string-is-parsable-into-long-without-try-catch). In my opinion, exceptions will be faster than regular expressions. – Miki Jun 28 '12 at 11:08
  • @Sorrow: Good catch on the earlier question. Re exceptions vs. regular expressions: What makes you think that? Throwing exceptions is not a quick process. Once compiled, regular expressions are quite quick. – T.J. Crowder Jun 28 '12 at 11:10
  • As stated in my answer, it all depends on how often the exceptional case happens. With pattern matching you do the extra work _all the time_; with exceptions, the extra work is only done whenever the exception is raised. – gexicide Jun 28 '12 at 11:15
  • @Sorrow, a precompiled pattern will outperform an exception that needs to be caught. But a (string) number that will fit in a `long` would be faster to check when `Long.parseLong(...)` does not throw an exception. – Bart Kiers Jun 28 '12 at 11:19
  • I agree with @gexicide here - it all depends on how often the exceptions are caught, as the regexps are checked always, no matter what. It seems to me that combining what T.J. Crowder answered (`-?\\d{1,19}`) with exception catching should be pretty much optimal, then. – Miki Jun 28 '12 at 11:26

3 Answers

17

The minimum value of a long is -9,223,372,036,854,775,808, and the maximum value is 9,223,372,036,854,775,807. So, a maximum of 19 digits. So, \d{1,19} should get you there, perhaps with an optional -, and with ^ and $ to match the ends of the string.

So roughly:

Pattern LONG_PATTERN = Pattern.compile("^-?\\d{1,19}$");

...or something along those lines, and assuming you don't allow commas (or have already removed them).

As gexicide points out in the comments, the above allows a small (in comparison) range of invalid values, such as 9,999,999,999,999,999,999. You can get more complex with your regex, or just accept that the above will weed out the vast majority of invalid numbers and so you reduce the number of parsing exceptions you get.
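
As a minimal sketch of that trade-off (the class and method names `LongPrefilter` and `tryParse` are made up for illustration and are not from the question), the 19-digit pattern can act as a cheap pre-filter, with a catch block kept only as a safety net for the few 19-digit strings that still overflow:

```java
import java.util.regex.Pattern;

public class LongPrefilter {
    // Optional minus sign followed by 1 to 19 digits.
    // matches() tests the whole input, so ^ and $ are not needed here.
    private static final Pattern LONG_PATTERN = Pattern.compile("-?\\d{1,19}");

    /** Returns the parsed value, or null if the input is not a valid long. */
    public static Long tryParse(String s) {
        if (s == null || !LONG_PATTERN.matcher(s).matches()) {
            return null; // rejected by the pattern, no exception involved
        }
        try {
            return Long.parseLong(s);
        } catch (NumberFormatException e) {
            // Only reachable for 19-digit strings outside the long range,
            // e.g. "9999999999999999999".
            return null;
        }
    }
}
```

With this shape, an input such as "564654954654464654654567879865132154778" is rejected by the pattern alone, and an exception is thrown only for values between the long limits and the largest 19-digit number.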

T.J. Crowder
3

This regular expression should do what you need:

^(-9223372036854775808|0)$|^((-?)((?!0)\d{1,18}|[1-8]\d{18}|9[0-1]\d{17}|92[0-1]\d{16}|922[0-2]\d{15}|9223[0-2]\d{14}|92233[0-6]\d{13}|922337[0-1]\d{12}|92233720[0-2]\d{10}|922337203[0-5]\d{9}|9223372036[0-7]\d{8}|92233720368[0-4]\d{7}|922337203685[0-3]\d{6}|9223372036854[0-6]\d{5}|92233720368547[0-6]\d{4}|922337203685477[0-4]\d{3}|9223372036854775[0-7]\d{2}|922337203685477580[0-7]))$

Note that this regexp does not accept additional symbols such as +, L, or _. If you need to validate every possible way of writing a long value, you will need to extend it.
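
For use from Java, the backslashes in the expression have to be doubled inside a string literal. The following is a sketch only; the names `StrictLongPattern` and `isLong` are hypothetical. Note that, as written, the expression is stricter than `Long.parseLong`: it rejects "-0" and values with leading zeros such as "007", both of which `Long.parseLong` accepts.

```java
import java.util.regex.Pattern;

public class StrictLongPattern {
    // The expression above, split over several lines for readability only.
    private static final Pattern STRICT_LONG = Pattern.compile(
        "^(-9223372036854775808|0)$|^((-?)((?!0)\\d{1,18}|[1-8]\\d{18}|9[0-1]\\d{17}"
      + "|92[0-1]\\d{16}|922[0-2]\\d{15}|9223[0-2]\\d{14}|92233[0-6]\\d{13}"
      + "|922337[0-1]\\d{12}|92233720[0-2]\\d{10}|922337203[0-5]\\d{9}"
      + "|9223372036[0-7]\\d{8}|92233720368[0-4]\\d{7}|922337203685[0-3]\\d{6}"
      + "|9223372036854[0-6]\\d{5}|92233720368547[0-6]\\d{4}|922337203685477[0-4]\\d{3}"
      + "|9223372036854775[0-7]\\d{2}|922337203685477580[0-7]))$");

    /** True if s is a decimal long in the form this pattern accepts. */
    public static boolean isLong(String s) {
        return s != null && STRICT_LONG.matcher(s).matches();
    }
}
```

Any string accepted by `isLong` can then be passed to `Long.parseLong` without a NumberFormatException being thrown.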

1

Simply catch the NumberFormatException, unless this case happens very often.

Another way would be to use a pattern which only allows long literals. Such pattern might be quite complex.

A third way would be to parse the number as a BigInteger first. Then you can compare it to Long.MAX_VALUE and Long.MIN_VALUE to check whether it is within the bounds of long. However, this might be costly as well.
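
The third option can be sketched with `java.math.BigInteger`. This is an illustration under assumptions, not a tuned implementation: the names `BigIntegerRangeCheck` and `tryParse` are invented here, and a simple digits-only pattern is added in front because the `BigInteger(String)` constructor itself throws NumberFormatException on non-numeric input.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

public class BigIntegerRangeCheck {
    // Digits-only guard so that the BigInteger constructor never throws.
    private static final Pattern DIGITS = Pattern.compile("-?\\d+");
    private static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    private static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /** Returns the value as a Long, or null if it is malformed or out of range. */
    public static Long tryParse(String s) {
        if (s == null || !DIGITS.matcher(s).matches()) {
            return null;
        }
        BigInteger value = new BigInteger(s);
        if (value.compareTo(MIN) < 0 || value.compareTo(MAX) > 0) {
            return null; // outside the range of long
        }
        return value.longValue();
    }
}
```

This avoids exceptions entirely, at the price of allocating a BigInteger for every input.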

Also note: Parsing the long is quite fast; it is a very optimized method (that, for example, tries to parse two digits in one step). Applying pattern matching might be even more costly than performing the parsing. The only thing which is slow about the parsing is throwing the NumberFormatException. Thus, simply catching the exception is the best way to go if the exceptional case does not happen too often.

gexicide
  • The OP said: *"For performence issues related to exceptions i do not want to try to parse the long unless it is really a long"* – T.J. Crowder Jun 28 '12 at 11:06
  • The idea is to get a constant execution time for the method instead of an execution time that depends on the presence of wrong long values. As far as I know (I've read a benchmark on SO), regexes, when not recompiled every time, are pretty fast. – Sebastien Lorber Jun 28 '12 at 12:24
  • I'm reading the [source code of `parseLong`](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/Long.java#Long.parseLong%28java.lang.String%2Cint%29), but I'm not finding any evidence of parsing digits two at a time. – Marko Topolnik Jun 28 '12 at 13:08