11

When parsing a YYYYMMdd date, e.g. 20120405 for 5th April 2012, what is the fastest method?

int year = Integer.parseInt(dateString.substring(0, 4));
int month = Integer.parseInt(dateString.substring(4, 6));
int day = Integer.parseInt(dateString.substring(6));

vs.

int date = Integer.parseInt(dateString);
year = date / 10000;
month = (date % 10000) / 100; 
day = date % 100;

For the month, date % 10000 leaves MMdd, and dividing that result by 100 gives MM.

In the first example we do 3 String operations and 3 "parse to int" calls; in the second we do a single parse followed by integer division and modulo operations.
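To make the arithmetic concrete, here is a small sketch that splits 20120405 step by step. The class and method names are hypothetical and only serve the illustration.

```java
// Hypothetical helper, for illustration only: splits a yyyyMMdd integer
// into its components using integer division and remainder.
final class YmdSplit {
    // 20120405 / 10000 = 2012
    static int year(int date) { return date / 10000; }

    // 20120405 % 10000 = 405 (i.e. MMdd), and 405 / 100 = 4
    static int month(int date) { return (date % 10000) / 100; }

    // 20120405 % 100 = 5
    static int day(int date) { return date % 100; }

    public static void main(String[] args) {
        int date = Integer.parseInt("20120405");
        System.out.println(year(date) + "-" + month(date) + "-" + day(date)); // prints 2012-4-5
    }
}
```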

What is faster? Is there an even faster method?

Basil Bourque
user3001
  • I would imagine that the modulo math would be much faster than allocating the three (sub) strings... – Dilum Ranatunga Apr 04 '12 at 15:13
  • 7
    Why couldn't you write your own micro benchmark and see which one is faster? – maerics Apr 04 '12 at 15:14
  • @DilumRanatunga I believe there are ways of taking a substring that shares the underlying array. Many languages don't do it (not by default, at least) because it can lead to leaks, but it's perfect for use cases like this. –  Apr 04 '12 at 15:16
  • 4
    Why don't you just measure it yourself? – BalusC Apr 04 '12 at 15:22
  • Dilum, creating a substring in Java is O(1) (at least in Sun's implementation) and involves no copying. But this is better done with an actual date parser. The code won't be a performance problem in either case (at least it's unlikely). – Joey Apr 04 '12 at 15:31
  • PS I was just being curious :) Thanks for all the answers – user3001 Apr 04 '12 at 15:36

6 Answers

33
SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd");
Date date = format.parse("20120405");
thedude19
  • 1
    +1 for the right way to do this. – Louis Wasserman Apr 04 '12 at 15:15
  • 7
    This is the way to go; parsing a date string shouldn't require performance optimization unless you've determined that you're doing this like >10 million times in a loop for every request or some such... (In which case, you should wonder why). – Java Drinker Apr 04 '12 at 15:19
  • 1
    This is a classic example of know your tools. – Corv1nus Apr 04 '12 at 15:23
  • 1
    The Java date API is often too slow. – user3001 Apr 04 '12 at 15:30
  • 1
    @user3001 Out of curiosity, when have you found it too slow? It's not the best designed API (understatement) but, I've used it for years without performance issues. – Corv1nus Apr 04 '12 at 15:51
  • Take a look at the GregorianCalendar mess. I wrote my own Date class and it is over 300 times faster if I remember the numbers correctly :) So I try to avoid the rest of the API, too, because it might perform equally badly. – user3001 Apr 04 '12 at 21:45
  • This is quite slow and it's noticeable in loops far smaller than 10M – ytoledano May 19 '15 at 13:30
  • Also, SimpleDateFormat is not reentrant so this won't work if it is used by multiple threads unless you create a `ThreadLocal` – Gray Oct 09 '16 at 14:17
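The ThreadLocal idea from the last comment could look roughly like the sketch below. The class name ThreadSafeDateParser is hypothetical, ThreadLocal.withInitial assumes Java 8 or later, and the setLenient(false) call is an added assumption so that impossible dates are rejected rather than silently rolled over.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical wrapper: each thread gets its own SimpleDateFormat,
// because a single instance must not be shared between threads.
final class ThreadSafeDateParser {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> {
                SimpleDateFormat f = new SimpleDateFormat("yyyyMMdd");
                f.setLenient(false); // assumption: reject dates such as 20121345
                return f;
            });

    static Date parse(String s) throws ParseException {
        // FORMAT.get() returns the formatter owned by the calling thread
        return FORMAT.get().parse(s);
    }
}
```

Calling ThreadSafeDateParser.parse("20120405") from any thread then reuses that thread's formatter instead of constructing a new one per call.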
14

As you see below, the performance of the date processing is only relevant when you look at millions of iterations. Instead, you should choose a solution that is easy to read and maintain.

Although you could use SimpleDateFormat, it is not reentrant (a single instance cannot safely be shared between threads), so it should be avoided. The best solution is to use the excellent Joda-Time classes:

private static final DateTimeFormatter DATE_FORMATTER = new DateTimeFormatterBuilder()
     .appendYear(4,4).appendMonthOfYear(2).appendDayOfMonth(2).toFormatter();
...
Date date = DATE_FORMATTER.parseDateTime(dateOfBirth).toDate();

If we are talking about your math functions, the first thing to point out is that there were bugs in your math code that I've fixed. That's the problem with doing it by hand. That said, the approach that processes the string only once will be the fastest. A quick test run shows that:

year = Integer.parseInt(dateString.substring(0, 4));
month = Integer.parseInt(dateString.substring(4, 6));
day = Integer.parseInt(dateString.substring(6));

Takes ~800ms while:

int date = Integer.parseInt(dateString);
year = date / 10000;
month = (date % 10000) / 100; 
day = date % 100;
total += year + month + day;

Takes ~400ms.

However, again, you need to take into account that these timings are for 10 million iterations. This is a perfect example of premature optimization. I'd choose the one that is the most readable and the easiest to maintain. That's why the Joda-Time answer is the best.
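For readers on Java 8 or later, the JDK's own java.time package offers a comparable immutable, thread-safe formatter. This is a swapped-in alternative rather than the Joda-Time code above, and the class name BasicIsoDateExample is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical example class: parses yyyyMMdd with the JDK's java.time API.
final class BasicIsoDateExample {
    static LocalDate parse(String s) {
        // BASIC_ISO_DATE is the built-in formatter for dates written as yyyyMMdd.
        // It throws DateTimeParseException for input such as "20121345".
        return LocalDate.parse(s, DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        LocalDate d = parse("20120405");
        System.out.println(d.getYear() + " " + d.getMonthValue() + " " + d.getDayOfMonth()); // prints 2012 4 5
    }
}
```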

Gray
5

I did a quick benchmark test where both methods were executed 1 million times each. The results clearly show that the modulo method is much faster, as Dilum Ranatunga predicted.

t.startTiming();
for(int i=0;i<1000000;i++) {
    int year = Integer.parseInt(dateString.substring(0, 4));
    int month = Integer.parseInt(dateString.substring(4, 6));
    int day = Integer.parseInt(dateString.substring(6));
}
t.stopTiming();
System.out.println("First method: "+t.getElapsedTime());

Time t2 = new Time();
t2.startTiming();
for(int i=0;i<1000000;i++) {
    int date = Integer.parseInt(dateString);
    int y2 = date / 10000;
    int m2 = (date % 10000) / 100;
    int d2 = date % 100;
}
t2.stopTiming();
System.out.println("Second method: "+t2.getElapsedTime());

The results don't lie (in ms).

First method: 129
Second method: 53
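Loops like the ones above can mislead, because the JIT compiler may drop work whose result is never used. A slightly more careful sketch, assuming nothing beyond the standard library, adds a warm-up phase and folds every result into a checksum. The class and method names are hypothetical, and a dedicated harness such as JMH would generally be more trustworthy than hand-rolled timing.

```java
// Rough timing sketch (hypothetical class name). Results are summed into a
// checksum so the JIT cannot discard the parsing work as dead code.
final class ParseBenchmark {
    // First method: three substrings, three parseInt calls
    static int viaSubstring(String s) {
        int year = Integer.parseInt(s.substring(0, 4));
        int month = Integer.parseInt(s.substring(4, 6));
        int day = Integer.parseInt(s.substring(6));
        return year + month + day;
    }

    // Second method: one parseInt call, then division and remainder
    static int viaModulo(String s) {
        int date = Integer.parseInt(s);
        return date / 10000 + (date % 10000) / 100 + date % 100;
    }

    public static void main(String[] args) {
        String dateString = "20120405";
        long checksum = 0;

        // Warm-up so both methods are compiled before timing starts
        for (int i = 0; i < 100_000; i++) {
            checksum += viaSubstring(dateString) + viaModulo(dateString);
        }

        long t0 = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) checksum += viaSubstring(dateString);
        long t1 = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) checksum += viaModulo(dateString);
        long t2 = System.nanoTime();

        System.out.println("First method:  " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("Second method: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("checksum: " + checksum);
    }
}
```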
Honoki
3

The second will certainly be faster, once you change mod to % and add missing semicolons and fix the divisor in the year calculation. That said, I'm finding it hard to picture the application where this is a bottleneck. Just how many times are you parsing YYYYMMdd dates into their components, without any need to validate them?

ruakh
  • 2
    +1 - for pointing out that the OP is probably wasting his time looking for the fastest solution. – Stephen C Apr 04 '12 at 15:37
3

How about (but it would parse an invalid date without saying anything...):

public static void main(String[] args) throws Exception {
    char zero = '0';
    int yearZero = zero * 1111;
    int monthAndDayZero = zero * 11;
    String s = "20120405";
    int year = s.charAt(0) * 1000 + s.charAt(1) * 100 + s.charAt(2) * 10 + s.charAt(3) - yearZero;
    int month = s.charAt(4) * 10 + s.charAt(5) - monthAndDayZero;
    int day = s.charAt(6) * 10 + s.charAt(7) - monthAndDayZero;
}

Doing a quick and dirty benchmark with 100,000 iterations warm up and 10,000,000 timed iterations, I get:

  • 700ms for your first method
  • 350ms for your second method
  • 10ms with my method.
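If the silent acceptance of invalid input is a concern, the same character arithmetic can be combined with a few cheap checks. The following is a sketch only: CharDateParser and its methods are hypothetical names, it was not part of the benchmark above, and it checks digit and range validity but not the number of days in a given month.

```java
// Hypothetical variant of the charAt approach with basic validation.
final class CharDateParser {
    // Converts the character at index i to its digit value, or throws.
    private static int digit(String s, int i) {
        int d = s.charAt(i) - '0';
        if (d < 0 || d > 9) {
            throw new IllegalArgumentException("not a digit at index " + i + ": " + s);
        }
        return d;
    }

    // Returns {year, month, day} for a yyyyMMdd string.
    static int[] parse(String s) {
        if (s.length() != 8) {
            throw new IllegalArgumentException("expected 8 characters: " + s);
        }
        int year = digit(s, 0) * 1000 + digit(s, 1) * 100 + digit(s, 2) * 10 + digit(s, 3);
        int month = digit(s, 4) * 10 + digit(s, 5);
        int day = digit(s, 6) * 10 + digit(s, 7);
        // Range check only; days per month and leap years are not verified
        if (month < 1 || month > 12 || day < 1 || day > 31) {
            throw new IllegalArgumentException("month or day out of range: " + s);
        }
        return new int[] {year, month, day};
    }
}
```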
amphetamachine
assylias
  • @nim not sure what you mean - year is 2012 after the calculation. – assylias Apr 04 '12 at 15:20
  • Ignore my comment, I didn't see the adjustment `yearZero` etc.. – Nim Apr 04 '12 at 15:37
  • 1
    In almost all normal situations I would prefer the modulo solution posted by the OP, even if this is faster. Why? Because you grasp what is happening in a few seconds when seeing that code. Your code is a little bit more clever, but therefore also takes more time to understand, which is a disadvantage. And I doubt there are many situations where date conversion is the performance bottleneck. – Alderath Apr 04 '12 at 16:17
  • 3
    @Alderath Completely agree - I would never include what I posted in my code! But it does answer the question! – assylias Apr 04 '12 at 16:38
0

I believe the mod method will be faster. By calling the function you're creating variables and location instances on the stack, which makes for a heavier solution.

Mod is a standard math operator and is likely very optimized.

But as Hunter McMillen said "You should look at the Calendar class API"

RyanS