12

I am reading a text file containing dates, and I want to parse the Strings representing the dates into Date objects in Java. I notice that this operation is slow. Why? Is there any way to speed it up? My file looks like:

2012-05-02 12:08:06:950, secondColumn, thirdColumn
2012-05-02 12:08:07:530, secondColumn, thirdColumn
2012-05-02 12:08:08:610, secondColumn, thirdColumn

I read the file line by line, extract the date String from each line, and parse it into a Date object using a SimpleDateFormat as follows:

DataInputStream in = new DataInputStream(myFileInputStream);
BufferedReader  br = new BufferedReader(new InputStreamReader(in));
String strLine;

SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
while ((strLine = br.readLine()) != null)
{
    ....Do things....
    Date myDateTime = (Date)formatter.parse(myDateString);
    ...Do things....
}
Rami
  • 3
    did you try using the same SimpleDateFormat instance throughout the entire file parse operation? – jtahlborn Aug 03 '12 at 14:19
  • 12
    how have you determined that it is slow? – Michael Easter Aug 03 '12 at 14:19
  • @Michael, I just commented out the operations related to the parse; the reading loop (line by line) is much quicker then. – Rami Aug 03 '12 at 14:21
  • @jtahlborn yes, the SimpleDateFormat is created outside the reading loop and is shared for the whole file. – Rami Aug 03 '12 at 14:22
  • 1
    The posted code is not enough to tell how you are handling the situation. How many lines do you have in your file, and how long is it taking? – Bhesh Gurung Aug 03 '12 at 14:22
  • If you are creating a new SimpleDateFormat instance in the loop every time, your code will be slow. Creating a SimpleDateFormat is expensive; try to define it outside the loop and reuse it. A nice article on SimpleDateFormat performance: http://www.thedwick.com/2008/04/simpledateformat-performance-pig/ – gresdiplitude Aug 03 '12 at 14:25
  • Define slow. How slow is slow? – Rosdi Kasim Aug 03 '12 at 14:29
  • @BheshGurung I just edited my code... my files contain about 3000 lines each. – Rami Aug 03 '12 at 14:29
  • 2
    Take a look at the code of SimpleDateFormat::parse(String) to see it's not an easy task. Especially the error handling is quite a bit of stuff. If your dates always look the same, you could parse them from the line yourself and fill the date instance accordingly. If that is faster I wouldn't dare to answer beforehand though. – jayeff Aug 03 '12 at 14:30
  • @gresdiplitude I am actually defining my SimpleDateFormat outside the loop, I just edited my code. – Rami Aug 03 '12 at 14:30
  • @jayeff yes, maybe that is my only solution then... Thank you for this suggestion – Rami Aug 03 '12 at 14:32
  • Have you measured just the parsing of the date? Or is it possible that the "Do things" parts are the real bottleneck? – jayeff Aug 03 '12 at 14:33
  • @jayeff yes, 'Do things' is just dummy operations like incrementing an integer counter. I completely removed do Things. Without the date parsing reading the file line by line is just a matter of seconds, but with the parsing operation, it takes several minutes for one file. – Rami Aug 03 '12 at 14:36
  • 1
    If you have control over the creation of the file you want to read in, you could do this: write the date as a long when creating the file, read the long instead of parsing the above string, and use the `Date(long date)` constructor. – jayeff Aug 03 '12 at 14:41
  • 2
    I really wish people would stop mixing DataInputStream with BufferedReader. Whoever started this meme ..... grrr. – Peter Lawrey Aug 03 '12 at 15:04
  • FYI, the troublesome old date-time classes such as [`java.util.Date`](https://docs.oracle.com/javase/9/docs/api/java/util/Date.html), [`java.util.Calendar`](https://docs.oracle.com/javase/9/docs/api/java/util/Calendar.html), and `java.text.SimpleDateFormat` are now [legacy](https://en.wikipedia.org/wiki/Legacy_system), supplanted by the [*java.time*](https://docs.oracle.com/javase/9/docs/api/java/time/package-summary.html) classes built into Java 8 & Java 9. See [*Tutorial* by Oracle](https://docs.oracle.com/javase/tutorial/datetime/TOC.html). – Basil Bourque Mar 04 '18 at 23:44

3 Answers

8

Converting dates and time zones is expensive. If you can assume your date/times are close to each other, you can convert the date and hours/minutes (or only the date if you use GMT) whenever the minute changes and compute the seconds and milliseconds yourself.

This will call parse once per minute. Depending on your assumptions you could make it once per hour or once per day.

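String pattern = "yyyy-MM-dd HH:mm";
SimpleDateFormat formatter = new SimpleDateFormat(pattern);
// Start with null: every String startsWith(""), so an empty initial value would never trigger the first parse.
String lastTime = null;
long lastDate = 0; // epoch millis of the minute held in lastTime
while ((strLine = br.readLine()) != null) {
    String myDateString = strLine.split(", ")[0];
    // Re-parse only when the "yyyy-MM-dd HH:mm" prefix changes.
    if (lastTime == null || !myDateString.startsWith(lastTime)) {
        lastTime = myDateString.substring(0, pattern.length());
        lastDate = formatter.parse(lastTime).getTime();
    }
    // The remainder "06:950" becomes "06950", i.e. 6 seconds and 950 ms expressed in milliseconds.
    Date date = new Date(lastDate + Integer.parseInt(myDateString.substring(pattern.length() + 1).replace(":", "")));
}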
Peter Lawrey
4

tl;dr

  • Use java.time rather than legacy classes.
  • Each parse of a String to LocalDateTime with DateTimeFormatter takes less than 1,500 nanoseconds (0.0000015 seconds).

java.time

You are using troublesome old date-time classes that are now legacy, supplanted by the java.time classes.

Let's do a bit of micro-benchmarking to see just how slow or fast parsing a date-time string is in java.time.

ISO 8601

The ISO 8601 standard defines sensible practical formats for textually representing date-time values. The java.time classes use these standard formats by default when parsing/generating strings.

Use these standard formats instead of inventing your own, as seen in the Question.
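
As a small illustration of the point above, here is a sketch of parsing an ISO 8601 string with no formatter at all. The class name `IsoParseExample` and the helper `parseIso` are made up for this example; the input is the Question's first timestamp rewritten in ISO 8601 form (a `T` between date and time, a `.` before the fraction).

```java
import java.time.LocalDateTime;

public class IsoParseExample {

    // Hypothetical helper: relies on LocalDateTime.parse(CharSequence),
    // which expects the ISO 8601 local date-time format by default.
    static LocalDateTime parseIso(String text) {
        return LocalDateTime.parse(text);
    }

    public static void main(String[] args) {
        LocalDateTime ldt = parseIso("2012-05-02T12:08:06.950");
        System.out.println(ldt); // prints 2012-05-02T12:08:06.950
    }
}
```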

DateTimeFormatter

Define a formatting pattern to match your inputs.

DateTimeFormatter f = DateTimeFormatter.ofPattern( "uuuu-MM-dd HH:mm:ss:SSS" );

We will parse each such input as a LocalDateTime because your input lacks an indicator of time zone or offset-from-UTC. Keep in mind that such values do not represent a moment, are not a point on the timeline. To be an actual moment requires the context of a zone/offset.

String inputInitial = "2012-05-02 12:08:06:950" ;
LocalDateTime ldtInitial = LocalDateTime.parse( inputInitial , f );
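
If an actual moment is needed, a zone or offset has to be supplied. The following is only a sketch: it assumes the timestamps in the file were recorded in UTC, which the Question does not state, and the class name `ToInstantExample` and method `toInstantUtc` are made up for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class ToInstantExample {

    // Same pattern as in the answer above.
    private static final DateTimeFormatter F =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss:SSS");

    // ASSUMPTION: the input was written in UTC. Swap in the real offset or
    // use ldt.atZone(ZoneId.of(...)) if the file uses another zone.
    static Instant toInstantUtc(String text) {
        LocalDateTime ldt = LocalDateTime.parse(text, F);
        return ldt.toInstant(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        Instant instant = toInstantUtc("2012-05-02 12:08:06:950");
        System.out.println(instant); // prints 2012-05-02T12:08:06.950Z
    }
}
```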

Let's make a bunch of such inputs.

int count = 1_000_000;
List < String > inputs = new ArrayList <>( count );

for ( int i = 0 ; i < count ; i++ )
{
    String s = ldtInitial.plusSeconds( i ).format( f );
    inputs.add( s );
}

Test harness.

long start = System.nanoTime();
for ( String input : inputs )
{
    LocalDateTime ldt = LocalDateTime.parse( input , f );
}
long stop = System.nanoTime();
long elapsed = ( stop - start );
long nanosPerParse = (elapsed / count ) ;
Duration d = Duration.ofNanos( elapsed );

Dump to console.

System.out.println( "Parsing " + count + " strings to LocalDateTime took: " + d  + ". About " + nanosPerParse + " nanos each.");

Parsing 1000000 strings to LocalDateTime took: PT1.320778647S. About 1320 nanos each.

Too slow?

So it takes about 1 to 1.5 seconds to parse a million such inputs, on a MacBook Pro laptop with a quad-core Intel i7 CPU. In my test runs, each parse takes about 1,000 to 1,500 nanoseconds.

To my mind, that is not a performance problem.


About java.time

The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, Calendar, & SimpleDateFormat.

The Joda-Time project, now in maintenance mode, advises migration to the java.time classes.

To learn more, see the Oracle Tutorial. And search Stack Overflow for many examples and explanations. Specification is JSR 310.

You may exchange java.time objects directly with your database. Use a JDBC driver compliant with JDBC 4.2 or later. No need for strings, no need for java.sql.* classes.

Where to obtain the java.time classes?

The ThreeTen-Extra project extends java.time with additional classes. This project is a proving ground for possible future additions to java.time. You may find some useful classes here such as Interval, YearWeek, YearQuarter, and more.

Basil Bourque
  • 1
    By far the best answer!! Great!! – adoalonso May 20 '20 at 08:52
  • Interestingly, "LocalDateTime.parse(..).toEpochSecond(ZoneOffset.UTC)" is actually slower than "SimpleDateFormat#parse(..).getTime()" using identical pattern ("yyyy-MM-dd'T'HH:mm:ss"). By about 10-15% for me. I didn't expect that. – Nikita Rybak Jan 07 '21 at 02:44
2

I would suggest writing a custom parser, which is going to be faster. Something like:

Date parseYYYYMMDDHHMM(String strDate) {
   String yearString = strDate.substring(0, 4);
   int year = Integer.parseInt(yearString);
   ...
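
For reference, one possible complete version of such a hand-written parser is sketched below. It is not the code elided above. It assumes the exact fixed-width layout from the Question (`yyyy-MM-dd HH:mm:ss:SSS`, always 23 characters) and assumes the timestamps are in UTC; the class and method names are hypothetical. It checks digits and the calendar date but does not check the separators or the hour/minute/second ranges.

```java
import java.time.LocalDate;
import java.util.Date;

public class FixedFormatDateParser {

    // Reads `len` ASCII digits starting at index `start` as a non-negative int.
    static int digits(String s, int start, int len) {
        int value = 0;
        for (int i = start; i < start + len; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("Not a digit at index " + i + ": " + s);
            }
            value = value * 10 + (c - '0');
        }
        return value;
    }

    // Parses "yyyy-MM-dd HH:mm:ss:SSS". ASSUMPTION: timestamps are in UTC.
    static Date parse(String s) {
        if (s.length() != 23) {
            throw new IllegalArgumentException("Expected 23 characters: " + s);
        }
        int year = digits(s, 0, 4);
        int month = digits(s, 5, 2);
        int day = digits(s, 8, 2);
        int hour = digits(s, 11, 2);
        int minute = digits(s, 14, 2);
        int second = digits(s, 17, 2);
        int millis = digits(s, 20, 3);

        // LocalDate.of validates the date and gives days since 1970-01-01.
        long epochDay = LocalDate.of(year, month, day).toEpochDay();
        long epochMillis = epochDay * 86_400_000L
                + hour * 3_600_000L
                + minute * 60_000L
                + second * 1_000L
                + millis;
        return new Date(epochMillis);
    }

    public static void main(String[] args) {
        System.out.println(parse("2012-05-02 12:08:06:950").getTime());
    }
}
```

Whether this is actually faster than a reused formatter should be measured on the real data rather than assumed.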

Another way is to use a precomputed HashMap from date-time strings (without millis) to Unix timestamps. This works if there are not many distinct date-times (or you can recompute the map once the date flips over).
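
A minimal sketch of the map-based idea follows, filled lazily instead of precomputed. The class name `CachedSecondParser` and its methods are hypothetical, the input layout is assumed to be the Question's `yyyy-MM-dd HH:mm:ss:SSS`, and the time zone is passed in explicitly because the Question does not say which one applies. The map is unbounded here, and `SimpleDateFormat` is not thread-safe, so an instance should stay on a single thread.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.TimeZone;

public class CachedSecondParser {

    private final SimpleDateFormat secondFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    // Maps "yyyy-MM-dd HH:mm:ss" to epoch millis of that second.
    private final Map<String, Long> cache = new HashMap<>();

    CachedSecondParser(TimeZone zone) {
        secondFormat.setTimeZone(zone);
    }

    // Parses "yyyy-MM-dd HH:mm:ss:SSS", calling SimpleDateFormat only on a cache miss.
    Date parse(String text) {
        String secondPart = text.substring(0, 19);
        Long base = cache.get(secondPart);
        if (base == null) {
            try {
                base = secondFormat.parse(secondPart).getTime();
            } catch (ParseException e) {
                throw new IllegalArgumentException("Unparseable date: " + text, e);
            }
            cache.put(secondPart, base);
        }
        // Characters after the last ':' are the milliseconds.
        int millis = Integer.parseInt(text.substring(20));
        return new Date(base + millis);
    }

    int cacheSize() {
        return cache.size();
    }

    public static void main(String[] args) {
        CachedSecondParser parser = new CachedSecondParser(TimeZone.getTimeZone("UTC"));
        System.out.println(parser.parse("2012-05-02 12:08:06:950").getTime());
        System.out.println(parser.parse("2012-05-02 12:08:06:975").getTime()); // cache hit
    }
}
```

The cache only pays off when many lines share the same second; with the sample data in the Question, where each line falls in a different second, keying on the minute or hour prefix would be the natural variation.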

Denis Kulagin