
I have this situation where I am reading about 130K records containing dates stored as String fields. Some records contain blanks (nulls), some contain strings like this: 'dd-MMM-yy' and some contain this 'dd/MM/yyyy'.

I have written a method like this:

public Date parsedate(String date) {
   if (date != null) {
      try {
         // First attempt: the 'dd-MMM-yy' pattern
         return new SimpleDateFormat("dd-MMM-yy").parse(date);
      } catch (ParseException e) {
         try {
            // Second attempt: the 'dd/MM/yyyy' pattern
            return new SimpleDateFormat("dd/MM/yyyy").parse(date);
         } catch (ParseException e2) {
            return null;
         }
      }
   } else {
      return null;
   }
}

So you may have already spotted the problem: I am using the try .. catch as part of my logic. It would be better if I could determine beforehand that the String actually contains a parseable date in some format, and only then attempt to parse it.

So, is there some API or library that can help with this? I do not mind writing several different Parse classes to handle the different formats and then creating a factory to select the correct one, but how do I determine which one?

Thanks.

Morgul Master
  • If you decide to keep your solution, then please create only 2 instances of SimpleDateFormat and store them as constants in your class rather than creating them 130K times. – van Jun 07 '09 at 22:48
  • If you do store them as constants, make absolutely sure they are not used from multiple threads at once! I ran into problems with that earlier and contributed a FindBugs detector that finds static DateFormats and Calendars. They are documented as non-thread-safe, but that's easy to miss. See http://dschneller.blogspot.com/2007/04/calendar-dateformat-and-multi-threading.html , http://dschneller.blogspot.com/2007/04/findbugs-writing-custom-detectors-part.html and http://dschneller.blogspot.com/2007/05/findbugs-writing-custom-detectors-part.html – Daniel Schneller Jun 07 '09 at 23:42
  • @van: Don't do that. SimpleDateFormat is not thread-safe, so if you use the class from more than one thread, things will blow up in your face. – Apocalisp Jun 07 '09 at 23:43
  • I think that I will take everyone's advice and continue using the try .. catch for now, since this is really a one-off app and I will not be running it in a production environment. But I will make Functional Java the long-term solution. It feels clean to me. Thanks. – Morgul Master Jun 08 '09 at 01:38
  • After having gone through a long debugging process because of the thread safety problem, I would suggest using Joda-Time. It is completely thread-safe since all formatters and date-times are immutable. – user44242 Jun 08 '09 at 07:54

12 Answers


See Lazy Error Handling in Java for an overview of how to eliminate try/catch blocks using an Option type.

Functional Java is your friend.

In essence, what you want to do is to wrap the date parsing in a function that doesn't throw anything, but indicates in its return type whether parsing was successful or not. For example:

import fj.F; import fj.F2;
import fj.data.Option;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import static fj.Function.curry;
import static fj.data.Option.some;
import static fj.data.Option.none;
...

F<String, F<String, Option<Date>>> parseDate =
  curry(new F2<String, String, Option<Date>>() {
    public Option<Date> f(String pattern, String s) {
      try {
        return some(new SimpleDateFormat(pattern).parse(s));
      }
      catch (ParseException e) {
        return none();
      }
    }
  });

OK, now you've a reusable date parser that doesn't throw anything, but indicates failure by returning a value of type Option.None. Here's how you use it:

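import fj.data.Stream;
import static fj.data.Stream.stream;
import static fj.data.Option.isSome_;
import static fj.data.Option.join;
...
public Option<Date> parseWithPatterns(String s, Stream<String> patterns) {
  // find(...) yields an Option<Option<Date>>; join flattens it to Option<Date>.
  return join(stream(s).apply(patterns.map(parseDate)).find(Option.<Date>isSome_()));
}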

That will give you the date parsed with the first pattern that matches, or a value of type Option.None, which is type-safe whereas null isn't.

If you're wondering what Stream is... it's a lazy list. This ensures that you ignore patterns after the first successful one. No need to do too much work.
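If you would rather not add a library, roughly the same idea can be sketched with the JDK's own java.util.Optional and java.util.stream.Stream (Java 8 and later). This is not Functional Java, only an analogue of it: the class name OptionalDateParser, the choice of Locale.ENGLISH and the strict (non-lenient) parsing are my assumptions. JDK streams are lazy as well, so findFirst() stops after the first pattern that matches.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper, not part of Functional Java or the JDK.
final class OptionalDateParser {

    // Returns the parsed date, or Optional.empty() when the pattern does not fit.
    static Optional<Date> parseDate(String pattern, String s) {
        try {
            // A fresh formatter per call sidesteps SimpleDateFormat's thread-safety problem.
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
            format.setLenient(false);
            return Optional.of(format.parse(s));
        } catch (ParseException e) {
            return Optional.empty();
        }
    }

    // Tries the patterns in order and returns the first successful result.
    static Optional<Date> parseWithPatterns(String s, String... patterns) {
        if (s == null) {
            return Optional.empty();
        }
        return Stream.of(patterns)
                .map(p -> parseDate(p, s))
                .filter(Optional::isPresent)
                .map(Optional::get)
                .findFirst();
    }
}
```

The try/catch is still there, but it is confined to one small function, and callers only ever see an Optional.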

Call your function like this:

for (Date d: parseWithPatterns(someString, stream("dd/MM/yyyy", "dd-MM-yyyy"))) {
  // Do something with the date here.
}

Or...

Option<Date> d = parseWithPatterns(someString,
                                   stream("dd/MM/yyyy", "dd-MM-yyyy"));
if (d.isNone()) {
  // Handle the case where neither pattern matches.
} 
else {
  // Do something with d.some()
}
Apocalisp

Don't be too hard on yourself about using try-catch in logic: this is one of those situations where Java forces you to, so there's not a lot you can do about it.

But in this case you could instead use DateFormat.parse(String, ParsePosition).
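A minimal sketch of that approach, assuming the two patterns from the question and English month names (the class and method names are made up for illustration). This overload returns null on failure instead of throwing ParseException. The extra index check rejects input with trailing characters, which parse would otherwise accept, and the null guard is there because passing null does throw.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper built around DateFormat.parse(String, ParsePosition).
final class PositionalDateParser {

    // Returns the parsed date, or null if the text does not fully match the pattern.
    static Date parseOrNull(String text, String pattern) {
        if (text == null) {
            return null;
        }
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setLenient(false);
        ParsePosition position = new ParsePosition(0);
        Date date = format.parse(text, position);
        // parse signals failure with null; also require that the whole string was consumed.
        if (date == null || position.getIndex() != text.length()) {
            return null;
        }
        return date;
    }

    // Tries 'dd-MMM-yy' first, then 'dd/MM/yyyy'.
    static Date parseEither(String text) {
        Date date = parseOrNull(text, "dd-MMM-yy");
        return date != null ? date : parseOrNull(text, "dd/MM/yyyy");
    }
}
```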

cletus
  • In Java 8, even though the docs (http://docs.oracle.com/javase/8/docs/api/java/text/DateFormat.html#parse-java.lang.String-java.text.ParsePosition-) don't say it throws anything, the code indicates that it does throw a few RuntimeExceptions :( – Meow Jan 05 '17 at 20:17

You can take advantage of regular expressions to determine which format the string is in, and whether it matches any valid format. Something like this (not tested):

(Oops, I wrote this in C# before checking to see what language you were using.)

Regex test = new Regex(@"^(?:(?<formatA>\d{2}-[a-zA-Z]{3}-\d{2})|(?<formatB>\d{2}/\d{2}/\d{4}))$", RegexOptions.Compiled);
Match match = test.Match(yourString);
if (match.Success)
{
    if (match.Groups["formatA"].Success)
    {
        // Use format A.
    }
    else if (match.Groups["formatB"].Success)
    {
        // Use format B.
    }
    ...
}
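Since the question is about Java, here is a rough Java equivalent of the same idea using named groups (available since Java 7). The class name and the decision to return the pattern string are my own choices for illustration. Unlike SimpleDateFormat, a compiled Pattern is immutable and safe to share between threads.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decides which date pattern applies without attempting a parse.
final class DateFormatDetector {

    // formatA matches the shape of dd-MMM-yy, formatB the shape of dd/MM/yyyy.
    private static final Pattern DATE_SHAPE = Pattern.compile(
            "^(?:(?<formatA>\\d{2}-[a-zA-Z]{3}-\\d{2})|(?<formatB>\\d{2}/\\d{2}/\\d{4}))$");

    // Returns the matching date pattern, or null if the text has neither shape.
    static String detectPattern(String text) {
        if (text == null) {
            return null;
        }
        Matcher matcher = DATE_SHAPE.matcher(text);
        if (!matcher.matches()) {
            return null;
        }
        return matcher.group("formatA") != null ? "dd-MMM-yy" : "dd/MM/yyyy";
    }
}
```

Note that the regex only checks the shape of the string. Something like 99-Xyz-00 still passes, so the actual parse can fail and needs handling anyway.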
John Fisher

It looks like you have three options if there are only two known formats:

  • check for the presence of - or / first and parse with the format that uses that separator
  • check the length, since "dd-MMM-yy" and "dd/MM/yyyy" differ in length
  • use precompiled regular expressions

The last one seems unnecessary.

Colin Burnett

If your formats are exact (June 7th 1999 would be either 07-Jun-99 or 07/06/1999, so you are sure that you have leading zeros), then you could just check the length of the string before trying to parse.

Be careful with the short month name in the first version, because Jun may not be June in another language.

But if your data is coming from one database, then I would just convert all dates to the common format (it is one-off, but then you control the data and its format).

van

In this limited situation, the best (and fastest) method is certainly to parse out the day and then, based on whether the next character is '/' or '-', try to parse out the rest. If at any point there is unexpected data, return null.

Arelius

Assuming the patterns you gave are the only likely choices, I would look at the String passed in to see which format to apply.

public Date parseDate(final String date) {
  if (date == null) {
    return null;
  }

  SimpleDateFormat format = (date.charAt(2) == '/') ? new SimpleDateFormat("dd/MM/yyyy")
                                                   : new SimpleDateFormat("dd-MMM-yy");
  try {
    return format.parse(date);
  } catch (ParseException e) {
    // Log a complaint and include date in the complaint
  }
  return null;
}

As others have mentioned, if you can guarantee that you will never access the DateFormats in a multi-threaded manner, you can make class-level or static instances.

Eddie

A simple utility class I have written for my project. Hope this helps someone.

Usage examples:

DateUtils.multiParse("1-12-12");
DateUtils.multiParse("2-24-2012");
DateUtils.multiParse("3/5/2012");
DateUtils.multiParse("2/16/12");




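import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class DateUtils {

    // SimpleDateFormat is not thread-safe, so use this class from a single thread only.
    private static final List<SimpleDateFormat> dateFormats = new ArrayList<SimpleDateFormat>();

    // Populate the list once, when the class is loaded.
    static {
        dateFormats.add(new SimpleDateFormat("MM/dd/yy")); // must precede yyyy
        dateFormats.add(new SimpleDateFormat("MM/dd/yyyy"));
        dateFormats.add(new SimpleDateFormat("MM-dd-yy"));
        dateFormats.add(new SimpleDateFormat("MM-dd-yyyy"));
    }

    private DateUtils() {
        // Utility class; not meant to be instantiated.
    }

    // Returns the parsed date, or null if the input does not match the format.
    private static Date tryToParse(String input, SimpleDateFormat format) {
        Date date = null;
        try {
            date = format.parse(input);
        } catch (ParseException e) {
            // Not this format; the caller tries the next one.
        }
        return date;
    }

    // Tries each known format in order and returns the first successful parse, or null.
    public static Date multiParse(String input) {
        Date date = null;
        for (SimpleDateFormat format : dateFormats) {
            date = tryToParse(input, format);
            if (date != null) break;
        }
        return date;
    }
}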
user979051

Use regular expressions to parse your string. Make sure that you keep both regexes pre-compiled (do not create new ones on every method call; store them as constants), and measure whether this is actually faster than the try-catch you use now.

I still find it strange that your method returns null if both versions fail rather than throwing an exception.

van

You could use split to determine which format to use:

String[] parts = date.split("-");
df = (parts.length==3 ? format1 : format2);

That assumes every string is in one format or the other; you could improve the checking if need be.

objects

An alternative to creating a SimpleDateFormat (or two) per iteration would be to lazily populate a ThreadLocal container for these formats. This will solve both Thread safety concerns and concerns around object creation performance.
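A sketch of what that could look like, assuming Java 8 or later for ThreadLocal.withInitial and English month names (the class name is made up). Each thread lazily gets its own pair of formatters the first time it calls get(), so nothing is shared between threads and nothing is re-created per record.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper: one SimpleDateFormat per pattern per thread.
final class ThreadLocalDateFormats {

    private static final ThreadLocal<SimpleDateFormat> SHORT_FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("dd-MMM-yy", Locale.ENGLISH));

    private static final ThreadLocal<SimpleDateFormat> LONG_FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("dd/MM/yyyy", Locale.ENGLISH));

    // Same try/catch logic as in the question, but reusing the per-thread formatters.
    static Date parse(String text) {
        if (text == null) {
            return null;
        }
        try {
            return SHORT_FORMAT.get().parse(text);
        } catch (ParseException first) {
            try {
                return LONG_FORMAT.get().parse(text);
            } catch (ParseException second) {
                return null;
            }
        }
    }
}
```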

akf

On one hand I see nothing wrong with your use of try/catch for the purpose, it’s the option I would use. On the other hand there are alternatives:

  1. Take a taste from the string before deciding how to parse it.
  2. Use optional parts of the format pattern string.

For my demonstrations I am using java.time, the modern Java date and time API, because the Date class used in the question was always poorly designed and is now long outdated. For a date without time of day we need a java.time.LocalDate.

try-catch

Using try-catch with java.time looks like this:

    DateTimeFormatter ddmmmuuFormatter = DateTimeFormatter.ofPattern("dd-MMM-uu", Locale.ENGLISH);
    DateTimeFormatter ddmmuuuuFormatter = DateTimeFormatter.ofPattern("dd/MM/uuuu");

    String dateString = "07-Jun-09";

    LocalDate result;
    try {
        result = LocalDate.parse(dateString, ddmmmuuFormatter);
    } catch (DateTimeParseException dtpe) {
        result = LocalDate.parse(dateString, ddmmuuuuFormatter);
    }
    System.out.println("Date: " + result);

Output is:

Date: 2009-06-07

Suppose instead we defined the string as:

    String dateString = "07/06/2009";

Then output is still the same.

Take a taste

If you prefer to avoid the try-catch construct, it’s easy to make a simple check to decide which of the formats your string conforms to. For example:

    if (dateString.contains("-")) {
        result = LocalDate.parse(dateString, ddmmmuuFormatter);
    } else {
        result = LocalDate.parse(dateString, ddmmuuuuFormatter);
    }

The result is the same as before.

Use optional parts in the format pattern string

This is the option I like the least, but it’s short and presented for some measure of completeness.

    DateTimeFormatter dateFormatter
            = DateTimeFormatter.ofPattern("[dd-MMM-uu][dd/MM/uuuu]", Locale.ENGLISH);
    LocalDate result = LocalDate.parse(dateString, dateFormatter);

The square brackets denote optional parts of the format. So Java first tries to parse using dd-MMM-uu. Whether or not that succeeds, it then tries to parse the remainder of the string using dd/MM/uuuu. Given your two formats, one of the attempts will succeed, and you have parsed the date. The result is still the same as above.
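The question also mentions blank and null values, which none of the snippets above handle. One way to cover them, sketched here with a class name of my own invention, is to build the same kind of optional-section formatter with DateTimeFormatterBuilder.appendOptional and wrap the result in an Optional. A DateTimeFormatter is immutable and thread-safe, so keeping it in a constant is fine.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: returns an empty Optional for null, blank or unparseable input.
final class BlankSafeDateParser {

    // Equivalent in spirit to the pattern "[dd-MMM-uu][dd/MM/uuuu]".
    private static final DateTimeFormatter EITHER_FORMAT = new DateTimeFormatterBuilder()
            .appendOptional(DateTimeFormatter.ofPattern("dd-MMM-uu"))
            .appendOptional(DateTimeFormatter.ofPattern("dd/MM/uuuu"))
            .toFormatter(Locale.ENGLISH);

    static Optional<LocalDate> parse(String text) {
        if (text == null || text.trim().isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(text.trim(), EITHER_FORMAT));
        } catch (DateTimeParseException e) {
            // Neither optional section matched the input.
            return Optional.empty();
        }
    }
}
```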

Link

Oracle tutorial: Date Time explaining how to use java.time.

Ole V.V.