32

I am supporting a common library at work that performs many checks of a given string to see if it is a valid date. The Java API, the commons-lang library, and JodaTime all have methods that parse a string into a date, which tells you whether it is actually a valid date or not. I was hoping there would be a way of doing the validation without actually creating a date object (or a DateTime, as is the case with the JodaTime library). For example, here is a simple piece of example code:

public boolean isValidDate(String dateString) {
    SimpleDateFormat df = new SimpleDateFormat("yyyyMMdd");
    try {
        df.parse(dateString);
        return true;
    } catch (ParseException e) {
        return false;
    }
}

This just seems wasteful to me, since we are throwing away the resulting object. From my benchmarks, about 5% of our time in this common library is spent validating dates. I'm hoping I'm just missing an obvious API. Any suggestions would be great!

UPDATE

Assume that we can always use the same date format at all times (likely yyyyMMdd). I did think about using a regex as well, but then it would need to be aware of the number of days in each month, leap years, etc...


Results

Parsed a date 10 million times

Using Java's SimpleDateFormat: ~32 seconds 
Using commons-lang DateUtils.parseDate: ~32 seconds
Using JodaTime's DateTimeFormatter: ~3.5 seconds 
Using the pure code/math solution by Slanec: ~0.8 seconds 
Using precomputed results by Slanec and dfb (minus filling cache): ~0.2 seconds

There were some very creative answers, I appreciate it! I guess now I just need to decide how much flexibility I need and what I want the code to look like. I'm going to say that dfb's answer is correct because it was simply the fastest, which was my original question. Thanks!

jjathman
  • Throwing Exceptions tends to be "heavy". Have you considered using regular expressions for validation ? – James P. Jul 14 '12 at 02:34
  • 3
    Well, one thing you could do, if you end up having many of the same strings to validate, is use some kind of memoization technique. – JRL Jul 14 '12 at 02:34
  • 2
    European date format (DD/MM/YYYY) or US date format (MM/DD/YYYY)? Good luck. – James Jul 14 '12 at 02:36
  • I guess you're right. But to validate the string you would need a regex. Then this regex must be aware of the format of your date. If you use more than one date format in the system, you will either have two methods to validate dates or have to pass the regex to every method along with the date string. Anyway, it would be a mess. – Tiago Farias Jul 14 '12 at 02:36
  • Added an update for some of these questions. – jjathman Jul 14 '12 at 02:42
  • what? 32 seconds? are you serious? – Marcio Granzotto Dec 17 '15 at 18:40
  • @MarcioGranzotto yep. SimpleDateFormat is really slow, hence the question. Lots of object creation and as a commenter mentioned, exception handling logic is much slower than normal code. – jjathman Dec 18 '15 at 18:22

8 Answers

16

If you're really concerned about performance and your date format is really that simple, just pre-compute all the valid strings and hash them in memory. The format you have above only has ~ 8 million valid combinations up to 2050


EDIT by Slanec - reference implementation

This implementation depends on your specific dateformat. It could be adapted to any specific dateformat out there (just like my first answer, but a bit better).

It builds a set of all dates from 1900 through 2049 (the loop stops before 2050), stored as Strings; there are 54787 of them. It then checks whether a given date is in that set.

Once the dates set is created, it's fast as hell. A quick microbenchmark showed an improvement by a factor of 10 over my first solution.

private static Set<String> dates = new HashSet<String>();
static {
    for (int year = 1900; year < 2050; year++) {
        for (int month = 1; month <= 12; month++) {
            for (int day = 1; day <= daysInMonth(year, month); day++) {
                StringBuilder date = new StringBuilder();
                date.append(String.format("%04d", year));
                date.append(String.format("%02d", month));
                date.append(String.format("%02d", day));
                dates.add(date.toString());
            }
        }
    }
}

public static boolean isValidDate2(String dateString) {
    return dates.contains(dateString);
}

P.S. It can be modified to use Set<Integer> or even Trove's TIntHashSet, which reduces memory usage a lot (and therefore allows a much larger timespan). The performance then drops to a level just below my original solution.
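
The Set<Integer> variant mentioned in the P.S. could look roughly like the following. This is a minimal sketch, not the code that was benchmarked: the class name IntDateSet and the size() helper are hypothetical, and java.time.YearMonth (Java 8+) stands in for the daysInMonth() helper so that the block is self-contained.

```java
import java.time.YearMonth;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: valid dates stored as ints (yyyyMMdd) instead of Strings.
final class IntDateSet {
    private static final Set<Integer> DATES = new HashSet<>();

    static {
        // Same range as the String version: 1900 through 2049.
        for (int year = 1900; year < 2050; year++) {
            for (int month = 1; month <= 12; month++) {
                int days = YearMonth.of(year, month).lengthOfMonth();
                for (int day = 1; day <= days; day++) {
                    DATES.add(year * 10000 + month * 100 + day);
                }
            }
        }
    }

    static boolean isValidDate(String dateString) {
        if (dateString == null || dateString.length() != 8) {
            return false;
        }
        try {
            // The lookup key is the whole string read as one int.
            return DATES.contains(Integer.parseInt(dateString));
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Number of precomputed dates, exposed only for sanity checks.
    static int size() {
        return DATES.size();
    }
}
```

The trade-off is the one described above: each lookup now pays for Integer.parseInt and boxing, in exchange for a smaller set.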

Community
dfb
  • That's an interesting idea. Would you store them in a HashSet after you've computed them and then check if the set contains the value? I'm not sure the most efficient way to do this. – jjathman Jul 14 '12 at 02:47
  • 1
    Yup, that's the idea. That'll be more efficient than most other approaches, but it's not very flexible if your date format changes. If you get to more complex date formats, doing this will be impossible due to combinatorial explosion. You would generate the combinations at the start of the application and then validations would be extremely fast. – dfb Jul 14 '12 at 03:23
  • 1
    @jjathman I added a reference implementation. The memory usage is _not_ that bad! – Petr Janeček Jul 14 '12 at 05:26
  • What if date has a timestamps too? Ex: "20101001 020202" i.e "yyyyMMdd HHmmss" – chandra mohan Jan 28 '15 at 08:23
13

You can reverse your thinking - try to fail as quickly as possible when the String definitely is not a date:

If none of those apply, then try to parse it - preferably with a pre-made static Format object, don't create one on every method run.
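
A minimal sketch of that idea follows. The specific cheap checks (null, length, digits only) are assumptions chosen for the yyyyMMdd format, the class name FailFastDateCheck is hypothetical, and the non-lenient setting is also an assumption. Because SimpleDateFormat is not thread-safe, the shared instance is kept in a ThreadLocal rather than a plain static field.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

// Hypothetical sketch: reject obvious non-dates cheaply, parse only what remains.
final class FailFastDateCheck {
    // SimpleDateFormat is not thread-safe, so each thread reuses its own instance.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> {
                SimpleDateFormat df = new SimpleDateFormat("yyyyMMdd");
                df.setLenient(false); // assumption: out-of-range fields should fail
                return df;
            });

    static boolean isValidDate(String s) {
        // Cheap checks first: no parsing, no exceptions.
        if (s == null || s.length() != 8) {
            return false;
        }
        for (int i = 0; i < 8; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        // Only strings that look like a date reach the expensive parse.
        try {
            FORMAT.get().parse(s);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }
}
```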


EDIT after comments

Based on this neat trick, I wrote a fast validation method. It looks ugly, but is significantly faster than the usual library methods (which should be used in any standard situation!), because it relies on your specific date format and does not create a Date object. It handles the date as an int and goes on from that.

I tested the daysInMonth() method just a little bit (the leap year condition taken from Peter Lawrey), so I hope there's no apparent bug.

A quick (estimated!) microbenchmark indicated a speedup by a factor of 30.

public static boolean isValidDate(String dateString) {
    if (dateString == null || dateString.length() != "yyyyMMdd".length()) {
        return false;
    }

    int date;
    try {
        date = Integer.parseInt(dateString);
    } catch (NumberFormatException e) {
        return false;
    }

    int year = date / 10000;
    int month = (date % 10000) / 100;
    int day = date % 100;

    // leap years calculation not valid before 1581
    boolean yearOk = (year >= 1581) && (year <= 2500);
    boolean monthOk = (month >= 1) && (month <= 12);
    boolean dayOk = (day >= 1) && (day <= daysInMonth(year, month));

    return (yearOk && monthOk && dayOk);
}

private static int daysInMonth(int year, int month) {
    int daysInMonth;
    switch (month) {
        case 1: // fall through
        case 3: // fall through
        case 5: // fall through
        case 7: // fall through
        case 8: // fall through
        case 10: // fall through
        case 12:
            daysInMonth = 31;
            break;
        case 2:
            if (((year % 4 == 0) && (year % 100 != 0)) || (year % 400 == 0)) {
                daysInMonth = 29;
            } else {
                daysInMonth = 28;
            }
            break;
        default:
            // returns 30 even for nonexistent months
            daysInMonth = 30;
    }
    return daysInMonth;
}

P.S. Your example method above will return true for "99999999". Mine will only return true for existent dates :).

Community
Petr Janeček
  • Our code already does something similar. The problem with this thinking is that almost all of our dates will be valid...we have to have the stupid check for the 1 in a million case – jjathman Jul 14 '12 at 02:46
  • 1
    @jjathman Well. Then you just need to parse it fast. In your example case (with no separators etc.), you [could use this neat trick](http://stackoverflow.com/a/10014172/1273080) and catch Exceptions out of it. It would be the same as your current approach, just faster (also, you wouldn't have to create the `Date`, just assert the right values in the three integers). – Petr Janeček Jul 14 '12 at 02:53
  • I did look at that post before I wrote up my question, that method doesn't take the number of days in a month in to account right? Maybe I'm missing something but wouldn't that code "parse" 20129999 just fine? – jjathman Jul 14 '12 at 02:55
  • 1
    Edited my answer with the response. – Petr Janeček Jul 14 '12 at 03:13
  • 1
    Very cool. I'll try benchmarking this and see how it performs against the other solutions. Thank you! – jjathman Jul 14 '12 at 03:14
  • @jjathman I added the method for days in month. There are no comments, though, so please, comment it in your actual production code ;). – Petr Janeček Jul 14 '12 at 03:38
6

I think that the better way to know if a certain date is valid is defining a method like:

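public static boolean isValidDate(String input, String format) {
    boolean valid = false;

    try {
        SimpleDateFormat dateFormat = new SimpleDateFormat(format);
        // Parse (leniently), then format the result back with the same pattern.
        // An impossible date rolls over while parsing, so it no longer matches.
        String output = dateFormat.format(dateFormat.parse(input));
        valid = input.equals(output);
    } catch (Exception ignore) {}

    return valid;
}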

On one hand, the method checks that the date has the correct format, and on the other hand, it checks that the string corresponds to a valid date. For example, with the format "yyyy/MM/dd", the date "2015/02/29" will be parsed leniently as March 1, 2015 and formatted back as "2015/03/01", so the input and output will differ and the method will return false.
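
An alternative to the round-trip comparison, which is not part of the method above, is to switch off lenient parsing and check that the whole input was consumed. The sketch below assumes that approach; the class name StrictParseCheck is hypothetical. It uses parse(String, ParsePosition), which returns null on failure instead of throwing.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;

// Hypothetical sketch: strict parsing instead of parse-then-format.
final class StrictParseCheck {
    static boolean isValidDate(String input, String format) {
        if (input == null) {
            return false;
        }
        SimpleDateFormat dateFormat = new SimpleDateFormat(format);
        dateFormat.setLenient(false); // impossible dates are rejected, not rolled over
        ParsePosition pos = new ParsePosition(0);
        // Returns null on failure rather than throwing a ParseException.
        boolean parsed = dateFormat.parse(input, pos) != null;
        // Also require that nothing is left over after the date.
        return parsed && pos.getIndex() == input.length();
    }
}
```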

victor.hernandez
2

This is my way to check that the date is in the correct format and is actually a valid date. Assume we do not want SimpleDateFormat to silently convert an incorrect date into a correct one; instead, the method should just return false. Output to the console is used only to show how the method works at each step.

public class DateFormat {

public static boolean validateDateFormat(String stringToValidate){
    String sdf = "yyyy-MM-dd HH:mm:ss";
    SimpleDateFormat format=new SimpleDateFormat(sdf);   
    String dateFormat = "[12]{1,1}[0-9]{3,3}-(([0]{0,1}[1-9]{1,1})|([1]{0,1}[0-2]{1,1}))-(([0-2]{0,1}[1-9]{1,1})|([3]{0,1}[01]{1,1}))[ ](([01]{0,1}[0-9]{1,1})|([2]{0,1}[0-3]{1,1}))((([:][0-5]{0,1}[0-9]{0,1})|([:][0-5]{0,1}[0-9]{0,1}))){0,2}";
    boolean isPassed = false;

    isPassed = (stringToValidate.matches(dateFormat)) ? true : false;


    if (isPassed){
        // digits are correct. Now, check that the date itself is correct
        // correct the date format to the full date format
        String correctDate = correctDateFormat(stringToValidate);
        try
        {
            Date d = format.parse(correctDate);
            isPassed = (correctDate.equals(new SimpleDateFormat(sdf).format(d))) ? true : false;
            System.out.println("In = " + correctDate + "; Out = " 
                    + new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(d) + " equals = " 
                    + (correctDate.equals(new SimpleDateFormat(sdf).format(d))));
            // check that the date is not after the current time
            if (!isPassed || d.after(new Date())) {
                System.out.println(new SimpleDateFormat(sdf).format(d) + " is after current day " 
                        + new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date()));
                isPassed = false;
            } else {
                isPassed = true;
            }
        } catch (ParseException e) {
            System.out.println(correctDate + " Exception! " + e.getMessage());
            isPassed = false;
        }
    } else {
        return false;
    }
    return isPassed;
}

/**
 *  method to fill up the values that are not full, like 2 hours -> 02 hours
 *  to avoid undesirable difference when we will compare original date with parsed date with SimpleDateFormat
 */
private static String correctDateFormat(String stringToValidate) {
    String correctDate = "";
    StringTokenizer stringTokens = new StringTokenizer(stringToValidate, "-" + " " + ":", false);
    List<String> tokens = new ArrayList<>();
    System.out.println("Inside of recognizer");
    while (stringTokens.hasMoreTokens()) {
        String token = stringTokens.nextToken();
        tokens.add(token);
        // for debug
        System.out.print(token + "|");
    }
    for (int i=0; i<tokens.size(); i++){
        if (tokens.get(i).length() % 2 != 0){
            String element = tokens.get(i);
            element = "0" + element;
            tokens.set(i, element);
        }
    }
    // build a correct final string
    // 6 elements in the date: yyyy-MM-dd hh:mm:ss
    // come through and add mandatory 2 elements
    for (int i=0; i<2; i++){
        correctDate = correctDate + tokens.get(i) + "-";
    }
    // add mandatory 3rd (dd) and 4th elements (hh)
    correctDate = correctDate + tokens.get(2) + " " + tokens.get(3);
    if (tokens.size() == 4){
        correctDate = correctDate + ":00:00";
    } else if (tokens.size() == 5){
        correctDate = correctDate + ":" + tokens.get(4) + ":00";
    } else if (tokens.size() == 6){
        correctDate = correctDate + ":" + tokens.get(4) + ":" + tokens.get(5);
    }  

    System.out.println("The full correct date format is " + correctDate);
    return correctDate;
}

}

A JUnit test for that:

import static org.junit.Assert.*;
import junitparams.JUnitParamsRunner;
import junitparams.Parameters;
import org.junit.Test;
import org.junit.runner.RunWith;

@RunWith(JUnitParamsRunner.class)
public class DateFormatTest {

    @Parameters
    private static final Object[] getCorrectDate() {
        return new Object[] {
                new Object[]{"2014-12-13 12:12:12"},
                new Object[]{"2014-12-13 12:12:1"},
                new Object[]{"2014-12-13 12:12:01"},
                new Object[]{"2014-12-13 12:1"},
                new Object[]{"2014-12-13 12:01"},
                new Object[]{"2014-12-13 12"},
                new Object[]{"2014-12-13 1"},
                new Object[]{"2014-12-31 12:12:01"},
                new Object[]{"2014-12-30 23:59:59"},
        };
    }
    @Parameters
    private static final Object[] getWrongDate() {
        return new Object[] {
                new Object[]{"201-12-13 12:12:12"},
                new Object[]{"2014-12- 12:12:12"},
                new Object[]{"2014- 12:12:12"},
                new Object[]{"3014-12-12 12:12:12"},
                new Object[]{"2014-22-12 12:12:12"},
                new Object[]{"2014-12-42 12:12:12"},
                new Object[]{"2014-12-32 12:12:12"},
                new Object[]{"2014-13-31 12:12:12"},
                new Object[]{"2014-12-31 32:12:12"},
                new Object[]{"2014-12-31 24:12:12"},
                new Object[]{"2014-12-31 23:60:12"},
                new Object[]{"2014-12-31 23:59:60"},
                new Object[]{"2014-12-31 23:59:50."},
                new Object[]{"2014-12-31 "},
                new Object[]{"2014-12 23:59:50"},
                new Object[]{"2014 23:59:50"}
        };
    }

    @Test
    @Parameters(method="getCorrectDate")
    public void testMethodHasReturnTrueForCorrectDate(String dateToValidate) {
        assertTrue(DateFormat.validateDateFormat(dateToValidate));
    }

    @Test
    @Parameters(method="getWrongDate")
    public void testMethodHasReturnFalseForWrongDate(String dateToValidate) {
        assertFalse(DateFormat.validateDateFormat(dateToValidate));
    }

}
dvdgsng
felicity
1
 public static int checkIfDateIsExists(String d, String m, String y) {
        Integer[] array30 = new Integer[]{4, 6, 9, 11};
        Integer[] array31 = new Integer[]{1, 3, 5, 7, 8, 10, 12};

        int i = 0;
        int day = Integer.parseInt(d);
        int month = Integer.parseInt(m);
        int year = Integer.parseInt(y);

        if (month == 2) {
            if (isLeapYear(year)) {
                if (day > 29) {
                    i = 2; // false
                } else {
                    i = 1; // true
                }
            } else {
                if (day > 28) {
                    i = 2;// false
                } else {
                    i = 1;// true
                }
            }
        } else if (month == 4 || month == 6 || month == 9 || month == 11) {
            if (day > 30) {
                i = 2;// false
            } else {
                i = 1;// true
            }
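        } else if (month < 1 || month > 12 || day > 31) {
            i = 2;// false: no such month, or more than 31 days
        } else {
            i = 1;// true
        }

        // a day below 1 is never valid, whatever the month
        return (day < 1) ? 2 : i;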
    }

The method returns 2 if the date is invalid and 1 if the date is valid. The isLeapYear helper is not shown here.

1

If the following line throws an exception (DateTimeParseException), the date is invalid; otherwise it returns the parsed date. Make sure you use the appropriate DateTimeFormatter for your format in the following statement.

LocalDate.parse(uncheckedStringDate, DateTimeFormatter.BASIC_ISO_DATE)
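
Wrapped into a boolean check, that line could be used as in the sketch below. It assumes Java 8 or later; the class name Java8DateCheck is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical sketch: validity check built on LocalDate.parse.
final class Java8DateCheck {
    static boolean isValidDate(String uncheckedStringDate) {
        if (uncheckedStringDate == null) {
            return false;
        }
        try {
            // BASIC_ISO_DATE expects yyyyMMdd and resolves strictly,
            // so a date such as February 29 in a non-leap year is rejected.
            LocalDate.parse(uncheckedStringDate, DateTimeFormatter.BASIC_ISO_DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

Note that this still creates a LocalDate and relies on an exception for the invalid case, and that BASIC_ISO_DATE also accepts an optional trailing offset (for example "20120229+0100").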

Rakesh
0

Building upon the answer by dfb, you could do a two step hash.

  1. Create a simple object (day,month,year) representing a date. Compute every calendar day for the next 50 years, which should be less than 20k different dates.
  2. Make a regex that confirms if your input string matches yyyyMMdd, but does not check if the value is a valid day (e.g. 99999999 will pass)
  3. The check function will first do a regex, and if that succeeds -- pass it to the hash function check. Assuming your date object is a 8bit + 8bit + 8bit (for year after 1900), then 24 bits * 20k, then the whole hash table should be pretty small... certainly under 500Kb, and very quick to load from disk if serialized and compressed.
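
The three steps above can be sketched as follows. Several details are assumptions made only for illustration: the class name TwoStepDateCheck, a fixed start year of 2012 (so that the example is deterministic), and a plain HashSet<Integer> of packed 24-bit keys instead of a serialized table.

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch: regex pre-check, then lookup of a packed (year, month, day) key.
final class TwoStepDateCheck {
    // Step 2: shape check only; "99999999" passes this test.
    private static final Pattern EIGHT_DIGITS = Pattern.compile("[0-9]{8}");
    private static final int FIRST_YEAR = 2012; // assumption: start of the 50-year window
    private static final Set<Integer> VALID = new HashSet<>();

    static {
        // Step 1: precompute every calendar day in the 50-year window.
        LocalDate end = LocalDate.of(FIRST_YEAR + 50, 1, 1);
        for (LocalDate d = LocalDate.of(FIRST_YEAR, 1, 1); d.isBefore(end); d = d.plusDays(1)) {
            VALID.add(pack(d.getYear(), d.getMonthValue(), d.getDayOfMonth()));
        }
    }

    // 8 bits each for (year - 1900), month and day give a 24-bit key.
    private static int pack(int year, int month, int day) {
        return ((year - 1900) << 16) | (month << 8) | day;
    }

    // Step 3: regex first, then the hash lookup.
    static boolean isValidDate(String s) {
        if (s == null || !EIGHT_DIGITS.matcher(s).matches()) {
            return false;
        }
        int year = Integer.parseInt(s.substring(0, 4));
        int month = Integer.parseInt(s.substring(4, 6));
        int day = Integer.parseInt(s.substring(6, 8));
        return VALID.contains(pack(year, month, day));
    }
}
```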
Arcymag
0

One could use a combination of regex and manual leap year checking. Thus:

if (matches ^\d\d\d\d((01|03|05|07|08|10|12)(0[1-9]|[12]\d|3[01])|(04|06|09|11)(0[1-9]|[12]\d|30)|02(0[1-9]|[12]\d))$)
    if (endsWith "0229")
         return true or false depending on the year being a leap year
    return true
return false
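
In Java, the pseudocode above might look like the sketch below. The class name RegexDateCheck is hypothetical, and the day alternatives are written so that day "00" is rejected, which a looser [012]\d form would accept.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: regex for month/day ranges plus a manual leap-year check.
final class RegexDateCheck {
    // Checks month and day ranges; February is allowed up to the 29th here.
    private static final Pattern SHAPE = Pattern.compile(
            "\\d{4}(?:(?:01|03|05|07|08|10|12)(?:0[1-9]|[12]\\d|3[01])"
            + "|(?:04|06|09|11)(?:0[1-9]|[12]\\d|30)"
            + "|02(?:0[1-9]|[12]\\d))");

    static boolean isValidDate(String s) {
        if (s == null || !SHAPE.matcher(s).matches()) {
            return false;
        }
        // February 29 is the only case the regex cannot decide on its own.
        if (s.endsWith("0229")) {
            int year = Integer.parseInt(s.substring(0, 4));
            return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
        }
        return true;
    }
}
```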
Ingo