
I have researched this fairly extensively (I've looked at 5 or 6 Stack Overflow posts, which have often led me to stare at the official Java regex documentation for 30 minutes at a time), but I'm still struggling with this issue. Here is my code for the string-matching method:

        public String checkDateFormat(String date)
        {
            if (date.matches("[0-3][0-9](?:-|/)[0-1][0-9](?:-|/)[0-2][0-9][0-9][0-9] [0-2][0-9]:[0-6][0-9]:[0-6][0-9]"))
            {
                return date;
            }
            else if (date.matches("[0-1][0-9](?:-|/)[0-3][0-9](?:-|/)[0-2][0-9][0-9][0-9] [0-2][0-9]:[0-6][0-9]:[0-6][0-9]"))
            {
                return date;
            }
            else if (date.matches("[0-2][0-9][0-9][0-9](?:-|/)[0-1][0-9](?:-|/)[0-3][0-9] [0-2][0-9]:[0-6][0-9]:[0-6][0-9]"))
            {
                return date;
            }
            else if (date.matches("[0-2][0-9][0-9][0-9](?:-|/)[0-3][0-9](?:-|/)[0-1][0-9] [0-2][0-9]:[0-6][0-9]:[0-6][0-9]"))
            {
                return date;
            }
            else
            {
                throw new InvalidParameterException("(The passed argument is invalid date format: [yyyy/dd/MM, yyyy/MM/dd, MM/dd/yyyy, or dd/MM/yyyy])");
            }
        }

The current setup for checking the - and / conditions is one of many I've tried; I've also attempted `[-/]`, `[-\\\\/]`, `[-&&[/]]`, `[-[/]]` and probably a few more I don't remember.

The program simply needs to accept both - and / as separators in the dd/MM/yyyy format, because I'm reading this data from multiple CSV files that use both dd/MM/yyyy and dd-MM-yyyy.

EDIT: I've updated my code to this for the time being, but the data still fails this validation method. To be more specific, the string I'm checking is 2016-01-01. I've also tried the code Ole V.V. provided using the DateTimeFormatter class, but without success.

Joe Coon
  • [Why use regex for this?](https://stackoverflow.com/questions/7579227/how-to-get-the-given-date-string-formatpattern-in-java) – ctwheels Nov 13 '17 at 19:45
  • I take it that `30/11/2017` and `30-11-2017` should both be accepted, but not `30-/11-/2017`, right? How about `30-11/2017` (one hyphen, one slash)? – Ole V.V. Nov 13 '17 at 19:46
  • You're correct that I want to accept `30/11/2017` or `30-11-2017` and not the other potential inputs you mentioned. Capturing groups can help me with this? – Joe Coon Nov 14 '17 at 02:19
  • The patterns in your code seem to expect hours, minutes and seconds in the string too, for example `2017-31-12 23:32:30`. But your own examples don’t include time of day. Does your CSV file include time of day with the dates? Sometimes? Always? – Ole V.V. Nov 14 '17 at 06:38
  • Yes, @JoeCoon, capturing groups may help, see the edit to [my answer](https://stackoverflow.com/a/47272735/5772882). – Ole V.V. Nov 14 '17 at 11:41

2 Answers


First of all, do not use regular expressions to validate dates. Java has a nice API for parsing dates in all conceivable formats, so try parsing your date and see whether it succeeds:

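import java.security.InvalidParameterException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Arrays;

private static final String[] validFormatPatterns = {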
        "dd/MM/uuuu HH:mm:ss",
        "dd-MM-uuuu HH:mm:ss",
        "MM/dd/uuuu HH:mm:ss",
        "MM-dd-uuuu HH:mm:ss",
        "uuuu/MM/dd HH:mm:ss",
        "uuuu-MM-dd HH:mm:ss",
        "uuuu/dd/MM HH:mm:ss",
        "uuuu-dd-MM HH:mm:ss"
};

private static final DateTimeFormatter[] validFormats
        = Arrays.stream(validFormatPatterns)
                .map(fp -> DateTimeFormatter.ofPattern(fp).withResolverStyle(ResolverStyle.STRICT))
                .toArray(DateTimeFormatter[]::new);

public String checkDateFormat(String date) {
    for (DateTimeFormatter formatter : validFormats) {
        try {
            LocalDateTime.parse(date, formatter);
            return date;
        } catch (DateTimeParseException dtpe) {
            // ignore, try next format
        }
    }
    throw new InvalidParameterException("(The passed argument is invalid date format:"
            + " [yyyy/dd/MM, yyyy/MM/dd, MM/dd/yyyy, or dd/MM/yyyy])");
}

The .withResolverStyle(ResolverStyle.STRICT) part makes sure that invalid dates such as 2017-29-02 23:32:30 (29 February in a non-leap year) are rejected.
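To see what the strict resolver changes, here is a small sketch comparing it with the default `ResolverStyle.SMART` on that same date-time. The class and method names are made up for the illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

class StrictResolverDemo {

    // Returns true if the text parses as a LocalDateTime with the given pattern and resolver style.
    static boolean parses(String text, String pattern, ResolverStyle style) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern).withResolverStyle(style);
        try {
            LocalDateTime.parse(text, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String pattern = "uuuu-dd-MM HH:mm:ss";
        // SMART (the default) moves day 29 back to the last valid day of February 2017
        System.out.println(parses("2017-29-02 23:32:30", pattern, ResolverStyle.SMART));  // true
        // STRICT rejects the date because 2017 is not a leap year
        System.out.println(parses("2017-29-02 23:32:30", pattern, ResolverStyle.STRICT)); // false
    }
}
```

Under SMART the out-of-range day is quietly adjusted instead of rejected, which is why STRICT suits validation. STRICT is also why the patterns use `uuuu` rather than `yyyy`: in strict mode `yyyy` means year-of-era, which does not resolve without an era field.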

Other than that, the regex notation for matching either a hyphen or a slash is (?:-|/). For example:

        if (date.matches("[0-3][0-9](?:-|/)[0-1][0-9](?:-|/)[0-2][0-9][0-9][0-9]"
             + " [0-2][0-9]:[0-6][0-9]:[0-6][0-9]"))

(-|/) would work too; it’s a capturing group, which is useful if you need to know afterwards whether a hyphen or a slash was matched. If you don’t need this, insert ?: to make the group non-capturing, as I did above.
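A short sketch of the difference: with the capturing group the matched separator can be read back through `Matcher.group(1)`, while the non-capturing group only tests for it. As an aside beyond the answer itself, the character class `[-/]` from the question also matches a single hyphen or slash in Java, because a hyphen placed first in a class is literal. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatorGroupDemo {

    // (-|/) is a capturing group, so the matched separator is available as group 1.
    private static final Pattern CAPTURING = Pattern.compile("[0-3][0-9](-|/)[0-1][0-9]");

    // Returns the separator that was matched, or null if the text does not match at all.
    static String separatorOf(String text) {
        Matcher m = CAPTURING.matcher(text);
        return m.matches() ? m.group(1) : null;
    }

    // (?:-|/) matches one hyphen or slash but does not store which one.
    static boolean matchesNonCapturing(String text) {
        return text.matches("[0-3][0-9](?:-|/)[0-1][0-9]");
    }

    // [-/] is an equivalent character class; the leading hyphen is taken literally.
    static boolean matchesCharClass(String text) {
        return text.matches("[0-3][0-9][-/][0-1][0-9]");
    }

    public static void main(String[] args) {
        System.out.println(separatorOf("30/11"));         // /
        System.out.println(separatorOf("30-11"));         // -
        System.out.println(matchesNonCapturing("30-11")); // true
        System.out.println(matchesCharClass("30/11"));    // true
    }
}
```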

EDIT: Yes, you may use a capturing group to verify that the same separating character (either hyphen or slash) is used in both places:

date.matches("[0-3][0-9](-|/)[0-1][0-9]\\1[0-2][0-9][0-9][0-9]")

(-|/) is the capturing group, and \\1 matches whatever the 1st (and only) capturing group matched. Groups are numbered from 1 since group 0 is the entire regular expression. The above matches 30/11/2017 and 30-11-2017 but not 30-11/2017.
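The backreference version can be tried on the inputs discussed in the comments. This sketch wraps the pattern above in a hypothetical helper and covers only the dd?MM?yyyy shape without a time of day.

```java
import java.util.regex.Pattern;

class SameSeparatorDemo {

    // (-|/) captures the first separator; \1 (written "\\1" in a Java string literal)
    // must then match exactly the same character.
    private static final Pattern DD_MM_YYYY =
            Pattern.compile("[0-3][0-9](-|/)[0-1][0-9]\\1[0-2][0-9][0-9][0-9]");

    static boolean isValid(String date) {
        return DD_MM_YYYY.matcher(date).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"30/11/2017", "30-11-2017", "30-11/2017", "30-/11-/2017"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Running `main` should print `true` for the first two strings and `false` for the other two.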

Ole V.V.
  • Thanks! This is very helpful, I'll try using some of these tools for my code. I tried reading about capture groups in the java resource but I didn't understand them. The comment @ctwheels made along with yours makes me think they are used with backreferencing which sounds really useful, so I'll try and familiarize myself with that aspect of regex. – Joe Coon Nov 14 '17 at 02:11
  • I've attempted using the information about capturing groups you gave me and even copy-pasting the provided code on validating date formats using the Java API classes like DateTimeFormatter, but I'm still not getting the code to accept the valid data. In my csv file, the data I receive and validate is `2016-01-01`, but it passed neither of the above validations I tried. – Joe Coon Nov 14 '17 at 03:07
  • Could this be because my format pattern strings (like your regular expressions) require hours, minutes and seconds in the string too, and there are no hours, etc. in `2016-01-01`? – Ole V.V. Nov 14 '17 at 08:42
  • You're on the money! In previous files this code was based off of I converted the date from epoch to simple time, so the date data was all in one field; I didn't notice that time and date are in separate fields in this csv file. If I try again and come up with complete code, do I update my above post with my new code or leave the previous mistakes above? This is my first stackoverflow post. – Joe Coon Nov 14 '17 at 16:09
  • Best to leave the question as it stands, so it makes best sense to future readers. If you feel my answer doesn’t fully describe the solution that future readers may want, feel free to add your own answer too. – Ole V.V. Nov 14 '17 at 16:14
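Following that exchange, one way to adapt the answer's code is to drop the time from the patterns and parse into `LocalDate` instead of `LocalDateTime`. This sketch assumes the CSV date field never carries a time of day; the class and method names are illustrative, and it returns a boolean instead of throwing.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Arrays;

class DateOnlyCheck {

    // Date-only versions of the answer's patterns (assumption: no time of day in the field).
    private static final String[] VALID_FORMAT_PATTERNS = {
            "dd/MM/uuuu", "dd-MM-uuuu",
            "MM/dd/uuuu", "MM-dd-uuuu",
            "uuuu/MM/dd", "uuuu-MM-dd",
            "uuuu/dd/MM", "uuuu-dd-MM"
    };

    private static final DateTimeFormatter[] VALID_FORMATS = Arrays.stream(VALID_FORMAT_PATTERNS)
            .map(p -> DateTimeFormatter.ofPattern(p).withResolverStyle(ResolverStyle.STRICT))
            .toArray(DateTimeFormatter[]::new);

    // Returns true if the string is a real calendar date in at least one accepted format.
    static boolean isValidDate(String date) {
        for (DateTimeFormatter formatter : VALID_FORMATS) {
            try {
                LocalDate.parse(date, formatter);
                return true;
            } catch (DateTimeParseException e) {
                // not this format; try the next one
            }
        }
        return false;
    }
}
```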

Brief

As per my comment, I recommend that you do not use regex for this, as the pattern will be extremely long, convoluted and hard to debug. See how to get the given date string format pattern in Java, or Ole V.V.'s answer on this post, to do this without regex.

On the other hand, just to show you how massive and unmanageable this regex would be if it included all the necessary components, I've adapted one of my own regexes into one that works in Java and does what you want.

This regex not only matches the given date formats, but also ensures that the day is valid for the month and year. That means February only has 28 days unless it's a leap year (in which case it has 29), and every other month gets its proper number of days (e.g. January has 31, March has 31, April has 30, etc.).
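For comparison, the leap-year rule that the regex has to spell out through digit patterns such as `0[48]`, `[13579][26]` and `[2468][048]` fits in one line of Java. This is a sketch of the standard Gregorian rule, checked against `java.time.Year.isLeap(long)`; the class name is made up.

```java
import java.time.Year;

class LeapYearRule {

    // Gregorian rule: divisible by 4, except century years, which must be divisible by 400.
    static boolean isLeap(long year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    public static void main(String[] args) {
        long[] years = {1900, 2000, 2016, 2017};
        for (long y : years) {
            // Year.isLeap(long) applies the same proleptic Gregorian rule
            System.out.println(y + ": " + isLeap(y) + " / " + Year.isLeap(y));
        }
    }
}
```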


Code

The regex below should work in most regex flavors, but not in flavors that don't support backreferences.


^(?:
(?:0[1-9]|1\d|2[0-8])([\/-])(?:02)\1(?:\d+)|
(?:0[1-9]|1\d|2\d)([\/-])(?:02)\2(?:(?:\d*?(?:(?:0[48]|[13579][26]|[2468][048])|(?:(?:[02468][048]|[13579][26])00))|[48]00|[48])(?=\D|\b))|
(?:0[1-9]|1\d|2\d|30)([\/-])(?:0[469]|11)\3(?:\d+)|
(?:0[1-9]|1\d|2\d|3[01])([\/-])(?:0[13578]|1[02])\4(?:\d+)|
(?:02)([\/-])(?:0[1-9]|1\d|2[0-8])\5(?:\d+)|
(?:02)([\/-])(?:0[1-9]|1\d|2\d)\6(?:(?:\d*?(?:(?:0[48]|[13579][26]|[2468][048])|(?:(?:[02468][048]|[13579][26])00))|[48]00|[48])(?=\D|\b))|
(?:0[469]|11)([\/-])(?:0[1-9]|1\d|2\d|30)\7(?:\d+)|
(?:0[13578]|1[02])([\/-])(?:0[1-9]|1\d|2\d|3[01])\8(?:\d+)|
(?:\d+)([\/-])(?:0[1-9]|1\d|2[0-8])\9(?:02)|
(?:(?:\d*?(?:(?:0[48]|[13579][26]|[2468][048])|(?:(?:[02468][048]|[13579][26])00))|[48]00|[48])(?=\D|\b))([\/-])(?:0[1-9]|1\d|2\d)\10(?:02)|
(?:\d+)([\/-])(?:0[1-9]|1\d|2\d|30)\11(?:0[469]|11)|
(?:\d+)([\/-])(?:0[1-9]|1\d|2\d|3[01])\12(?:0[13578]|1[02])|
(?:\d+)([\/-])(?:02)\13(?:0[1-9]|1\d|2[0-8])|
(?:(?:\d*?(?:(?:0[48]|[13579][26]|[2468][048])|(?:(?:[02468][048]|[13579][26])00))|[48]00|[48])(?=\D|\b))([\/-])(?:02)\14(?:0[1-9]|1\d|2\d)|
(?:\d+)([\/-])(?:0[469]|11)\15(?:0[1-9]|1\d|2\d|30)|
(?:\d+)([\/-])(?:0[13578]|1[02])\16(?:0[1-9]|1\d|2\d|3[01])
)$

Note: The regex above uses the x (ignore whitespace) flag. This is used to keep the regex somewhat legible. If you really want a one-liner, you can use the following:


^(?:(?:0[1-9]|1\d|2[0-8])([\/-])(?:02)\1(?:\d+)|(?:0[1-9]|1\d|2\d)([\/-])(?:02)\2(?:(?:\d*?(?:(?:0[48]|[13579][26]|[2468][048])|(?:(?:[02468][048]|[13579][26])00))|[48]00|[48])(?=\D|\b))|(?:0[1-9]|1\d|2\d|30)([\/-])(?:0[469]|11)\3(?:\d+)|(?:0[1-9]|1\d|2\d|3[01])([\/-])(?:0[13578]|1[02])\4(?:\d+)|(?:02)([\/-])(?:0[1-9]|1\d|2[0-8])\5(?:\d+)|(?:02)([\/-])(?:0[1-9]|1\d|2\d)\6(?:(?:\d*?(?:(?:0[48]|[13579][26]|[2468][048])|(?:(?:[02468][048]|[13579][26])00))|[48]00|[48])(?=\D|\b))|(?:0[469]|11)([\/-])(?:0[1-9]|1\d|2\d|30)\7(?:\d+)|(?:0[13578]|1[02])([\/-])(?:0[1-9]|1\d|2\d|3[01])\8(?:\d+)|(?:\d+)([\/-])(?:0[1-9]|1\d|2[0-8])\9(?:02)|(?:(?:\d*?(?:(?:0[48]|[13579][26]|[2468][048])|(?:(?:[02468][048]|[13579][26])00))|[48]00|[48])(?=\D|\b))([\/-])(?:0[1-9]|1\d|2\d)\10(?:02)|(?:\d+)([\/-])(?:0[1-9]|1\d|2\d|30)\11(?:0[469]|11)|(?:\d+)([\/-])(?:0[1-9]|1\d|2\d|3[01])\12(?:0[13578]|1[02])|(?:\d+)([\/-])(?:02)\13(?:0[1-9]|1\d|2[0-8])|(?:(?:\d*?(?:(?:0[48]|[13579][26]|[2468][048])|(?:(?:[02468][048]|[13579][26])00))|[48]00|[48])(?=\D|\b))([\/-])(?:02)\14(?:0[1-9]|1\d|2\d)|(?:\d+)([\/-])(?:0[469]|11)\15(?:0[1-9]|1\d|2\d|30)|(?:\d+)([\/-])(?:0[13578]|1[02])\16(?:0[1-9]|1\d|2\d|3[01]))$
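In Java, the x flag is called `Pattern.COMMENTS` and can also be switched on inline with `(?x)`. In that mode unescaped whitespace is ignored and `#` starts a comment that runs to the end of the line. Every backslash also has to be doubled inside a Java string literal, so `\1` becomes `\\1` and `\d` becomes `\\d`. The sketch below uses a deliberately small pattern rather than the full regex above; the names are hypothetical.

```java
import java.util.regex.Pattern;

class CommentsFlagDemo {

    // Pattern.COMMENTS is Java's name for the x flag: whitespace is ignored and
    // "#" starts a comment up to the next line terminator ("\n" here).
    private static final Pattern SPACED = Pattern.compile(
            "^                  # start of input         \n"
          + "[0-3][0-9]         # day                    \n"
          + "([-/])             # separator, group 1     \n"
          + "[0-1][0-9]         # month                  \n"
          + "\\1                # same separator again   \n"
          + "[0-9]{4}           # year                   \n"
          + "$                  # end of input           \n",
            Pattern.COMMENTS);

    // The same flag switched on inline with (?x) at the start of the pattern.
    private static final Pattern INLINE = Pattern.compile(
            "(?x) ^ [0-3][0-9] ([-/]) [0-1][0-9] \\1 [0-9]{4} $");

    static boolean spacedMatches(String s) {
        return SPACED.matcher(s).matches();
    }

    static boolean inlineMatches(String s) {
        return INLINE.matcher(s).matches();
    }
}
```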

Results

** VALID **
30/11/2017
30-11-2017
2017-05-31

** INVALID **
30-/11-/2017
30-11/2017

Explanation

I will not provide a full explanation of the entire regex here, as that would be absurdly long. Instead, I'll provide you with a link to the original regex so that you can look at it and decode it using the original.

Each option's non-capturing groups represent a day, month or year, and each capture group represents a delimiter. Once the first delimiter has matched, the backreference ensures the second delimiter is the same as the first.
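A two-branch miniature shows why each option needs its own capture group and its own backreference number: groups are numbered left to right across the whole pattern, and only the group in the branch that actually matched gets a value. This is an illustrative sketch with hypothetical names, not part of the regex above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BranchGroupsDemo {

    // Branch 1 is dd?MM?yyyy with delimiter group 1; branch 2 is yyyy?MM?dd with
    // delimiter group 2, so it must refer back with \2 rather than \1.
    private static final Pattern TWO_BRANCHES = Pattern.compile(
            "^(?:[0-3][0-9]([-/])[0-1][0-9]\\1[0-9]{4}|[0-9]{4}([-/])[0-1][0-9]\\2[0-3][0-9])$");

    // Returns the delimiter captured by whichever branch matched, or null if none did.
    static String delimiter(String date) {
        Matcher m = TWO_BRANCHES.matcher(date);
        if (!m.matches()) {
            return null;
        }
        // Only the group inside the matching branch is set; the other one is null.
        return m.group(1) != null ? m.group(1) : m.group(2);
    }
}
```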

ctwheels
  • Thanks for your help! I didn't realize the regex for date checking would be so complicated, granted I wasn't even considering exceptions like leap years. I'm pretty new to programming so regex is pretty overwhelming, but it seems really useful. – Joe Coon Nov 14 '17 at 02:14
  • @JoeCoon it can be useful, provided it’s used correctly. What I would suggest is that you use existing methods and code if you can (assuming they are reliable). Otherwise, you can parse dates into pieces using some kind of parser (e.g. gathering information using regex or string functions and then testing and validating the values in code). Regex alone is powerful, but it does not provide the full logic and should be used alongside another language like Java, C#, etc. I’m glad I could be of assistance; if you have any questions about the regex in this answer, just let me know! – ctwheels Nov 14 '17 at 03:52
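A minimal sketch of the approach suggested in that last comment: let the regex check only the shape and capture the pieces, then let `java.time` decide whether the numbers form a real date. It handles only the dd/MM/yyyy and dd-MM-yyyy layouts; the class name and the named groups are illustrative.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexThenValidate {

    // Step 1: the regex checks only the shape (same separator twice) and captures the pieces.
    private static final Pattern SHAPE =
            Pattern.compile("(?<day>\\d{2})(?<sep>[-/])(?<month>\\d{2})\\k<sep>(?<year>\\d{4})");

    // Step 2: ordinary code decides whether the captured numbers form a real calendar date.
    static boolean isRealDate(String text) {
        Matcher m = SHAPE.matcher(text);
        if (!m.matches()) {
            return false;
        }
        int day = Integer.parseInt(m.group("day"));
        int month = Integer.parseInt(m.group("month"));
        int year = Integer.parseInt(m.group("year"));
        try {
            // LocalDate.of throws DateTimeException for values such as 31 April or 29 February 2017
            LocalDate.of(year, month, day);
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }
}
```

Leap years and month lengths then come from `LocalDate.of` instead of being encoded digit by digit in the pattern.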