
I want to capture a date in the format yyyy/mm/dd hh:mm

[^\n\r]*[\r\n]+([12]\d{3}/(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01]))

The above expression captures the next line up to the day, but I also want to include the time part, and I would like to accept single digits for month, day, hour and minute instead of requiring two digits.

For example, the user could enter 2017/5/2 9:5 or 2017/05/02 09:05.

I need help with capturing single-digit month and day values as well as the time part.

Sidh Rg
  • And your question is? – shmosel Oct 23 '17 at 16:12
  • If you want to match single digits then remove that requirement for single-digit values to start with 0. Also what's the problem with expanding the expression to match the time as well? You already built quite a complex expression and expanding it a little shouldn't be a problem in that case. – Thomas Oct 23 '17 at 16:15
  • @shmosel just updated it. – Sidh Rg Oct 23 '17 at 16:17
  • @Thomas When I remove the 0, it doesn't capture at all. Like if I did 01 instead of 1. – Sidh Rg Oct 23 '17 at 16:24
  • Still not seeing a specific question. Have you made any attempt on your own, or were you hoping we'd do all the work for you? – shmosel Oct 23 '17 at 16:24
  • Don't just remove the `0` but make it optional. – Thomas Oct 23 '17 at 16:25
  • If you're *not* doing this just for regex-learning purposes, it's better to use a proper API for date parsing: https://stackoverflow.com/q/4216745/7605325 –  Oct 23 '17 at 16:31

2 Answers


To make a digit optional, just use the ? quantifier.

Assuming you're using the classes Pattern and Matcher from java.util.regex package, your code will be like this (also note that in Java you must escape the backslash, so the pattern \d must be written as \\d):

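import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "2017/5/2 9:5";

// groups: 1 = year, 2 = month, 3 = day, 4 = hour, 5 = minute
Pattern pattern = Pattern.compile("(\\d{4})/(0?[1-9]|1[0-2])/(0?[1-9]|[12]\\d|3[01]) ([01]?\\d|2[0-3]):([0-5]?\\d)");

Matcher matcher = pattern.matcher(input);
// find() locates every occurrence of the pattern inside the input
while (matcher.find()) {
    String year = matcher.group(1);
    String month = matcher.group(2);
    String day = matcher.group(3);
    String hour = matcher.group(4);
    String minute = matcher.group(5);
}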

For the month and day, I just added a ? after the zero, to make it optional.

For the hour, I did:

  • [01]?: optional zero or one, followed by any digit (\d), or
  • 2[0-3]: number 2, followed by 0, 1, 2 or 3 (so it gets hours from 20 to 23)

And for the minutes:

  • [0-5]?: optional digit from 0 to 5
  • followed by any digit (\d)

This also works when the input has zeros, such as "2017/05/02 09:05". You can optionally convert the String values to int, using Integer.parseInt(matcher.group(1)).
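One behaviour worth knowing: find() accepts a match anywhere inside the input, so "2017/5/2 9:75" still matches with minute 7. If the whole input is supposed to be a date, matches() is one way to enforce that. The sketch below assumes that requirement; the class name DateTimeRegexDemo and the method parseFields are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateTimeRegexDemo {
    // Same pattern as in the answer, compiled once and reused.
    private static final Pattern DATE_TIME = Pattern.compile(
        "(\\d{4})/(0?[1-9]|1[0-2])/(0?[1-9]|[12]\\d|3[01]) ([01]?\\d|2[0-3]):([0-5]?\\d)");

    // Returns {year, month, day, hour, minute}, or null if the WHOLE input does not match.
    static int[] parseFields(String input) {
        Matcher m = DATE_TIME.matcher(input);
        if (!m.matches()) {
            return null;
        }
        int[] fields = new int[5];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = Integer.parseInt(m.group(i + 1)); // groups are 1-based
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseFields("2017/5/2 9:5")));     // [2017, 5, 2, 9, 5]
        System.out.println(Arrays.toString(parseFields("2017/05/02 09:05"))); // [2017, 5, 2, 9, 5]
        System.out.println(Arrays.toString(parseFields("2017/5/2 9:75")));    // null
    }
}
```

With find() instead of matches(), the last input would be accepted as 9:7, which is rarely what a validation step wants.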


Why not use a date/time API?

The code above doesn't check for all cases of valid dates, such as the number of days in a month (including February in leap years). Although it's possible to do it with a regex, it'll be so complex and hard to maintain, that IMO it's much better to use a proper API for that (just the leap year validation is a very complex expression by itself).
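If the regex is kept for extracting the fields, the calendar check can still be delegated rather than written as a pattern. A minimal sketch, assuming Java 8+ and using java.time.YearMonth; the class name CalendarCheckDemo and the method isRealDate are hypothetical.

```java
import java.time.YearMonth;

class CalendarCheckDemo {
    // True when the day exists in the given month, leap years included.
    static boolean isRealDate(int year, int month, int day) {
        if (month < 1 || month > 12) {
            return false; // YearMonth.of would throw for an out-of-range month
        }
        return YearMonth.of(year, month).isValidDay(day);
    }

    public static void main(String[] args) {
        System.out.println(isRealDate(2016, 2, 29)); // true, 2016 is a leap year
        System.out.println(isRealDate(2017, 2, 29)); // false
        System.out.println(isRealDate(2017, 4, 31)); // false, April has 30 days
    }
}
```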

If you're writing this code just for learning purposes, then it's fine. But for real business applications, it's better to use a date/time API (regexes are great, but not always the best tool for everything).

If you're using Java 8, consider using the new java.time API. It's easier, less buggy and less error-prone than the old APIs.

If you're using Java 6 or 7, you can use the ThreeTen Backport, a great backport for Java 8's new date/time classes. And for Android, you'll also need the ThreeTenABP (more on how to use it here).

The code below works for both. The only difference is the package names (in Java 8 it's java.time and in ThreeTen Backport (or Android's ThreeTenABP) it's org.threeten.bp), but the class and method names are the same.

First you can use a DateTimeFormatter and parse the input to a LocalDateTime (a class that represents a date and a time, which is a perfect match to your input data). Then you use this class to get the fields you want:

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;

String input = "2017/5/2 9:5";

// pattern with optional zero for month, day, hour and minute
// "uuuu" (year) is used instead of "yyyy" (year-of-era): strict mode cannot resolve "yyyy" without an era
DateTimeFormatter fmt = DateTimeFormatter.ofPattern("uuuu/M/d H:m")
    // use strict mode to validate dates like Feb 29th
    .withResolverStyle(ResolverStyle.STRICT);
LocalDateTime dt = LocalDateTime.parse(input, fmt);
int year = dt.getYear();
int month = dt.getMonthValue();
int day = dt.getDayOfMonth();
int hour = dt.getHour();
int minute = dt.getMinute();

This also works for "2017/05/02 09:05". It has the further advantage of rejecting invalid values (like months > 12, or Feb 29th in non-leap years and so on).

If you don't use strict mode, Feb 29th is adjusted to Feb 28th in non-leap years (it's the behaviour of smart resolver style, which is the default).
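The difference between the two resolver styles can be seen with a small experiment. This is a sketch, not part of the original answer: the class name ResolverStyleDemo and the helper parseOrNull are hypothetical, and the pattern uses "uuuu" for the year because strict mode cannot resolve "yyyy" (year-of-era) without an era.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

class ResolverStyleDemo {
    // ofPattern creates a formatter with the default (smart) resolver style.
    static final DateTimeFormatter SMART = DateTimeFormatter.ofPattern("uuuu/M/d H:m");
    static final DateTimeFormatter STRICT = SMART.withResolverStyle(ResolverStyle.STRICT);

    // Returns the parsed value, or null when the formatter rejects the input.
    static LocalDateTime parseOrNull(String input, DateTimeFormatter fmt) {
        try {
            return LocalDateTime.parse(input, fmt);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String notLeap = "2017/2/29 9:5"; // 2017 is not a leap year
        System.out.println(parseOrNull(notLeap, SMART));  // 2017-02-28T09:05
        System.out.println(parseOrNull(notLeap, STRICT)); // null
        System.out.println(parseOrNull("2016/2/29 9:5", STRICT)); // 2016-02-29T09:05
    }
}
```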

Check the javadoc for all available patterns accepted by DateTimeFormatter.


Here it is...

  \d{4}\/([1-9]{1}|0[1-9]|1[0-2])\/([1-9]{1}|[0-2]{1}[1-9]{1}|3[0-1])\s+([0-9]{1}|[0-1]{1}[0-9]{1}|2[0-4]):([0-9]{1}|[0-5]{1}[0-9]{1})\s+

This may seem overwhelming, so here is a walkthrough of the expression. The expression not only finds the date and time but also ignores unrealistic values such as 2001/44/44 or 2344/44444/999, because each field is range-checked. It also does not just look at the beginning of a line: it finds a date-time anywhere in the string, whether the string is a single line or multiple lines.

Explanation

The first 4 digits are the year...

\d{4}

followed by '/'...

\d{4}\/

Now, the month can be a single digit, 1-9

\d{4}\/( [1-9]{1} )

or in double digits 01, 02, 03, 09 (remember that if a month starts with 0, its second digit must be between 1 and 9)

\d{4}\/( [1-9]{1} | 0[1-9]{1} )

or 10, 11, 12 but cannot be greater than 12.

\d{4}\/( [1-9]{1} | 0[1-9]{1} | 1[0-2]{1} )

followed by a '/'

\d{4}\/( [1-9]{1} | 0[1-9]{1} | 1[0-2]{1} ) \/

Now come the days, which can be a single digit, 1-9

\d{4}\/( [1-9]{1} | 0[1-9]{1} | 1[0-2]{1} ) \/( [1-9]{1} )

or double digits such as 01, 02, 03, 09, 19, 29

\d{4}\/( [1-9]{1} | 0[1-9]{1} | 1[0-2]{1} ) \/( [1-9]{1} | [0-2]{1}[1-9]{1} )

or it can be 30 or 31 but not greater than that.

\d{4}\/( [1-9]{1} | 0[1-9]{1} | 1[0-2]{1} ) \/( [1-9]{1} | [0-2]{1}[1-9]{1} | 3[0-1] )

Now the date part is done. Next comes some space between the date and the time.

\d{4}\/( [1-9]{1} | 0[1-9]{1} | 1[0-2]{1} ) \/( [1-9]{1} | [0-2]{1}[1-9]{1} | 3[0-1] ) \s+

Now let's focus on the time part, assuming a 24-hour format. The hour can be a single digit like 0, 1, 2, 9

( [0-9]{1} )

or double digit like 01, 02, 09, 11, 19

( [0-9]{1} | [0-1]{1}[0-9]{1} )

or 20, 21, 22, 23, 24 but not greater than 24.

( [0-9]{1} | [0-1]{1}[0-9]{1} | 2[0-4]{1} )

followed by ':'

( [0-9]{1} | [0-1]{1}[0-9]{1} | 2[0-4]{1} ) : 

Minutes can be in single digit like 0, 1, 2, 9...

( [0-9]{1} | [0-1]{1}[0-9]{1} | 2[0-4]{1} ) : ( [0-9]{1} )

or double digit like 01, 02, 03, 23, 44, 59 (not 60).

( [0-9]{1} | [0-1]{1}[0-9]{1} | 2[0-4]{1} ) : ( [0-9]{1} | [0-5]{1}[0-9]{1} )

followed by some space

( [0-9]{1} | [0-1]{1}[0-9]{1} | 2[0-4]{1} ) : ( [0-9]{1} | [0-5]{1}[0-9]{1} ) \s+

Now combine your Date Regex and Time Regex and you will get

\d{4}\/([1-9]{1}|0[1-9]|1[0-2])\/([1-9]{1}|[0-2]{1}[1-9]{1}|3[0-1])\s+([0-9]{1}|[0-1]{1}[0-9]{1}|2[0-4]):([0-9]{1}|[0-5]{1}[0-9]{1})\s+

NOTE: During the explanation, I have added extra spaces in the regex just for better readability.
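To try the combined expression from Java, the backslashes have to be doubled inside the string literal. The sketch below runs it against a few inputs and shows two things that follow from the pattern as written: the trailing \s+ means a date at the very end of the input (no whitespace after it) is not found, and the day alternative [0-2]{1}[1-9]{1} does not cover 10 or 20. The ADJUSTED pattern is only an assumption about what was intended (days 10 and 20 accepted, hours limited to 0-23, no trailing whitespace required); the class and field names are hypothetical.

```java
import java.util.regex.Pattern;

class CombinedRegexDemo {
    // The combined expression, unchanged apart from Java string escaping.
    static final Pattern ORIGINAL = Pattern.compile(
        "\\d{4}\\/([1-9]{1}|0[1-9]|1[0-2])\\/([1-9]{1}|[0-2]{1}[1-9]{1}|3[0-1])\\s+"
        + "([0-9]{1}|[0-1]{1}[0-9]{1}|2[0-4]):([0-9]{1}|[0-5]{1}[0-9]{1})\\s+");

    // Assumed variant: day covers 10 and 20, hour stops at 23, and the
    // match ends where no further digit follows instead of requiring whitespace.
    static final Pattern ADJUSTED = Pattern.compile(
        "\\d{4}/([1-9]|0[1-9]|1[0-2])/([1-9]|0[1-9]|[12][0-9]|3[01])\\s+"
        + "([0-9]|[01][0-9]|2[0-3]):([0-9]|[0-5][0-9])(?!\\d)");

    // True when the pattern occurs anywhere in the text.
    static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found(ORIGINAL, "2017/5/2 9:5 "));   // true
        System.out.println(found(ORIGINAL, "2017/5/2 9:5"));    // false, no trailing whitespace
        System.out.println(found(ORIGINAL, "2017/5/10 9:05 ")); // false, day 10 is not covered
        System.out.println(found(ADJUSTED, "2017/5/10 9:05"));  // true
        System.out.println(found(ADJUSTED, "2001/44/44 1:1"));  // false
    }
}
```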

Shadab Faiz