I have dates in my .txt file that come in one of the following formats:

mmddyyyy

OR

mm/dd/yyyy

Below is the regex which works fine for mm/dd/yyyy.

^02\/(?:[01]\d|2\d)\/(?:19|20)(?:0[048]|[13579][26]|[2468][048])|(?:0[13578]|10|12)\/(?:[0-2]\d|3[01])\/(?:19|20)\d{2}|(?:0[469]|11)\/(?:[0-2]\d|30)\/(?:19|20)\d{2}|02\/(?:[0-1]\d|2[0-8])\/(?:19|20)\d{2}$

However, I am unable to build the regex for mmddyyyy. Is there a generic regex that would work for both cases?

whatsinthename

3 Answers

Why use a regex for this? It seems like a case of "now you have two problems".

It would be more effective (and easier to understand) to use a DateTimeFormatter, assuming you are on the JVM and not using Scala.js.

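The format patterns support using [] to surround optional sections, such as the /, and the formatters inherently perform input validation, so if you plug in a month that can't exist (such as 25), it'll throw an exception. Note that with the default ResolverStyle.SMART, a day of 29 to 31 that doesn't exist in the given month (such as 02/30/2022) is adjusted to the last valid day of that month rather than rejected; use .withResolverStyle(ResolverStyle.STRICT), with uuuu in place of yyyy, if such dates must fail.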

import java.time.format.DateTimeFormatter
import java.time.LocalDate

val mdy = DateTimeFormatter.ofPattern("MM[/]dd[/]yyyy")
def parse(rawDate: String) = LocalDate.parse(rawDate, mdy)
scala> parse("12252022")
res7: java.time.LocalDate = 2022-12-25

scala> parse("12/25/2022")
res8: java.time.LocalDate = 2022-12-25

scala> parse("25/12/2022")
java.time.format.DateTimeParseException: Text '25/12/2022' could not be parsed: Invalid value for MonthOfYear (valid values 1 - 12): 25

scala> parse("abc123")
java.time.format.DateTimeParseException: Text 'abc123' could not be parsed at index 0
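A minimal sketch of a stricter formatter, assuming Java 8+ java.time on the JVM: with ResolverStyle.STRICT, impossible days such as 02/30/2022 fail instead of being adjusted. The names StrictDates and parseStrict are hypothetical, and returning an Option via Try is a choice made here, not part of the answer above.

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, ResolverStyle}
import scala.util.Try

object StrictDates {
  // "uuuu" (proleptic year) replaces "yyyy" (year-of-era): under STRICT
  // resolution, a year-of-era without an era field cannot be resolved.
  val mdyStrict: DateTimeFormatter =
    DateTimeFormatter.ofPattern("MM[/]dd[/]uuuu").withResolverStyle(ResolverStyle.STRICT)

  // Returns None for malformed input and for impossible dates alike,
  // instead of propagating DateTimeParseException.
  def parseStrict(rawDate: String): Option[LocalDate] =
    Try(LocalDate.parse(rawDate, mdyStrict)).toOption
}
```

Because the two [/] sections are independent, this formatter (like the original one) also accepts a mixed form such as 12/252022; the regex-based answers below reject it.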
Dylan
  • It would be more efficient to check for the `/` symbol first. `Try` (actually, `throw Exception`) is simple, but not efficient. https://stackoverflow.com/a/16451908/5122436 – Mikhail Ionkin Jul 16 '22 at 11:14
  • @MikhailIonkin I've updated the answer to take advantage of optional sections in the formatter pattern, so it no longer needs to use `try` or `Try`. – Dylan Jul 16 '22 at 13:25

If you want to match all those variations with either 2 forward slashes or only digits, you can use a positive lookahead to assert either only digits or 2 forward slashes surrounded by digits.

Then in the pattern itself you can make matching the / optional.

Note that you don't have to escape the forward slash: write / instead of \/.

^(?=\d+(?:/\d+/\d+)?$)(?:02/?(?:[01]\d|2\d)/?(?:19|20)(?:0[048]|[13579][26]|[2468][048])|(?:0[13578]|10|12)/?(?:[0-2]\d|3[01])/?(?:19|20)\d{2}|(?:0[469]|11)/?(?:[0-2]\d|30)/?(?:19|20)\d{2}|02/?(?:[0-1]\d|2[0-8])/?(?:19|20)\d{2})$
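To try the lookahead version from Scala, here is a sketch assuming java.util.regex on the JVM (the object LookaheadDates and the method isValid are hypothetical names). Every optional slash is written as /?, and Matcher.matches() requires the whole string to match.

```scala
import java.util.regex.Pattern

object LookaheadDates {
  // The lookahead allows either digits only, or digits/digits/digits,
  // so a mix such as 12/252022 is rejected before the date rules run.
  val datePattern: Pattern = Pattern.compile(
    """^(?=\d+(?:/\d+/\d+)?$)""" +
    """(?:02/?(?:[01]\d|2\d)/?(?:19|20)(?:0[048]|[13579][26]|[2468][048])""" +
    """|(?:0[13578]|10|12)/?(?:[0-2]\d|3[01])/?(?:19|20)\d{2}""" +
    """|(?:0[469]|11)/?(?:[0-2]\d|30)/?(?:19|20)\d{2}""" +
    """|02/?(?:[0-1]\d|2[0-8])/?(?:19|20)\d{2})$"""
  )

  // matches() succeeds only if the entire input matches the pattern.
  def isValid(s: String): Boolean = datePattern.matcher(s).matches()
}
```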

Another option is to add an alternation (|) that matches the same pattern without the / characters.
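One way to build that alternation without writing the date rules twice, sketched under the assumption that / appears in the pattern only as the date separator: strip every / from the slashed pattern to get the digits-only one. AlternationDates and its members are hypothetical names.

```scala
import java.util.regex.Pattern

object AlternationDates {
  // Date rules with mandatory slashes, adapted from the question
  // (anchors removed, slashes unescaped).
  private val slashed: String =
    """(?:02/(?:[01]\d|2\d)/(?:19|20)(?:0[048]|[13579][26]|[2468][048])""" +
    """|(?:0[13578]|10|12)/(?:[0-2]\d|3[01])/(?:19|20)\d{2}""" +
    """|(?:0[469]|11)/(?:[0-2]\d|30)/(?:19|20)\d{2}""" +
    """|02/(?:[0-1]\d|2[0-8])/(?:19|20)\d{2})"""

  // The same rules with every "/" removed match the mmddyyyy form.
  // This is only safe because "/" appears nowhere else in the pattern.
  private val digitsOnly: String = slashed.replace("/", "")

  // "$$" in an s-interpolated string produces a literal "$" anchor.
  val datePattern: Pattern = Pattern.compile(s"^(?:$slashed|$digitsOnly)$$")

  def isValid(s: String): Boolean = datePattern.matcher(s).matches()
}
```

Each alternative requires either both slashes or none, so a mixed input such as 12/252022 is rejected.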

The fourth bird

First of all, there is a tiny shortcoming in your regex: the ^ anchor only applies to the first part of your regex, not to the other alternatives that are separated by |. Similarly the final $ applies only to the final alternative. You should put all alternatives in a non-capturing group, like ^(?: | | | )$
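A small Scala sketch of the anchoring problem, assuming java.util.regex (AnchorDemo and foundIn are hypothetical names). The difference only shows up with a search such as Matcher.find(); Matcher.matches() already requires the entire input to match, so it hides the missing anchors.

```scala
import java.util.regex.Pattern

object AnchorDemo {
  // The question's date alternatives, unchanged apart from splitting the string.
  private val alternatives: String =
    """02\/(?:[01]\d|2\d)\/(?:19|20)(?:0[048]|[13579][26]|[2468][048])""" +
    """|(?:0[13578]|10|12)\/(?:[0-2]\d|3[01])\/(?:19|20)\d{2}""" +
    """|(?:0[469]|11)\/(?:[0-2]\d|30)\/(?:19|20)\d{2}""" +
    """|02\/(?:[0-1]\d|2[0-8])\/(?:19|20)\d{2}"""

  // As in the question: ^ binds only to the first alternative, $ only to the last.
  val ungrouped: Pattern = Pattern.compile("^" + alternatives + "$")

  // All alternatives inside one non-capturing group, so both anchors apply to each.
  val grouped: Pattern = Pattern.compile("^(?:" + alternatives + ")$")

  // find() looks for a match anywhere in the input.
  def foundIn(p: Pattern, s: String): Boolean = p.matcher(s).find()
}
```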

Then for the question itself, you could make the forward slash that follows the month optional and put it in a capture group. What comes between the day and the year can then be a backreference to that capture group: (\/?) after the month and \1 after the day, with \2, \3 and \4 in the later alternatives, since each alternative has its own group.

^(?:02(\/?)(?:[01]\d|2\d)\1(?:19|20)(?:0[048]|[13579][26]|[2468][048])|(?:0[13578]|10|12)(\/?)(?:[0-2]\d|3[01])\2(?:19|20)\d{2}|(?:0[469]|11)(\/?)(?:[0-2]\d|30)\3(?:19|20)\d{2}|02(\/?)(?:[0-1]\d|2[0-8])\4(?:19|20)\d{2})$
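A sketch for checking the backreference version in Scala, assuming java.util.regex on the JVM (BackrefDates and isValid are hypothetical names).

```scala
import java.util.regex.Pattern

object BackrefDates {
  // Each alternative captures the optional slash after the month (groups 1 to 4)
  // and requires the same text again after the day via \1 to \4.
  val datePattern: Pattern = Pattern.compile(
    """^(?:02(\/?)(?:[01]\d|2\d)\1(?:19|20)(?:0[048]|[13579][26]|[2468][048])""" +
    """|(?:0[13578]|10|12)(\/?)(?:[0-2]\d|3[01])\2(?:19|20)\d{2}""" +
    """|(?:0[469]|11)(\/?)(?:[0-2]\d|30)\3(?:19|20)\d{2}""" +
    """|02(\/?)(?:[0-1]\d|2[0-8])\4(?:19|20)\d{2})$"""
  )

  def isValid(s: String): Boolean = datePattern.matcher(s).matches()
}
```

Because each backreference must repeat what its group captured, mixed inputs such as 12/252022 or 1225/2022 are rejected. Like the question's pattern, it still accepts a few impossible dates, for example 01/00/2022 (day 00) and 02/29/1900 (1900 is not a leap year).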
trincot