I want to match 13/12/2015 (day, month, and year in separate capture groups) in:

ASTA n° 30 | 13/12/2015 ore 10.00 | Arte Moderna & Contemporanea

With this Regex (PHP - preg_match):

/(\d{1,2})\D{1,4}(\d{1,2})\D{1,4}(\d{4}|\d{2})/imu

I got:

30 | 13/12

But I need 13/12/2015. Seems like Regex is not greedy enough... I know that the match I got is possible with my Regex, but I want to prefer the \d{4} over \d{2} (in the last round bracket).
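
For reference, a minimal sketch that reproduces the result, assuming the subject string and pattern shown above and a UTF-8 source file (needed because of the `u` modifier). The variable names are only illustrative.

```php
<?php
// Reproduction sketch: subject and pattern are copied from the question;
// the variable names are illustrative.
$subject = 'ASTA n° 30 | 13/12/2015 ore 10.00 | Arte Moderna & Contemporanea';
$pattern = '/(\d{1,2})\D{1,4}(\d{1,2})\D{1,4}(\d{4}|\d{2})/imu';

preg_match($pattern, $subject, $m);

// The engine reports the leftmost match it can complete, so it starts at "30".
echo $m[0], "\n";                          // 30 | 13/12
echo $m[1], ' ', $m[2], ' ', $m[3], "\n";  // 30 13 12
```

The year group does try `\d{4}` first, but at that position the next four characters are `12/2`, so only `\d{2}` can succeed.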

EDIT: I need the \d{2} and \D{1,4} parts to be more flexible (there are dates like 13.10.15 or 13th 12.2015, etc.). Is there a way to reverse processing order of regex engine (end-to-start)? So it will first match \d{4} and then \d{2} (month and day)?

koseduhemak
  • The general approach is to extract the date using regex without checking whether it is valid, and then validate it using proper methods. That is much more reliable, especially since regex won't check many aspects, like 29.02 :P – mikus Dec 11 '15 at 12:50
  • yeah i know ;) I check against 40 different languages, and 30 different placing patterns (like 2015-20-10, 2th December 2015, 2. 3. and 4. September 2015, etc.) My rules try to extract a date which is then validated using parsing of DateTime class... – koseduhemak Dec 12 '15 at 13:28

2 Answers

Why not simply this: \d{2}\/\d{2}\/\d{4}

That is two digits, a slash, two digits, another slash, and four digits.

If you want to add support for single digits and, for example, hyphens, you can do this: \d{1,2}[\/-]\d{1,2}[\/-]\d{4}

Updated as per OP's request to also match two-digit year:

(\d{1,2}[\/-]\d{1,2}[\/-](?:\d{4}|\d{2}))

This regex tries the 4-digit year first; only if that fails does it fall back to a 2-digit year.
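
A minimal sketch of that preference in PHP. Note that the separate capture groups for day, month, and year are an addition for illustration (the pattern above has a single outer group), and the second sample string is made up.

```php
<?php
// Sketch: day, month and year get their own capture groups here, which is
// an addition to the answer's pattern. The alternation is unchanged.
$pattern = '/(\d{1,2})[\/-](\d{1,2})[\/-](\d{4}|\d{2})/';

preg_match($pattern, 'ASTA n° 30 | 13/12/2015 ore 10.00', $long);
preg_match($pattern, 'due 13-12-15 at noon', $short);

// Alternatives are tried left to right, so \d{4} wins whenever it can match.
echo implode(' ', array_slice($long, 1)), "\n";   // 13 12 2015
echo implode(' ', array_slice($short, 1)), "\n";  // 13 12 15
```

Because the separator is restricted to `/` or `-`, the engine can no longer start the match at `30`, which is followed by a space.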

Edit 2: I shortened the regex a little. The day and month now share one non-capturing group, which must occur exactly twice for the pattern to match. Without further ado, the regex:

((?:\d{1,2}[\/-]){2}(?:\d{4}|\d{2}))
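
As a quick check, the sketch below runs the longer and the shortened pattern over a few sample strings. The sample list and the `$results` array are illustrative and not taken from the asker's data.

```php
<?php
// Sketch comparing the two patterns from this answer on a few samples.
// The sample list is illustrative only.
$long  = '/(\d{1,2}[\/-]\d{1,2}[\/-](?:\d{4}|\d{2}))/';
$short = '/((?:\d{1,2}[\/-]){2}(?:\d{4}|\d{2}))/';

$samples = ['13/12/2015', '13-12-15', '1/2/2015', '13.10.15'];
$results = [];

foreach ($samples as $s) {
    // Store the captured date, or null when the pattern does not match.
    $a = preg_match($long, $s, $m1) ? $m1[1] : null;
    $b = preg_match($short, $s, $m2) ? $m2[1] : null;
    $results[$s] = [$a, $b];
    printf("%-12s %s %s\n", $s, var_export($a, true), var_export($b, true));
}
```

Both patterns agree on every sample. `13.10.15` matches neither, because `.` is not in the character class `[\/-]`.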

Asunez
  • Sometimes there is a date whose year has only two digits, which should also be matched. Something like 13.10.15... But if there is a year with 4 digits, that case should be preferred. – koseduhemak Dec 12 '15 at 13:26
  • @mfuesslin I have updated my answer to match also 2-digit years. Remember to mark an accepted answer if it answers your problem. – Asunez Dec 14 '15 at 08:27
  • Your approach is nice! Is there a way to do it with \D instead of the character class [\/-]? I have accepted your answer because it does what I want (but I still have to work out what to put in the character class to match all my date variants). The \D is just for more flexibility, and because I can't go through 250000 dates to collect all the characters needed for the character class ("date-dividers"). – koseduhemak Dec 14 '15 at 21:12
  • @mfuesslin You can of course use the `\D`, but this will allow for way too many invalid options, such as letters and symbols not used in dates - I guess you do not want to allow dates in this format: `12$05&15` ;) Character class is the easiest though. You can shorten it a little bit by allowing day and month to be the same regexp with only one character class for date-dividers, as you call them. However, this all assumes the year comes last no matter what. I will edit my answer to add this in too. – Asunez Dec 15 '15 at 07:03
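
The trade-off described in the last comment can be checked with a short sketch. Both patterns are illustrative variants, not the answer's final regex, and adding `.` to the character class is an assumption based on the `13.10.15` format mentioned in the comments.

```php
<?php
// Sketch of the \D versus character-class trade-off. Both patterns are
// illustrative variants; the '.' in the class is an assumed addition.
$anySeparator = '/\d{1,2}\D\d{1,2}\D(?:\d{4}|\d{2})/';
$listedOnly   = '/\d{1,2}[\/.-]\d{1,2}[\/.-](?:\d{4}|\d{2})/';

var_dump(preg_match($anySeparator, '12$05&15'));  // int(1): accepted
var_dump(preg_match($listedOnly, '12$05&15'));    // int(0): rejected
var_dump(preg_match($listedOnly, '13.10.15'));    // int(1): '.' is in the class
```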
Use

(\d{1,2})\/(\d{1,2})\/(\d{4}|\d{2})

nkit
  • \D matches any non-digit character, and that is what causes the problem – nkit Dec 11 '15 at 12:46
  • Sorry, it was a copy-paste mistake. Thanks @Asunez – nkit Dec 11 '15 at 13:11
  • This will match it, of course. But I need a more general approach because I have to match 40 different languages and multiple date formats, something like 13.12.2015, 13-12-2015, 13.12.15, 13/12/15, etc. I need some flexibility, which is why I chose \D. I was just wondering whether there is a mechanism that lets me prefer one alternative in a group... Maybe it is possible to reverse the direction in which the regex engine parses, so that it would go from the end of the string to the start? I think it matches the wrong way because that is the first solution the engine finds... – koseduhemak Dec 12 '15 at 13:34