
I would like to know how to match a date such as "Oct 21, 2014" or "October 21, 2014".

What I have done so far is \b(?:Jan?|?:Feb?|?:Mar?|?:Apr?|?:May?|?:Jun?|?:Jul?|?:Aug?|?:Sep?|?:Oct?|?:Nov?|?:Dec?) [0-9]{1,2}[,] (?:19[7-9]\d|2\d{3})(?=\D|$) but that doesn't get me anywhere

  • In short, I need the matching string to be "Month[space]Day[comma][space]Year". I don't care about leap years, and the day of the month can be anything between 1 and 31 with no leading 0.
  • I need this regex to work in Python.
faceoff
  • Can you add all possible input strings, both valid and invalid – Tushar Feb 15 '16 at 16:02
  • Also, you should add complete code not `...` – Tushar Feb 15 '16 at 16:04
  • I bet you forgot: 1) regex delimiters, 2) double the backslashes, 3) test it at [regex101.com](http://regex101.com). What language is it written in? What is the regex flavor? – Wiktor Stribiżew Feb 15 '16 at 16:05
  • @WiktorStribiżew I guess, that answers the question. – Tushar Feb 15 '16 at 16:06
  • @Tushar Actually I only need month abbreviations like "Oct 21, 2014", and everything not in the format "Month[space]Day[comma][space]Year" should be invalid. I don't care about leap years, and days can be anything from 1 to 31. Note that days are written as 1 and not 01. @Wiktor I am using Python for this regex and tried regex101 with no luck. This is the complete regex: \b(?:Jan?|?:Feb?|?:Mar?|?:Apr?|?:May?|?:Jun?|?:Jul?|?:Aug?|?:Sep?|?:Oct?|?:Nov?|?:Dec?) [0-9]{1,2}[,] (?:19[7-9]\d|2\d{3})(?=\D|$) – faceoff Feb 15 '16 at 16:26

2 Answers


This may suffice for your needs.

Keep in mind, however, that you will need more sophisticated validation, such as checking the number of days for a specific month (February, for example, has at most 28 days, or 29 in leap years).

(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+(\d{1,2}),\s+(\d{4})

Play with it here.
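As a rough sketch of how a pattern like this could be used from Python, the snippet below tightens it to the "Month[space]Day[comma][space]Year" format from the question: a single space, a day from 1 to 31 with no leading zero, and a literal comma. The names `MONTH`, `DATE_RE` and `find_dates` are made up for illustration and are not part of the answer.

```python
import re

# Month names: the three-letter abbreviation with an optional full-name suffix.
MONTH = (r"(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
         r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)")

# Day 1-31 without a leading zero, a literal comma, then a four-digit year.
DATE_RE = re.compile(r"\b" + MONTH + r" ([1-9]|[12]\d|3[01]), (\d{4})\b")

def find_dates(text):
    """Return a (month, day, year) tuple for every date found in text."""
    return [m.groups() for m in DATE_RE.finditer(text)]

print(find_dates("Released Oct 21, 2014 and October 3, 2015"))
# [('Oct', '21', '2014'), ('October', '3', '2015')]
print(find_dates("Oct 01, 2014 / Oct 32, 2014 / Janaaaaa 1, 2014"))
# []
```

The raw-string prefix (`r"..."`) keeps the backslashes intact, so they do not need to be doubled.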

Again, this is a very simple regex and there are surely better solutions out there, but it may be enough for your needs.
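If the calendar checks mentioned above (days per month, leap years) ever become necessary, one option is to hand the matched string to `datetime.strptime` instead of encoding them in the regex. This is only a sketch: `parse_date` is a hypothetical helper, and it assumes English month names, since `%b` and `%B` depend on the current locale.

```python
from datetime import datetime

def parse_date(text):
    """Return a date for 'Oct 21, 2014' / 'October 21, 2014', or None if invalid."""
    # %b is the abbreviated month name, %B the full one (locale-dependent).
    for fmt in ("%b %d, %Y", "%B %d, %Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None

print(parse_date("Oct 21, 2014"))       # 2014-10-21
print(parse_date("February 30, 2014"))  # None
```

Note that `strptime` is more lenient than the format in the question in some respects (it accepts a leading zero in the day, for instance), so it complements the regex rather than replacing it.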

Cleptus
Veverke
  • It does match the complete name, due (\w+)?. See the link. I mean the validation is definitely poor, because Janaaaaa will match. Will change it. – Veverke Feb 15 '16 at 16:11
  • The alternatives that share the same prefix will slow down code execution with this regex. If the input string is short, it is ok, if it is long, it might be a problem. It is a good idea to contract it the way OP has it. – Wiktor Stribiżew Feb 15 '16 at 16:19
  • Yes Wiktor, this is definitely not performance-wise. Again, just wanted to leave the OP with something. Did not understand what you meant with "good idea to *contract* it the way OP has it". – Veverke Feb 15 '16 at 16:23
  • Instead of `Jan|January` use `Jan(?:uary)?`. – Wiktor Stribiżew Feb 15 '16 at 16:24
  • As far as I can tell, that's exactly what I needed. @Veverke Thank you! – faceoff Feb 15 '16 at 16:45
  • @WiktorStribiżew: what is the advantage of using non-capturing group in this case over capturing group, like I did ? – Veverke Jul 06 '17 at 07:04
  • @Veverke: 1) Easier to read (for me) because it is...), 2) Shorter, 3) More efficient (the best practice is that each alternative branch does not match at the same location, else, there might be too many unnecessary backtracking). – Wiktor Stribiżew Jul 06 '17 at 12:23
  • @Veverke when you get in python the groups matched from the regex if you use non-capturing groups you would get only three containing: Day, month, year. otherwise you could get extra groups with the values "uary" or "y" – Cleptus Jul 24 '19 at 10:06
  • This is a simple regex? Wow. – Alonso del Arte Feb 10 '21 at 00:23
  • @AlonsodelArte: it is a long one - but with simple regex constructs/expressions only. – Veverke Feb 10 '21 at 08:05
  • @Veverke I'm a complete novice at regex. I went to a conference talk about it once. Most of the presentation went over my head. – Alonso del Arte Feb 11 '21 at 20:27
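
The point raised in the comments about capturing versus non-capturing groups can be seen in a small Python experiment. The two patterns below are shortened to two months purely for illustration; the variable names are made up.

```python
import re

# Same pattern twice: the month suffixes are capturing in the first version
# and non-capturing, (?:...), in the second.
capturing = re.compile(r"(Jan(uary)?|Feb(ruary)?) (\d{1,2}), (\d{4})")
non_capturing = re.compile(r"(Jan(?:uary)?|Feb(?:ruary)?) (\d{1,2}), (\d{4})")

text = "January 5, 2016"
print(capturing.search(text).groups())
# ('January', 'uary', None, '5', '2016')
print(non_capturing.search(text).groups())
# ('January', '5', '2016')
```

With non-capturing groups, `groups()` returns only the month, day and year, with no stray suffix entries to skip over.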

The following can be used in Python for dates whose month string contains mistakes:

import re
"".join((re.compile(r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\.)?(\w*)?(\.)?(\s*\d{0,2}\s*),(\s*\d{4})', re.S | re.I).findall('Some wrong date is Septeme 28, 2002date') + ['n/a'])[0])

Output is:

'Septeme 28 2002'

Group 1 is the start of the month name:

(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)

Groups 2-4 are optional suffixes of the month, which may include a dot or word characters (letters, digits, underscore):

(\.)?(\w*)?(\.)?

They match ".", "t." and "tem" in "Sep.", "Sept." and "Septem" respectively.

Group 5 is the day number, which may be absent; the 0 in {0,2} allows dates that have no day number:

(\s*\d{0,2}\s*)

Group 6 is the year:

(\s*\d{4})

\s* matches zero or more whitespace characters (spaces, tabs and so on).

[0] takes the first match when the list contains several date tuples.

+ ['n/a'] appends an extra list element in case no date matched, so the list always has at least one element and taking element [0] never raises a 'list index out of range' error.
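
For readability, the same idea could also be written as a small function. This is just one possible restructuring, not the answer's own code: `LOOSE_DATE` and `first_loose_date` are hypothetical names, and `re.search` is swapped in for `findall` since only the first match is used.

```python
import re

# Same pattern as above, compiled once and written as a raw string.
LOOSE_DATE = re.compile(
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"  # group 1: month start
    r"(\.)?(\w*)?(\.)?"                                   # groups 2-4: optional suffixes
    r"(\s*\d{0,2}\s*),"                                   # group 5: optional day, then comma
    r"(\s*\d{4})",                                        # group 6: year
    re.S | re.I,
)

def first_loose_date(text, default="n/a"):
    """Return the first loosely matched date in text, or default if none."""
    match = LOOSE_DATE.search(text)
    if match is None:
        return default
    # Unlike findall(), groups() yields None for groups that did not
    # participate in the match, so replace those with empty strings.
    return "".join(g or "" for g in match.groups())

print(first_loose_date("Some wrong date is Septeme 28, 2002date"))  # Septeme 28 2002
print(first_loose_date("no date here"))                             # n/a
```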

Gryu