I have a regex that detects a date of birth in a given paragraph of text:
import re
dob = re.compile(r'(?:\bbirth\b|\bbirth(?:day|date)).{0,20}\n? \b((?:(?<!\:)(?<!\:\d)[0-3]?\d(?:st|nd|rd|th)?\s+(?:of\s+)?(?:jan\.?|january|feb\.?|february|mar\.?|march|apr\.?|april|may|jun\.?|june|jul\.?|july|aug\.?|august|sep\.?|september|oct\.?|october|nov\.?|november|dec\.?|december)|(?:jan\.?|january|feb\.?|february|mar\.?|march|apr\.?|april|may|jun\.?|june|jul\.?|july|aug\.?|august|sep\.?|september|oct\.?|october|nov\.?|november|dec\.?|december)\s+(?<!\:)(?<!\:\d)[0-3]?\d(?:st|nd|rd|th)?)(?:\,)?\s*(?:\d{4})?|\b[0-3]?\d[-\./][0-3]?\d[-\./]\d{2,4})\b',re.IGNORECASE | re.MULTILINE)
data = " Hi This is Goku and my birthday is on 6th Aug but to be clear it is on 1994-08-06."
l = dob.findall(data)
print(l)
Output: ['6th Aug ']
I want to add one more feature: if a date in the format YYYY-MM-DD is present in the text, it should also be treated as a date of birth
(where YYYY is 19XX-20XX, MM is 01-12, and DD is 01-31).
For example:
data = " Hi This is Goku and my birthday is on 6th Aug but to be clear it is on 1994-08-06."
Then the output should be
output: ['6th Aug ', '1994-08-06']
Where in the regex can I add a part so that it also detects the YYYY-MM-DD format?
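
To make the requirement concrete, here is a minimal sketch of the YYYY-MM-DD part on its own, using the ranges given above. The names iso_dob and find_all_dobs are hypothetical and not part of my existing code. I kept it as a separate pattern here because adding a second capturing group to the big regex would make findall return tuples instead of plain strings; whether it can be folded into the single regex is exactly what I am unsure about.

```python
import re

# Hypothetical standalone pattern for YYYY-MM-DD (an assumption, not the
# original regex): YYYY in 1900-2099, MM in 01-12, DD in 01-31.
iso_dob = re.compile(
    r'(?<![\d-])'                  # not preceded by a digit or hyphen
    r'((?:19|20)\d{2}'             # year: 19XX or 20XX
    r'-(?:0[1-9]|1[0-2])'          # month: 01-12
    r'-(?:0[1-9]|[12]\d|3[01]))'   # day: 01-31
    r'(?![\d-])'                   # not followed by a digit or hyphen
)

def find_all_dobs(text, patterns):
    """Collect group(1) of every match from every pattern, in text order."""
    hits = []
    for pattern in patterns:
        for m in pattern.finditer(text):
            hits.append((m.start(1), m.group(1)))
    return [value for _, value in sorted(hits)]

data = " Hi This is Goku and my birthday is on 6th Aug but to be clear it is on 1994-08-06."
print(iso_dob.findall(data))           # ['1994-08-06']
print(find_all_dobs(data, [iso_dob]))  # ['1994-08-06']
```

Calling find_all_dobs(data, [dob, iso_dob]) with the dob pattern from above is meant to return both dates in the order they appear in the text. Note that this sketch only checks digit ranges, so an impossible date such as 2021-02-31 would still match.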