Here are 3 snippets from 3 different emails:

1) Subject: FW: NEFS 11 fish for lease
   From: Claire Fitz-Gerald 
   Date: 11/15/2013 3:02 PM

2) Subject: FW: NEFS 11 and 12 fish for lease
   From: Claire Fitz-Gerald 
   Date: 11/11/2013 4:09 PM

3) Subject: FW: NEFS 11 fish for lease
   From: Claire Fitz-Gerald 
   Date: 12/5/2013 4:23 PM

I am trying to capture the date from these emails, and hundreds more, but I can't seem to use RegEx correctly. I am not an expert with RegEx. I have seen similar posts on StackOverflow and tried their code, but for some reason it doesn't work for me.

My code:

import re

with open(file_path, 'r') as f:
    pattern = re.compile("(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\\d\\d")
    email = f.read()
    dates = pattern.findall(email)
    if dates:
        #print("Date:", ''.join(dates))
        print("Date:", ''.join(''.join(dates) for dates in dates))

I am confused about why this RegEx seems to work for others but not for me. I have also tried a much more in-depth RegEx I found on SO:

re.compile("^((0?[13578]|10|12)(-|\/)(([1-9])|(0[1-9])|([12])([0-9]?)|(3[01]?))(-|\/)((19)([2-9])(\d{1})|(20)([01])(\d{1})|([8901])(\d{1}))|(0?[2469]|11)(-|\/)(([1-9])|(0[1-9])|([12])([0-9]?)|(3[0]?))(-|\/)((19)([2-9])(\d{1})|(20)([01])(\d{1})|([8901])(\d{1})))$")

I would simply like to capture the date in these emails, and then I can worry about turning them into the correct format later. Any help is appreciated, thanks.

theprowler

2 Answers

To capture the dates you can use this code:

regex = r"Date: (\d{1,2}\/\d{1,2}\/\d{4})"

Check online demo.
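As a minimal sketch of how this pattern could be applied, the snippet below runs it over sample text modeled on the three email headers in the question. The inline `email` string stands in for the contents of the real file and is an assumption made for illustration.

```python
import re

# Sample text modeled on the three email headers in the question
# (an assumption; the real input would be read from the email file).
email = """\
Subject: FW: NEFS 11 fish for lease
From: Claire Fitz-Gerald
Date: 11/15/2013 3:02 PM

Subject: FW: NEFS 11 and 12 fish for lease
From: Claire Fitz-Gerald
Date: 11/11/2013 4:09 PM

Subject: FW: NEFS 11 fish for lease
From: Claire Fitz-Gerald
Date: 12/5/2013 4:23 PM
"""

# The pattern from this answer: a single capturing group around the date.
regex = r"Date: (\d{1,2}\/\d{1,2}\/\d{4})"

# With exactly one group, findall returns a list of that group's strings.
dates = re.findall(regex, email)

for date in dates:
    print("Date:", date)
```

Two details of the pattern in the question probably explain the trouble there: its day part requires a leading zero, so `12/5/2013` never matches, and with several capturing groups `findall` returns tuples of the groups (for example `('11', '15', '20')`) rather than the whole date.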

Graham
Thomas Ayoub
  • That worked almost perfectly. It also captured the time though, is there a way to exclude that? – theprowler Dec 09 '16 at 17:54
  • That is perfect. Thank you so much. Just one quick question: did you come up with that answer so quickly because you're just an expert with RegEx? Or did you Google it? Or something else? I'm just curious for my own purposes at getting better – theprowler Dec 09 '16 at 17:59
  • @theprowler both. I'm not bad at regex, and matching a date is not that hard to find. I added a link you should look at – Thomas Ayoub Dec 09 '16 at 18:02

I would recommend instead picking out the Date: lines, grabbing the string after Date: up to the end of the line, and then using a date parser library, as in the question "Parse date strings?"
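A rough sketch of that approach follows. It swaps in the standard library's `datetime.strptime` for a dedicated parser library so that it runs without extra installs. The helper name `parse_date_lines`, the constant `DATE_FORMAT`, and the inline sample text are all hypothetical, and the format string assumes every Date: line looks like the three in the question.

```python
from datetime import datetime

# Sample text modeled on the question's headers (an assumption).
email = """\
Subject: FW: NEFS 11 fish for lease
From: Claire Fitz-Gerald
Date: 11/15/2013 3:02 PM

Subject: FW: NEFS 11 and 12 fish for lease
From: Claire Fitz-Gerald
Date: 11/11/2013 4:09 PM

Subject: FW: NEFS 11 fish for lease
From: Claire Fitz-Gerald
Date: 12/5/2013 4:23 PM
"""

# Assumed layout: month/day/year, 12-hour clock, AM/PM marker.
# %p is locale-dependent; this assumes an English or C locale.
DATE_FORMAT = "%m/%d/%Y %I:%M %p"


def parse_date_lines(text):
    """Return a datetime for every line that starts with 'Date:'."""
    parsed = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Date:"):
            # Everything after the 'Date:' label, up to the end of the line.
            raw = line[len("Date:"):].strip()
            parsed.append(datetime.strptime(raw, DATE_FORMAT))
    return parsed


timestamps = parse_date_lines(email)
for ts in timestamps:
    print(ts.date().isoformat())
```

Because `strptime` raises `ValueError` on any line that does not fit the format, an unexpected layout shows up immediately instead of being skipped silently. A more lenient parser would be the better fit if the formats vary.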

Community
Alex G Rice
  • I actually tried that previously but couldn't succeed with it, that's why I switched to trying RegEx. If the other guy's answer below doesn't work out though I'll try this way again – theprowler Dec 09 '16 at 17:55
  • If there is any possibility your dates will be differently formatted or change in the future, then `dateutil.parser` could offer some additional safety. Otherwise the regex should totally be sufficient. :) – Alex G Rice Dec 09 '16 at 17:58
  • That is very good to know in case a different format does arise. Thanks so much – theprowler Dec 09 '16 at 18:00