I've got a file that has a ton of text in it. Some of it looks like this:
X-DSPAM-Processed: Fri Jan 4 18:10:48 2008
X-DSPAM-Confidence: 0.6178
X-DSPAM-Probability: 0.0000
Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39771
Author: louis@media.berkeley.edu
Date: 2008-01-04 18:08:50 -0500 (Fri, 04 Jan 2008)
New Revision: 39771
Modified:
bspace/site-manage/sakai_2-4-x/site-manage-tool/tool/src/bundle/sitesetupgeneric.properties
bspace/site-manage/sakai_2-4-x/site-manage-tool/tool/src/java/org/sakaiproject/site/tool/SiteAction.java
Log:
BSP-1415 New (Guest) user Notification
I need to pull out only dates that follow this pattern:
2008-01-04 18:08:50 -0500
Here's what I tried:
import re
text = open('mbox-short.txt')
for line in text:
    dates = re.compile('\d{4}(?P<sep>[-/])\d{2}(?P=sep)\d{2}\s\d{2}:\d{2}:]\d{2}\s[-/]\d{4}')
    print(dates)
text.close()
Instead of dates, the output was hundreds of copies of the pattern itself:
\d{4}(?P<sep>[-/])\d{2}(?P=sep)\d{2}\s\d{2}:\d{2}:]\d{2}\s[-/]\d{4}
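The attempt above compiles the pattern and prints the compiled object on every pass through the loop, but never applies it to `line`. The pattern also has a stray `]` after the second `:`, so it could not match the sample timestamp even if it were applied. Here is a minimal sketch of one possible approach, assuming every timestamp looks like the sample (`YYYY-MM-DD HH:MM:SS` followed by a signed four-digit offset). The names `DATE_RE`, `find_dates`, and `sample` are illustrative and not part of the original code.

```python
import re

# Compile once, outside the loop. The raw string keeps the backslashes intact.
# Assumed shape: YYYY-MM-DD HH:MM:SS +/-ZZZZ, based on the sample line.
DATE_RE = re.compile(r'\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s[-+]\d{4}')

def find_dates(lines):
    """Return every matching timestamp found in an iterable of lines."""
    found = []
    for line in lines:
        # findall returns the full matched text because the pattern has no groups
        found.extend(DATE_RE.findall(line))
    return found

# A few lines copied from the sample in the question
sample = [
    "Author: louis@media.berkeley.edu",
    "Date: 2008-01-04 18:08:50 -0500 (Fri, 04 Jan 2008)",
    "New Revision: 39771",
]
print(find_dates(sample))
```

To run this against the real file, an open file object can be passed in place of the list, for example `with open('mbox-short.txt') as fh: print(find_dates(fh))`.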