I am writing a small script that reads my worked time from my emails and keeps track of how much time I have already worked. It extracts the times with a regex. Here is my script:
import imaplib
import re
from pprint import pprint
mail = imaplib.IMAP4_SSL('imap.gmail.com',993)
mail.login('*************', '**************')
# Out: list of "folders" aka labels in gmail.
mail.select("inbox") # connect to inbox.
typ, data = mail.search(None, 'SUBJECT', 'Zeiterfassung')
worked_time_pattern = re.compile(r'"(?P<time>\d+(,\d)?)"[^>]*?selected[^>]*>=?(\r?\n?)(?P=time)<')
# old version: worked_time_pattern = re.compile(r'\"(?P<time>[0-9]+(?:[,][0-9])?)\"(?: disabled)? selected(?: disabled)? style=3D"">[=]?[\n]?(?P=time)<\/option>')
date_pattern = re.compile('.*Date: [a-zA-Z]{1,4}[,] (?P<date>[0-9]{1,2} [a-zA-Z]{1,4} [0-9]{4}).*', re.DOTALL)
count = 0
countFail = 0
if 'OK' == typ:
    for num in data[0].split():
        typ, data = mail.fetch(num, '(RFC822)')
        mailbody = "".join(data[0][1].split("=\r\n"))
        mailbody = "".join(mailbody.split("\r"))
        mailbody = "".join(mailbody.split("\n"))
        worked_time = worked_time_pattern.search(data[0][1])
        date = date_pattern.match(data[0][1])
        if worked_time != None:
            print worked_time.group('time')
            count = count + 1
        else:
            print mailbody
            countFail = countFail + 1
        print worked_time
        print "You worked on %s\n" % ( date.group('date'))
        #print 'Message %s\n%s\n' % (num, data[0][1])
print count
print countFail
mail.close()
mail.logout()
The problem is that it returns None for worked_time for some of my emails (not all; it works for more than half: 23 match, 8 do not), which means that the pattern does not match. I tested it with several online regex testers, and they all told me that the pattern matches and everything is fine.
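To compare what the script actually sees with what I paste into the online testers, a small helper along these lines could print the raw text around selected with repr(), so that hidden characters (carriage returns, line feeds, a trailing =) become visible. This is only a sketch: the name show_context and the sample fragment are made up for illustration, and it assumes the mail body is available as a text string.

```python
def show_context(text, keyword='selected', width=30):
    """Return repr() of the text around the first occurrence of keyword.

    repr() shows carriage returns and line feeds as escape sequences,
    which a plain print (or a copy and paste) would hide.
    Returns None when the keyword does not occur in text at all.
    """
    index = text.find(keyword)
    if index == -1:
        return None
    start = max(0, index - width)
    end = index + len(keyword) + width
    return repr(text[start:end])


# Hypothetical raw fragment (an assumption, not taken from a real mail):
# a line break sits between the closing '>' and the value.
hypothetical_raw = '"1,5" disabled selected style=3D"">=\r\n1,5</option>'

print(show_context(hypothetical_raw))
print(show_context('no such keyword in here'))
```

If the helper returns None for a failing mail, the word selected itself does not appear contiguously in the raw text, which would already explain a failed match.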
Here are a few example strings that my script did not match but that the online tools, e.g. http://regex101.com, do match.
I put them on Pastebin because they are big and ugly: http://pastebin.com/4Z2BdmXk http://pastebin.com/dMxcRqQu
By the way, the regex for the date works fine on all emails (but not on the pasted strings, because I had to cut away the upper part, which contained a lot of private information).
worked_time_pattern should match something like: "1,5" disabled selected style=3D"">1,5</option> (and extract the 1,5 from it, exactly as it does in the cases that work).
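As a minimal, self-contained check of the intended behaviour, the snippet below runs the unchanged pattern against the sample fragment. It also tries a hypothetical variant in which a quoted-printable soft line break (=\r\n) splits the word selected. That variant is an assumption about what a failing mail might look like, not a string taken from my mails, and the helper names find_time and strip_soft_breaks are made up for this sketch.

```python
import re

# Pattern copied unchanged from the script.
worked_time_pattern = re.compile(
    r'"(?P<time>\d+(,\d)?)"[^>]*?selected[^>]*>=?(\r?\n?)(?P=time)<')

# The sample fragment the pattern is supposed to match.
sample = '"1,5" disabled selected style=3D"">1,5</option>'

# Hypothetical variant: a soft line break ("=\r\n") splits "selected".
# This input is an assumption for illustration, not a real mail.
wrapped = '"1,5" disabled sel=\r\nected style=3D"">1,5</option>'


def find_time(text):
    """Return the captured time as a string, or None if nothing matches."""
    match = worked_time_pattern.search(text)
    return match.group('time') if match else None


def strip_soft_breaks(text):
    """Remove soft line breaks, as the first mailbody line in the script does."""
    return text.replace('=\r\n', '')


print(find_time(sample))                      # 1,5
print(find_time(wrapped))                     # None
print(find_time(strip_soft_breaks(wrapped)))  # 1,5
```

In this sketch the pattern only fails while the soft line break is still in the text, so it may be worth checking whether the failing mails contain such breaks in places the pattern does not allow for.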
Does anybody have an idea?