
I have been learning a lot about regex lately and have just come across the groupdict method of re match objects. I am trying to build a groupdict from the following email header:

EMAIL_HEADER = """Return-Path: <bounces+5555-7602-redacted-info>
...
Received: by 10.8.49.86 with SMTP id mf9.22328.51C1E5CDF
    Wed, 19 Jun 2013 17:09:33 +0000 (UTC)
Received: from NzI3MDQ (174.37.77.208-static.reverse.softlayer.com [174.37.77.208])
by mi22.sendgrid.net (SG) with HTTP id 13f5d69ac61.41fe.2cc1d0b
for <redacted-info>; Wed, 19 Jun 2013 12:09:33 -0500 (CST)
Content-Type: multipart/alternative;
boundary="===============8730907547464832727=="
MIME-Version: 1.0
From: redacted-address
To: redacted-address
Subject: A Test From SendGrid
Message-ID: <1371661773.974270694268263@mf9.sendgrid.net>
Date: Wed, 19 Jun 2013 17:09:33 +0000 (UTC)
X-SG-EID: P3IPuU2e1Ijn5xEegYUQ...
X-SendGrid-Contentd-ID: {"test_id":"1371661776"}"""

I want to match the "From", "To", "Subject" and "Date" lines and turn them into a groupdict. Starting small and building up, I tried: details = re.search(r'(?P<from>(?<=From: )[a-z]+-[a-z]+)|(?P<to>(?<=To: )[a-z]+-[a-z]+)', EMAIL_HEADER).groupdict()

This returns: {'from': 'redacted-address', 'to': None}

If I remove the |, I get an error that essentially means my regex did not match at all. Can anyone explain what is happening here? I don't understand why removing the pipe character makes re.search return None. The examples I saw did not use the pipe character; they looked like my pattern above with the groups written back to back. Any tips or help would be appreciated. Thanks!

J. B.
  • Not an answer, but regex101.com is SUPER helpful for me when trying to figure out patterns. Here is your example: https://regex101.com/r/Dtl3le/1 – Marcel Wilson Nov 10 '20 at 17:50
  • Use `import email` to parse emails. – Wiktor Stribiżew Nov 10 '20 at 17:54
  • With the pipe character, the regex matches *either* From or To. Without it, it tries to match both, immediately adjacent to each other - but they aren't, there are about 5 characters (newline followed by "To: ") between the end of the first group and the start of the second. You'd have to replace the pipe by something like `.*` (and turn on the `DOTALL` option) to allow the regex to skip those characters in between the groups. – jasonharper Nov 10 '20 at 18:01
  • jasonharper, that was super helpful and just what I needed. Thanks so much! And thanks to everyone else who answered as well. I appreciate it! – J. B. Nov 10 '20 at 19:38
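
To make the comments above concrete, here is a minimal sketch of the three cases: the pattern with the pipe, the pattern with the groups back to back, and the pattern with a gap-skipping `.*?` plus `re.DOTALL`. `HEADER` is a shortened stand-in for the header in the question (an assumption to keep the example small), and the lazy `.*?` is my choice over the `.*` mentioned in the comment; either should work for this input. The last lines show the `email` module route as an alternative, using `email.message_from_string` from the standard library.

```python
import re
from email import message_from_string

# Shortened stand-in for the header in the question (assumption for this sketch).
HEADER = """MIME-Version: 1.0
From: redacted-address
To: redacted-address
Subject: A Test From SendGrid
Message-ID: <1371661773.974270694268263@mf9.sendgrid.net>
Date: Wed, 19 Jun 2013 17:09:33 +0000 (UTC)"""

FROM = r'(?P<from>(?<=From: )[a-z]+-[a-z]+)'
TO = r'(?P<to>(?<=To: )[a-z]+-[a-z]+)'

# 1. With the pipe: re.search returns the first place where EITHER
#    alternative matches, so only one of the two groups is ever filled.
either = re.search(FROM + '|' + TO, HEADER).groupdict()

# 2. Back to back: the "to" group would have to start exactly where the
#    "from" group ends, but "\nTo: " sits in between, so there is no match.
#    Calling .groupdict() on this None is what raises the error.
adjacent = re.search(FROM + TO, HEADER)

# 3. Allow the regex to skip the characters in between. DOTALL lets "."
#    match the newline; the lazy ".*?" stops at the first "To: " it can use.
both = re.search(FROM + r'.*?' + TO, HEADER, re.DOTALL).groupdict()

# Alternative: let the email package parse the headers instead of a regex.
msg = message_from_string(HEADER)
details = {name: msg[name] for name in ("From", "To", "Subject", "Date")}
```

Here `either` has `'to': None`, `adjacent` is `None`, and `both` has both keys filled. The regex route also depends on the headers appearing in a fixed order, which the `email` route does not.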
