-2

I am trying to extract a list of emails from a given text. Most of the emails have the following syntax:

 "Last_name, First_Name (First-name)" <last_name.first_name@domain.xxx>
or
"Last_name, First_Name (XXXX)" <last_name.first_name@domain.xxx>

My goal is to extract each whole entry, including the first part, i.e. the "Last_name, First_Name (XXXX)".

To extract the list of emails, I have used the following regex:

"(<?[a-z0-9!#$%&*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9]>?)?)"

which extracts only the email addresses without the first part, i.e. only:

<last_name.first_name@domain.xxx>

I have tried several variations of the regex to extract the first part, but unfortunately they don't work.

Please do not hesitate to share any suggestions. Thank you in advance.

Djo
  • 1
    Are the email strings located inside `<...>`? If so, just extract `<.*?>`? – cs95 Oct 02 '17 at 15:03
  • 1
    Do you have sample email from which the email information should be extracted? If so, post the full input here and let us know exactly what the output should be (assuming multiple inputs, since there seem to be multiple formats). Also, are you trying to capture into groups? Do you want the first name, last name and email? And in which format do you want the latter? – ctwheels Oct 02 '17 at 15:04

2 Answers

0

First, have a look at this site, where you can test your regex with a handy quick reference alongside it:

https://regex101.com

Then, something like

"[a-zA-Z_]+, [a-zA-Z_( )]+"

should capture the first part. Maybe you can give us some more test text?
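
As a rough sketch of how such a pattern could be checked in Python: the second character class above has no `-`, so it would stop at the hyphen in a name such as `First-name`. The variant below adds it. The names `NAME_PART` and `sample` are made up for illustration, and the sample string is the one from the question.

```python
import re

# Hypothetical variant of the pattern above: "-" is added to the second
# character class so that "(First-name)" is covered. The double quotes are
# matched literally as part of the pattern (an assumption about the intent).
NAME_PART = re.compile(r'"[a-zA-Z_]+, [a-zA-Z_( )-]+"')

sample = '"Last_name, First_Name (First-name)" <last_name.first_name@domain.xxx>'

match = NAME_PART.search(sample)
print(match.group(0) if match else None)
```

This only covers the quoted name part; names containing digits, dots, or accented letters would need a wider character class.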

  • 1
    This is barely an answer: See https://stackoverflow.com/help/how-to-answer. Seeing as how the OP did not provide enough context to answer the question, this should really only be a comment – ctwheels Oct 02 '17 at 15:11
  • Stack Overflow does not allow me to comment on the main post ): – Fabrice Palermo Oct 02 '17 at 15:26
0
 >>> import re
 >>>
 >>> emailLine='"Last_name, First_Name (First-name)" <last_name.first_name@domain.xxx>'
 >>>
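 >>> # Raw string avoids invalid-escape warnings for \s; "^" anchors the match at the start of the string
 >>> re.findall(r'^"([^,]*?),\s([^"]*?)"\s<([^>]*?)>', emailLine)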

 [('Last_name', 'First_Name (First-name)', 'last_name.first_name@domain.xxx')]
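
If a line can hold more than one entry, a minimal sketch (assuming every entry follows the same `"Last, First (…)" <address>` layout) is to drop the leading `^`, so that `re.findall` can match anywhere in the text. The sample text and the names `ENTRY` and `text` are invented for illustration.

```python
import re

# Same three capture groups as the pattern above, without the leading "^",
# so a match may start anywhere in the text rather than only at its start.
ENTRY = re.compile(r'"([^,]*?),\s([^"]*?)"\s<([^>]*?)>')

# Made-up sample: two entries on a single line.
text = ('"Doe, John (Johnny)" <doe.john@domain.xxx>; '
        '"Roe, Jane (XXXX)" <roe.jane@domain.xxx>')

entries = ENTRY.findall(text)

# Rebuild each full entry from its (last name, first name, address) groups.
for last, first, address in entries:
    print(f'"{last}, {first}" <{address}>')
```

Each tuple holds the last name, the first name with its parenthesised part, and the address, so the full entry can be rebuilt or the pieces used separately.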
  • Thank you, but I can have more than one email per line in my case. I tried your regex but it returns only the first occurrence. – Djo Oct 02 '17 at 15:41
  • If each line has the same pattern then you need to iterate over the lines one by one. Convert the file to a list. [See this](https://stackoverflow.com/questions/3925614/how-do-you-read-a-file-into-a-list-in-python) – Pandian Muninathan Oct 02 '17 at 16:19
  • Use the regex below (only if all occurrences follow the same pattern) to match anywhere in the file if you don't want to split it into lines: \"([^,]*?),\s([^\"]*?)\"\s<([^>]*?)> – Pandian Muninathan Oct 02 '17 at 16:24