I would like to capture pairs of matches (a name and an email) that occur consecutively, in either order. When one type appears twice in a row without the other type in between, the missing counterpart should be returned as nil.
I am trying to extract names and emails using the following regexes.
For names, two consecutive capitalized words:
[A-Z][\w]+\s+[A-Z][\w]+
For emails:
\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b
Example text:
John Doe john@doe.com random text
Jane Doe random text jane@doe.com
jim@doe.com more random text tim@doe.com Tim Doe
So far I have used non-capturing groups and positive lookaheads to tackle the "in-no-particular-order-or-even-present" problem, but I only managed to do so by treating each line as a separate segment. My regex looks like this:
^(?=(?:.*([A-Z][\w]+\s+[A-Z][\w]+))?)(?=(?:.*(\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b))?).*
The results miss contacts when there is more than one on the same line:
[
["John Doe", "john@doe.com"],
["Jane Doe", "jane@doe.com"],
["Tim Doe", "tim@doe.com"],
]
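For reference, the per-line behavior above can be reproduced with a short script. This is a minimal sketch that assumes Ruby (suggested by the nil in the expected output) and uses String#scan; the variable names `text`, `per_line` and `pairs` are made up for illustration.

```ruby
# Sample text from the question, one contact segment per line.
text = [
  "John Doe john@doe.com random text",
  "Jane Doe random text jane@doe.com",
  "jim@doe.com more random text tim@doe.com Tim Doe"
].join("\n")

# The lookahead-based regex from the question. Each optional lookahead
# captures at most one name (group 1) and one email (group 2) per line.
per_line = /^(?=(?:.*([A-Z][\w]+\s+[A-Z][\w]+))?)(?=(?:.*(\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b))?).*/

# scan returns one [name, email] array per matched line.
pairs = text.scan(per_line)

pairs.each { |pair| p pair }
```

Because each line yields exactly one match with one capture per group, and the greedy `.*` prefers the last candidate on the line, jim@doe.com on the third line is never reported.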
When what I'm looking for is:
[
["John Doe", "john@doe.com"],
["Jane Doe", "jane@doe.com"],
[nil, "jim@doe.com"],
["Tim Doe", "tim@doe.com"],
]
My skills in regex are limited and I started using regex because it seemed like the best tool for matching names and emails.
Is regex the best tool for this kind of problem, or are there more efficient alternatives using loops if we're extracting hundreds of contacts in this manner?
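To make the loop-based alternative concrete, here is a rough sketch of what I have in mind, again assuming Ruby. It scans the whole text once for either an email or a name, then pairs adjacent tokens of different types. The names `NAME`, `EMAIL`, `TOKEN` and `extract_contacts` are hypothetical, and the greedy pairing rule (a token pairs with the very next token if the types differ) is my own assumption about what "consecutive" should mean.

```ruby
NAME  = /[A-Z]\w+\s+[A-Z]\w+/
EMAIL = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/
# Email is tried first so a capitalized address is not mistaken for a name.
TOKEN = /(?<email>#{EMAIL})|(?<name>#{NAME})/

def extract_contacts(text)
  # One pass over the text: every match is either an email or a name.
  tokens = text.scan(TOKEN).map do |email, name|
    email ? [:email, email] : [:name, name]
  end

  contacts = []
  i = 0
  while i < tokens.size
    type, value = tokens[i]
    next_type, next_value = tokens[i + 1]

    contact = { name: nil, email: nil }
    contact[type] = value

    if next_type && next_type != type
      # Adjacent tokens of different types form one contact.
      contact[next_type] = next_value
      i += 2
    else
      # Same type repeated (or end of input): the counterpart stays nil.
      i += 1
    end

    contacts << [contact[:name], contact[:email]]
  end
  contacts
end

text = [
  "John Doe john@doe.com random text",
  "Jane Doe random text jane@doe.com",
  "jim@doe.com more random text tim@doe.com Tim Doe"
].join("\n")

extract_contacts(text).each { |contact| p contact }
```

On the example text this yields the four pairs listed above, including [nil, "jim@doe.com"], but I am not sure whether the pairing rule holds up on messier input.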