I want to extract an obfuscated email address such as "myemail [at] domainemail [dot] com" from HTML source held in a string.

I tried the following code, but it does not work. What should I change?

public static List<string> Fetch_Emails(string Sourcecode)
{
    List<string> Emails = new List<string>();    

    Regex exp = new Regex("\\b[A-Z0-9._%+-]+(\\[at\\])[A-Z0-9.-]+(\\[dot\\])[A-Z]{2,4}\\b", RegexOptions.IgnoreCase);
    MatchCollection matchCollection = exp.Matches(Sourcecode);

    foreach (Match m in matchCollection)
    {
        if (!Emails.Contains(m.Value))
        { 
            Emails.Add(m.Value);                        
        }
    }

    return Emails;
}    
andyb
zeynab farzaneh

2 Answers

Don't use a regex to process email addresses. The email RFCs define quite complicated rules for what an address may look like.

Instead, use the MailAddress class (System.Net.Mail) and wrap the constructor call in a try/catch. Leave the heavy lifting of parsing email addresses to the .NET FCL.

If the MailAddress constructor does not throw, the string is a well-formed email address, and you can read its individual parts (for example User and Host) from the resulting object.
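
A minimal sketch of that approach follows. The class and method names (EmailValidation, TryParseAddress) are hypothetical, chosen only for illustration. It assumes the candidate string is already in the ordinary user@host form; an obfuscated string such as the one in the question would first need its [at] and [dot] tokens replaced.

```csharp
using System;
using System.Net.Mail;

public static class EmailValidation
{
    // Hypothetical helper: returns true and the parsed address when the
    // MailAddress constructor accepts the candidate, false otherwise.
    public static bool TryParseAddress(string candidate, out MailAddress address)
    {
        address = null;

        // The constructor throws on null or empty input, so reject it up front.
        if (string.IsNullOrWhiteSpace(candidate))
        {
            return false;
        }

        try
        {
            address = new MailAddress(candidate.Trim());
            return true;
        }
        catch (FormatException)
        {
            // Thrown when the string is not in a recognized address format.
            return false;
        }
    }
}
```

On success, address.User and address.Host hold the local part and the domain. Note that MailAddress also accepts forms such as "Display Name <user@host>", so it is more lenient than a strict pattern.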

pero

Your pattern does not allow spaces between the email components and [at] or [dot], so it cannot match "myemail [at] domainemail [dot] com".

To add support for spaces, use [ ]{0,3}, which allows between 0 and 3 spaces between the components:

\b[A-Z0-9._%+-]+[ ]{0,3}(\[at\])[ ]{0,3}[A-Z0-9.-]+[ ]{0,3}(\[dot\])[ ]{0,3}[A-Z]{2,4}\b

Also, instead of escaping every backslash in the regex, use a C# verbatim string literal (prefixed with @):

Regex exp = new Regex(@"\b[A-Z0-9._%+-]+[ ]{0,3}(\[at\])[ ]{0,3}[A-Z0-9.-]+[ ]{0,3}(\[dot\])[ ]{0,3}[A-Z]{2,4}\b", RegexOptions.IgnoreCase);
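
As a rough sketch, the pattern above can be dropped into the question's method and extended to return addresses in normal form. The capture groups around the local part, domain, and top-level domain are an addition not present in the pattern above, and the names ObfuscatedEmails and Fetch are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ObfuscatedEmails
{
    // Same pattern as above, with groups added around the three components
    // so they can be reassembled as user@domain.tld.
    private static readonly Regex Pattern = new Regex(
        @"\b([A-Z0-9._%+-]+)[ ]{0,3}\[at\][ ]{0,3}([A-Z0-9.-]+)[ ]{0,3}\[dot\][ ]{0,3}([A-Z]{2,4})\b",
        RegexOptions.IgnoreCase);

    public static List<string> Fetch(string source)
    {
        // The set tracks addresses already returned, ignoring case.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();

        foreach (Match m in Pattern.Matches(source))
        {
            string email = m.Groups[1].Value + "@" + m.Groups[2].Value + "." + m.Groups[3].Value;
            if (seen.Add(email))
            {
                result.Add(email);
            }
        }

        return result;
    }
}
```

For the input "<p>myemail [at] domainemail [dot] com</p>", Fetch returns a single entry, "myemail@domainemail.com". Unlike the List.Contains check in the question, the de-duplication here is case-insensitive, which is a design choice rather than a requirement.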
Mitch