0

How can I extract only email addresses from a long string in Golang? For example:

"a bunch of irrelevant text fjewiwofjfjvnvkdlslsosiejwoqlwpwpwo
 mail=jim.halpert@gmail.com,ou=f,c=US
 mail=apple.pie@gmail.com,ou=f,c=US
 mail=hello.world@gmail.com,ou=f,c=US
 mail=alex.alex@gmail.com,ou=f,c=US
 mail=bob.jim@gmail.com,ou=people,ou=f,c=US
 mail=arnold.schwarzenegger@gmail.com,ou=f,c=US"

This would return a list of all the emails: [jim.halpert@gmail.com, apple.pie@gmail.com, etc...]

Each email address would begin with "mail=" and end with a comma ",".

  • 1
    What have you tried? What problems do you have? Post your attempt, preferably a [mcve]. – icza Jun 02 '22 at 06:19
  • 1
    Is this from an x509 (TLS) certificate? If so, get a x509 parser library to extract the mail fields. Email addresses are notoriously difficult to apply filtering rules on, due to all the different ways they allow quoting and escaping. It's nearly impossible to reliably extract an email address from arbitrary word soup. The reliable method is to parse the actual container format the email address is contained within. So if this comes from a x509 cert, then you want to parse the cert, *not the email addresses*! – datenwolf Jun 02 '22 at 07:21

4 Answers

0

To do this, break the long string down into the parts you need. Regular expressions let you search for the email pattern shown above.

The code below uses a regular expression to find each section that starts with "mail=", then strips the "mail=" prefix and the trailing comma to leave just the address.

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    // Match "mail=", the address characters, and the trailing comma.
    re := regexp.MustCompile(`mail=[A-Za-z0-9._%+@-]+,`)
    str := `a bunch of irrelevant text fjewiwofjfjvnvkdlslsosiejwoqlwpwpwo
 mail=jim.halpert@gmail.com,ou=f,c=US
 mail=apple.pie@gmail.com,ou=f,c=US
 mail=hello.world@gmail.com,ou=f,c=US
 mail=alex.alex@gmail.com,ou=f,c=US
 mail=bob.jim@gmail.com,ou=people,ou=f,c=US
 mail=arnold.schwarzenegger@gmail.com,ou=f,c=US`

    for i, match := range re.FindAllString(str, -1) {
        // Strip the "mail=" prefix and the trailing comma.
        email := strings.TrimPrefix(match, "mail=")
        email = strings.TrimSuffix(email, ",")
        // i is the position of the match in the result list, not an offset into str.
        fmt.Printf("match %d: %s\n", i, email)
    }
}
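
Since the question asks for a list, a variant of the approach above could collect the addresses into a slice by using a capture group instead of trimming each match. This is only a sketch: the helper name extractEmails and the pattern are assumptions, not part of the answer. The pattern takes "everything up to the next comma" literally, so it would not handle a quoted local part that contains a comma.

```go
package main

import (
	"fmt"
	"regexp"
)

// mailRe captures whatever sits between "mail=" and the next comma.
// The pattern is an assumption based on the format described in the question.
var mailRe = regexp.MustCompile(`mail=([^,\s]+),`)

// extractEmails (hypothetical helper) returns every captured address in order.
func extractEmails(s string) []string {
	var emails []string
	for _, m := range mailRe.FindAllStringSubmatch(s, -1) {
		emails = append(emails, m[1]) // m[0] is the full match, m[1] the capture group
	}
	return emails
}

func main() {
	input := "noise\n mail=jim.halpert@gmail.com,ou=f,c=US\n mail=bob.jim@gmail.com,ou=people,ou=f,c=US"
	fmt.Println(extractEmails(input)) // prints [jim.halpert@gmail.com bob.jim@gmail.com]
}
```

Returning a slice keeps the extraction separate from the printing, which is closer to the output the question describes.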
Daniel Kiptoon
0

While I agree with the comment from user datenwolf, here is another version that does not use regular expressions.

It also handles more complex email formats, including a comma within the local part, which is awkward to implement with regexp.

See https://stackoverflow.com/a/2049510/11892070


package main

import (
    "bufio"
    "fmt"
    "strings"
)

var str = `a bunch of irrelevant text fjewiwofjfjvnvkdlslsosiejwoqlwpwpwo
mail=jim.halpert@gmail.com,ou=f,c=US
mail=apple.pie@gmail.com,ou=f,c=US
mail=hello.world@gmail.com,ou=f,c=US
mail=alex.alex@gmail.com,ou=f,c=US
mail=bob.jim@gmail.com,ou=people,ou=f,c=US
mail=arnold.schwarzenegger@gmail.com,ou=f,c=US
mail=(comented)arnold.schwarzenegger@gmail.com,ou=f,c=US
mail="(with comma inside)arnold,schwarzenegger@gmail.com",ou=f,c=US
mail=nocommaatall@gmail.com`

func main() {

    var emails []string

    sc := bufio.NewScanner(strings.NewReader(str))

    for sc.Scan() {
        // Tolerate leading whitespace, as in the question's input.
        t := strings.TrimSpace(sc.Text())
        if !strings.HasPrefix(t, "mail=") {
            continue
        }
        t = t[5:]

        // Skip lines without an "@"; slicing with at == -1 would panic.
        at := strings.Index(t, "@")
        if at < 0 {
            continue
        }
        // Look for the first comma after the "@".
        comma := strings.Index(t[at:], ",")
        if comma < 0 {
            // No comma: the rest of the line is the address.
            emails = append(emails, strings.TrimSpace(t))
            continue
        }
        emails = append(emails, strings.TrimSpace(t[:at+comma]))
    }

    for _, e := range emails {
        fmt.Println(e)
    }

}
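
The loop above accepts whatever sits between "mail=" and the comma. If stricter checking is wanted, each candidate could additionally be passed through mail.ParseAddress from the standard net/mail package. The sketch below assumes that extra step; the helper name keepValid is hypothetical, and it is only exercised here with plain addresses, so how it treats the commented and quoted forms from the sample data should be checked before relying on it.

```go
package main

import (
	"fmt"
	"net/mail"
)

// keepValid (hypothetical helper) keeps only the candidates that net/mail
// can parse as a single address and returns the bare address of each.
func keepValid(candidates []string) []string {
	var out []string
	for _, c := range candidates {
		addr, err := mail.ParseAddress(c)
		if err != nil {
			continue // not parseable as an address; drop it
		}
		out = append(out, addr.Address)
	}
	return out
}

func main() {
	fmt.Println(keepValid([]string{"jim.halpert@gmail.com", "not an address"}))
}
```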
0

You can use this package to do that:

https://github.com/hamidteimouri/htutils/blob/main/htregex/htregex.go


// Emails finds all email strings
func Emails(text string) []string {
    return match(text, EmailsRegex)
}
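
The snippet above shows only the wrapper; match and EmailsRegex are defined elsewhere in the linked file and are not reproduced here. For readers who would rather not add a dependency, a standard-library equivalent might look roughly like the following. The pattern and the names emailRe and findEmails are assumptions for illustration, not the package's actual code, and the pattern is deliberately simple rather than a full address grammar.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailRe is a deliberately simple pattern (an assumption, not the package's regex).
var emailRe = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

// findEmails (hypothetical) returns every substring of text that matches emailRe.
func findEmails(text string) []string {
	return emailRe.FindAllString(text, -1)
}

func main() {
	fmt.Println(findEmails("mail=jim.halpert@gmail.com,ou=f,c=US mail=apple.pie@gmail.com,ou=f"))
}
```

Because "=" and "," are not in the character classes, the "mail=" prefix and the trailing attributes fall outside each match.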

Hamid Teimouri
0

You can use Go's standard regexp package, via regexp.Compile or regexp.MustCompile.

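    // Fragment: place inside func main, with "fmt" and "regexp" imported.
    // regexEmail is not defined in the original answer; this pattern is an assumption.
    regexEmail := `mail=(?P<email>[^,\s]+)`
    r := regexp.MustCompile(regexEmail)
    newVariable := `a bunch of irrelevant text fjewiwofjfjvnvkdlslsosiejwoqlwpwpwo
 mail=jim.halpert@gmail.com,ou=f,c=US
 mail=apple.pie@gmail.com,ou=f,c=US
 mail=hello.world@gmail.com,ou=f,c=US
 mail=alex.alex@gmail.com,ou=f,c=US
 mail=bob.jim@gmail.com,ou=people,ou=f,c=US
 mail=arnold.schwarzenegger@gmail.com,ou=f,c=US`

    // FindAllStringSubmatch returns every match; FindStringSubmatch would return only the first.
    fmt.Printf("%#v\n", r.FindAllStringSubmatch(newVariable, -1))
    fmt.Printf("%#v\n", r.SubexpNames())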
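
SubexpNames is mostly useful when the pattern contains named groups. As a minimal sketch, assuming a pattern with a group named "email" (the answer does not show regexEmail, so the pattern here is an assumption, as is the helper name emailsByName), the group can be looked up by name with SubexpIndex, which is available from Go 1.15:

```go
package main

import (
	"fmt"
	"regexp"
)

// The pattern is an assumption; the group is named "email" for illustration.
var re = regexp.MustCompile(`mail=(?P<email>[^,\s]+)`)

// emailsByName (hypothetical helper) reads the group by name instead of by position.
func emailsByName(s string) []string {
	idx := re.SubexpIndex("email")
	var out []string
	for _, m := range re.FindAllStringSubmatch(s, -1) {
		out = append(out, m[idx])
	}
	return out
}

func main() {
	fmt.Println(emailsByName("mail=alex.alex@gmail.com,ou=f,c=US\nmail=nocommaatall@gmail.com"))
}
```

Looking the group up by name means the code keeps working if more groups are later added to the pattern.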