
I have a list of email addresses, for example:

johnsmith at gmail dot com
username at gmail.com
random atsign outlook dot com

The username and the provider are always separated by a custom word surrounded by spaces. The problem is that the domain can either use a custom separator word (dot, or any other text), as in gmail dot com, or just a literal dot, as in gmail.com. If the parts were separated only by spaces, I would simply read the lines, split them at the spaces, and then write the first item, @, the third item, ., and the fifth item. However, the possible john at gmail.com format is problematic for me. How can I handle this format along with the simple name at gmail dot com format in one script?

  • [1] what have you tried so far & how did it fail to do what you need? [2] please provide at least one sample data point for your various possible inputs. [3] please provide the desired output for each sample input & the logic to get from input to output. – Lee_Dailey Dec 08 '19 at 15:32

2 Answers


For the examples you give, a bit of regex will do it:

$emails = @"
johnsmith at gmail dot com
username at gmail.com
random atsign outlook dot com
"@ -split '\r?\n'

$emails | ForEach-Object {
    # replace all repeating whitespace characters by a single space
    # and split the result into at most 3 parts (user name, separator word, domain)
    $pieces = $_ -replace '\s+', ' ' -split ' ', 3
    # output the username, followed by the '@' sign, followed by the domain
    '{0}@{1}' -f $pieces[0], ($pieces[2] -replace ' [^\.]+ ', '.')
}

Output:

johnsmith@gmail.com
username@gmail.com
random@outlook.com

Regex details for the domain part:

(space)   Match a single space character literally
[^\.]     Match any single character that is NOT a "." (dot)
   +      Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(space)   Match a single space character literally
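
If the lines come from a file or another command, the same steps could be wrapped in a small pipeline function. The following is only a sketch under a few assumptions: the function name ConvertTo-EmailAddress is hypothetical, every usable line is assumed to have the shape "user separator domain", and lines with fewer than three parts are skipped with a warning. Unlike the code above, it also trims leading and trailing whitespace.

```powershell
function ConvertTo-EmailAddress {
    # Hypothetical helper name; not a built-in PowerShell command.
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]
        [string] $Line
    )
    process {
        # Trim, collapse runs of whitespace, then split into at most 3 parts:
        # user name, separator word, and the rest of the line (the domain).
        $pieces = $Line.Trim() -replace '\s+', ' ' -split ' ', 3
        if ($pieces.Count -lt 3) {
            Write-Warning "Skipping line with unexpected format: '$Line'"
            return
        }
        # In the domain, turn ' <word> ' into '.', e.g. 'gmail dot com' -> 'gmail.com'.
        # A domain that already contains the dot has no such match and stays as it is.
        '{0}@{1}' -f $pieces[0], ($pieces[2] -replace ' [^.]+ ', '.')
    }
}

# Example call with two of the input shapes from the question:
'johnsmith at gmail dot com', 'username   at  gmail.com' | ConvertTo-EmailAddress
```

With a file, the call would look like `Get-Content emails.txt | ConvertTo-EmailAddress`, where emails.txt is a placeholder file name.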
Theo

A PowerShell v6.1+ solution, which uses the ability of the -replace operator to accept a script block ({ ... }) to process each match.

For a solution that also works in Windows PowerShell, see Theo's helpful answer.

# Simulate an array of input lines.
$emails = @'
johnsmith at gmail dot com
username at gmail.com
random atsign outlook dot com
'@ -split '\r?\n'

# Synthesize a valid email address from each line.
# (If the lines came from file, say, 'emails.txt', replace `$emails`
#  with `(Get-Content emails.txt)`)
$emails -replace '^([^ ]+) \w+ ([^ ]+|[^ ]+ [^ ]+ [^ ]+)$',
  { '{0}@{1}' -f $_.Groups[1].Value, ($_.Groups[2].Value -replace ' [^ ]+ ', '.') }

Note:

  • I've assumed that the tokens in your input line are separated by exactly one space character; to support multiple spaces as well, replace each separating space in the regex (the ones outside [^ ]) with \s+.

  • [^ ]+ is a nonempty (+) run of non-space ([^ ]) characters; loosely speaking, a word.

  • The regex matches each line in full, capturing the parts of interest via capture groups ((...)).

  • The script block ({ ... }) receives the match at hand in the automatic variable $_, as a Match instance, from which the capture groups can be extracted via .Groups[<n>].Value, starting with index 1.

The above yields:

johnsmith@gmail.com
username@gmail.com
random@outlook.com
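
As a sketch of the first note, here is a variant that tolerates runs of whitespace between the tokens. It is an adaptation rather than a drop-in fix: it assumes the tokens themselves contain no whitespace (so \S+ stands in for [^ ]+), it uses sample input with irregular spacing made up for illustration, and the variable name $addresses is introduced here only to hold the result. It still requires PowerShell v6.1+ because of the script-block replacement operand.

```powershell
# Sample lines with irregular spacing between the tokens (illustrative input).
$emails = @'
johnsmith   at gmail  dot   com
username at    gmail.com
random atsign outlook dot com
'@ -split '\r?\n'

# \s+ separates the tokens; \S+ is a nonempty run of non-whitespace characters.
# Group 1 is the user name; group 2 is either an already dotted domain
# or three tokens of the form '<name> <word> <tld>'.
$addresses = $emails -replace '^(\S+)\s+\w+\s+(\S+|\S+\s+\S+\s+\S+)$',
  { '{0}@{1}' -f $_.Groups[1].Value, ($_.Groups[2].Value -replace '\s+\S+\s+', '.') }

$addresses
```

For these three lines, this should again produce johnsmith@gmail.com, username@gmail.com and random@outlook.com. Note that -replace returns any line that does not match the regex unchanged, for example a line with trailing whitespace, so malformed input passes through silently.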
mklement0