How to transform strings based on these conditions? (no-spam email to real e-mail)

Question

I have a list of emails, for example:

johnsmith at gmail dot com
username at gmail.com
random atsign outlook dot com

The username and the provider are always separated by a custom word surrounded by spaces. The problem is that the domain can either use a custom separator (dot, or any other text) or just a literal dot, as in gmail.com. If only spaces were involved, I would simply read the lines, split them at the spaces, and then write the first item, @, the third item, . and the fifth item from the list. However, the possible john at gmail.com format is problematic for me. How could I handle this format along with the simple name at gmail dot com format in one script?

[1] what have you tried so far & how did it fail to do what you need? [2] please provide at least one sample data point for your various possible inputs. [3] please provide the desired output for each sample input & the logic to get from input to output. — Lee_Dailey, Dec 08 '19 at 15:32

score 1 · Accepted Answer · answered Dec 08 '19 at 15:36

For the examples you give, a bit of regex will do it:

$emails = @"
johnsmith at gmail dot com
username at gmail.com
random atsign outlook dot com
"@ -split '\r?\n'

$emails | ForEach-Object {
    # replace each run of whitespace characters with a single space
    # and split into at most 3 parts (the third keeps the rest of the line)
    $pieces = $_ -replace '\s+', ' ' -split ' ', 3
    # output the username, followed by the '@' sign, followed by the domain
    '{0}@{1}' -f $pieces[0], ($pieces[2] -replace ' [^\.]+ ', '.')
}

Output:

johnsmith@gmail.com
username@gmail.com
random@outlook.com

Regex details for the domain part:

(space)   Match the character “ ” literally
[^\.]     Match any character that is NOT a “.” character
   +      Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(space)   Match the character “ ” literally
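
To see what each step contributes, here is a small sketch that runs the same operators on a single sample line from the question. The variables $line and $domain are introduced only for this illustration; the comments show the values I would expect at each step.

```powershell
# $line is a hypothetical helper variable holding one sample input from the question.
$line = 'random atsign outlook dot com'

# Collapse whitespace runs, then split into at most 3 parts;
# the third part keeps the remainder of the line intact.
$pieces = $line -replace '\s+', ' ' -split ' ', 3

$pieces[0]    # random
$pieces[1]    # atsign  (the custom separator word, which is discarded)
$pieces[2]    # outlook dot com

# Replace ' <word> ' in the domain part with a literal dot.
$domain = $pieces[2] -replace ' [^\.]+ ', '.'
$domain       # outlook.com
```

For an input such as username at gmail.com, the third part is already gmail.com, so the final -replace finds nothing to change and leaves it as-is.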

mklement0 · Answer 2 · 2019-12-08T15:56:58.040

A PowerShell v6.1+ solution, which uses the ability of the -replace operator to accept a script block ({ ... }) to process each match.

For a solution that also works in Windows PowerShell, see Theo's helpful answer.
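
As a minimal illustration of that feature, unrelated to the email data: in PowerShell 6.1 or later, the script block is called once per match with the match object in $_, and its output becomes the replacement text. The input string and the $doubled variable are made up for this example.

```powershell
# Requires PowerShell 6.1+; Windows PowerShell does not evaluate
# a script block given as the replacement operand.
# Double every digit found in a made-up sample string.
$doubled = 'a1b2' -replace '\d', { 2 * ([int] $_.Value) }
$doubled   # a2b4
```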

# Simulate an array of input lines.
$emails = @'
johnsmith at gmail dot com
username at gmail.com
random atsign outlook dot com
'@ -split '\r?\n'

# Synthesize a valid email address from each line.
# (If the lines came from file, say, 'emails.txt', replace `$emails`
#  with `(Get-Content emails.txt)`)
$emails -replace '^([^ ]+) \w+ ([^ ]+|[^ ]+ [^ ]+ [^ ]+)$',
  { '{0}@{1}' -f $_.Groups[1].Value, ($_.Groups[2].Value -replace ' [^ ]+ ', '.') }

Note:

I've assumed that the tokens in your input lines are separated by exactly one space character; to support multiple spaces as well, replace the separator spaces in the regex with \s+.
[^ ]+ is a nonempty (+) run of non-space ([^ ]) characters; loosely speaking, a word.
The regex matches each line in full, capturing the parts of interest via capture groups ((...)).
The script block ({ ... }) receives the match at hand in the automatic variable $_, as a Match instance, from which the capture groups can be extracted via .Groups[<n>].Value, starting with index 1.
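
The multiple-space variant mentioned in the first note could look roughly like the sketch below. It is an adaptation rather than a tested drop-in: I've swapped [^ ]+ for \S+ so that the word tokens and the \s+ separators agree on what counts as whitespace, and the sample lines with irregular spacing, along with the $spaced and $result variables, are invented for the illustration.

```powershell
# Hypothetical sample input with irregular spacing (not from the question).
$spaced = 'johnsmith   at  gmail   dot   com', 'username at   gmail.com'

# Same idea as above, with \s+ between tokens and \S+ as the "word" token.
# Requires PowerShell 6.1+ because of the script-block replacement operand.
$result = $spaced -replace '^(\S+)\s+\w+\s+(\S+|\S+\s+\S+\s+\S+)$',
  { '{0}@{1}' -f $_.Groups[1].Value, ($_.Groups[2].Value -replace '\s+\S+\s+', '.') }

$result
# johnsmith@gmail.com
# username@gmail.com
```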

The above yields:

johnsmith@gmail.com
username@gmail.com
random@outlook.com
