How to standardize a column to be able to parse into multiple columns?

Question

There are a lot of posts on how to split a single column value into multiple columns, however I am having trouble standardizing data in the column to be able to use those techniques. What is the best approach for standardizing the below example?

For example:

**emails**
first_email,second_email
third_email; fourth_email
fifth_email ;sixth_email
seventh_email, eight_email
ninth_email.tenth_email

My expected output from standardization is:

**emails**
first_email, second_email
third_email, fourth_email
fifth_email, sixth_email
seventh_email, eight_email
ninth_email, tenth_email

If possible, I would highly suggest changing your database design to store only a single email in each row. Another issue is that periods are allowed characters in email addresses. So splitting on them wouldn't necessarily produce your expected results. — sgeddes, Aug 10 '18 at 17:03
Sadly, this is data from a database that I do not have control over and so have to restructure to fit into a better database design. — Jen, Aug 10 '18 at 17:06
A comma or semicolon are valid separators for multiple email, but a period is not (since it is commonly used as `first.last@domain.com`). Is `.` really a separator between two email addresses?, or is `.` part of a valid email address? — James L., Aug 10 '18 at 17:11
I would suggest using regular expressions to parse the list. I doubt it's possible without regexp. — Ildar Akhmetov, Aug 10 '18 at 17:12
Thanks, and yeah there are some emails with . as a separator. — Jen, Aug 10 '18 at 17:13
In your sample above, are there only two emails per row in the single column? Or are all 10 in the same tuple (“cell”)? — Stuart Ainsworth, Aug 10 '18 at 17:14
Are there other email where the `.` is part of the email address, or are all `.` used to separate two email addresses? — James L., Aug 10 '18 at 17:16
There are up to 5 emails per row in the column, they are not in one cell. There are other emails where the `.` is part of the email address. — Jen, Aug 10 '18 at 17:19
How many records are we talking about? Can you manually sort through a few stragglers if necessary? The `,` and `;` are no problem, the `.` on the other hand will cause you a lot of grief. — Dave C, Aug 10 '18 at 17:21
If the `.` wasn't used to sometimes be part of an email and other times separate two email, it would be easy. I'll post an example. — James L., Aug 10 '18 at 17:25
That might have to be the approach there are around 26 thousand records. I am probably going to have to manually sort through the stragglers. — Jen, Aug 10 '18 at 17:26
If at least there was space after the separating `.`! But it should be easy to find those with `email LIKE '%@%@%'`. This finds records with 2 or more `@` once you have separated the begnin cases. — Olivier Jacot-Descombes, Aug 10 '18 at 17:26

score 1 · Accepted Answer · answered Aug 10 '18 at 17:28

This shows how you can normalize the formatting, but because the . character is sometimes used in an email address and other times used to separate email addresses, you should probably not include the replace for the ., and just work through those manually.

You can used a nested replace() to eliminate the spaces first, then replace the valid email address separators with ,[space].

declare @str varchar(max) = 'first_email,second_email
third_email; fourth_email
fifth_email ;sixth_email
seventh_email, eight_email
ninth_email.tenth_email'

select replace(replace(replace(replace(
         @str, ' ',   ''), -- a space is not valid in an email
               ',', ', '),
               ';', ', '),
               '.', ', '
       )

Which returns:

first_email, second_email
third_email, fourth_email
fifth_email, sixth_email
seventh_email, eight_email
ninth_email, tenth_email

Replacing `'.', ', '` will not work with `fn.ln@provider.com`. — Olivier Jacot-Descombes, Aug 10 '18 at 17:32
Yes, that is why I said she should **not** include the replace for `.`. She should really clean those records up first and exclude the `'.', ', '` from the the final script. — James L., Aug 10 '18 at 17:49

How to standardize a column to be able to parse into multiple columns?

1 Answers1