4

I have a old table that has applicant info including the person who interviewed them Name|Phone|Email|Addr|Interv|IntervEmail

unfortunately the interviewer emails are a mess. Some blank some have phone numbers some have spaces or other special characters like this.is.a.mess@anydamnplace.com / 999-222-5555 home or home 999-222-5555, this.is.a.mess@anydamnplace.com

What I would like is be able to get all rows with interviewer emails from the table but only the email and not the other junk.

any help appreciated

example From this

user   | home_phone | applicant_email           | addr      | intrvwr_nm | intrvwr_email
-------+------------+---------------------------+-----------+------------+---------------------------------------------------------------------
Jane   | 1234567890 | someemail@anydamplace.com | 123 here  | Jeff       | jeff@surething.net / 321589364
Larry  | 1234567890 | someemail@anydamplace.com | 123 there | Eddie      | gjfgn@anotherplace.net / 2321.654.9872
harry  | 1234567890 | someemail@anydamplace.com | 456 here  | Murray     | Admissions Field Representative/murray@anotherplace.com/212-222-1111
Carrie | 1234567890 | someemail@anydamplace.com | 456 there | Del        | dely@school.edu
Joe    | 1234567890 | someemail@anydamplace.com | 123 here  | James      | 1234567890/sometext, this.is.a.mess@anydamnplace.com

To this

user   | home_phone | applicant_email           | addr      | intrvwr_nm | intrvwr_email
-------+------------+---------------------------+-----------+------------+---------------------------------------------------------------------
Jane   | 1234567890 | someemail@anydamplace.com | 123 here  | Jeff       | jeff@surething.net
Larry  | 1234567890 | someemail@anydamplace.com | 123 there | Eddie      | gjfgn@anotherplace.net
harry  | 1234567890 | someemail@anydamplace.com | 456 here  | Murray     | murray@anotherplace.com
Carrie | 1234567890 | someemail@anydamplace.com | 456 there | Del        | dely@school.edu
Joe    | 1234567890 | someemail@anydamplace.com | 123 here  | James      | this.is.a.mess@anydamnplace.com
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
Variant
  • 51
  • 6
  • 1
    Is this a one-time thing? How many rows are you looking at? This might be something more well suited for a platform that supports regular expressions. – squillman Feb 21 '20 at 20:12
  • Yes most likely one time deal so we can copy clean data to new table. it has 1981 rows atm – Variant Feb 21 '20 at 20:16
  • however, we might need it for other tables. It would probably be easier to put it in an excel sheet and do a regexp (which I would not know how to either now) but we might need something similar for all the old tables – Variant Feb 21 '20 at 20:22
  • 3
    Can you provide a [mcve]? – Sean Lange Feb 21 '20 at 20:25
  • 1
    Do all of the email addresses in the garbage rows have a delimiting character (such as a space) on either end of the address? If so, something in t-sql could probably be created fairly easily. If not, not even sure regexp could take care of it... – squillman Feb 21 '20 at 20:28
  • I added an example of what it looks like now and what we would like the output to be – Variant Feb 21 '20 at 20:50
  • 3
    Given the total trainwreck of that data and that you only have 1,981 rows I would just manually edit this. Any query you write is going to be fraught with issues because that data is a mess. There are just too many variables with this type of data. – Sean Lange Feb 21 '20 at 20:50
  • Agreed. Just from the sample data posted you have no way of programatically consistently distinguishing an email address. – squillman Feb 21 '20 at 20:54
  • 1
    I agree with Sean Lange here. According to https://stackoverflow.com/questions/2049502/what-characters-are-allowed-in-an-email-address a slash is allowed in an email address, so it needs common sense to tell whether the email address is `'Representative/murray@anotherplace.com'` or just `'murray@anotherplace.com'`. – Thorsten Kettner Feb 21 '20 at 21:08
  • Old data slashes most likely not an issue – Variant Feb 21 '20 at 21:35
  • 1
    What's your timeline? This looks like an _awesome_ project for a summer intern.... – Eric Brandt Feb 21 '20 at 21:54
  • 2
    Suggest you read about this madness first - [the 100% way to validate](https://medium.com/hackernoon/the-100-correct-way-to-validate-email-addresses-7c4818f24643) – SMor Feb 21 '20 at 22:14

1 Answers1

3

Try running this query. It would catch most of the valid email address:

SELECT a.email 
FROM
(
SELECT value as email  
FROM applicant_Table  
    CROSS APPLY STRING_SPLIT(intrvwr_email, ' ') 
) as a
where a.email like '%_@__%.__%'

Please take into account that:

  1. This only works for SQL Server 2016 or above.
  2. The delimiter is a space.

You can add more columns to the query (like the row ID) so you get to know in which row is stored every email address you're getting.

  • that returns rows no emails at all. and SELECT * FROM applicants WHERE email LIKE '%_@__%.__%' returns the other junk with the emai I just need the rows with an email and the cell should only have the email itself not the phone or other stuff – Variant Feb 21 '20 at 20:19
  • that actually worked good except when there was no space between the email and junk returned emailherer@yahoo.com/(123) instead of emailherer@yahoo.com/(123) 456789 – Variant Feb 21 '20 at 21:07
  • 2
    @CrashWave Just a minor twist on the answer above (+1) https://dbfiddle.uk/?rdbms=sqlserver_2016&fiddle=c81128c5f708f20ca6a702d6a048b9aa – John Cappelletti Feb 21 '20 at 21:59
  • 1
    @john-cappelletti Your minor twist soled our problem. THANK YOU – Variant Feb 27 '20 at 14:09