0

I'm trying to extract an email address from a string. Most of the entries have a similar structure, for example:

From: John Doe Sent:Monday, October 10, 2013 11:59 AM
To: '**xyz123@yahoo.com**'Cc:mno456@comcast.net (the lack of spaces is intentional)

On the other hand, some have a different format, i.e. don't include the Cc:, etc. The goal is to extract the To: email address, i.e. xyz123@yahoo.com in our example.

I've tried using INSTR and SUBSTR but have not had much luck. Is there any way to extract this email address using RegEx's, or does anyone have any other ideas.

ataravati
  • 8,891
  • 9
  • 57
  • 89
Rikin
  • 283
  • 3
  • 13
  • 29
  • See [Using a regular expression to validate an email address](http://stackoverflow.com/q/201323/427192). Not a duplicate, but possibly useful – Dan Pichelman Nov 21 '13 at 21:35
  • possible duplicate of [Regular Expression in Oracle with REGEXP\_SUBSTR](http://stackoverflow.com/questions/18694152/regular-expression-in-oracle-with-regexp-substr) – John Doyle Nov 22 '13 at 22:47

2 Answers2

1

Try this:

SELECT Trim( both '*' from Trim( both '''' from
       Substr( eml,
          instr( eml, 'To:' ) + 4,
          CASE WHEN instr( eml, 'Cc:' ) > 0
          THEN instr( eml, 'Cc:' ) - ( instr( eml, 'To:' ) + 4 )
          ELSE 1000 END
       )))
FROM emails;

demo --> http://www.sqlfiddle.com/#!4/5f181/5

krokodilko
  • 35,300
  • 7
  • 55
  • 79
0

It would be of this form:

select
     Ltrim(regexp_substr(column_name,'To:''[[:alpha:]][[:alnum:]]*.?[[:alnum:]]*@[[:alpha:]][[:alnum:]]*[.][[:alpha:]]+''',1,1),'To:')
from
     table_name;

Of course assumptions are being made here (e.g. only one addressee). One would need to use the alternation operator to cover other forms that one could encounter.

With respect to my regular expression:

alphabetic character class + alphanumeric character class (zero or more) + a period (zero or 1) + alphanumeric character class (zero or more) @ + alphabetic character class + alphanumeric character class (zero or more)

Patrick Bacon
  • 4,490
  • 1
  • 26
  • 33
  • Not only will this not compile correctly (you need extra square brackets around POSIX character classes) it's really quite wrong, although the idea of using multiple character classes is the way to go as an e-mail can't start with a non-alphanumeric character according to the RFC (I think and unless it's quoted). – Ben Nov 21 '13 at 23:55
  • Point taken. I did test against a simple literal and it did work. Truly one would want to cover other possible scenarios using the alternation operator. – Patrick Bacon Nov 22 '13 at 12:15