-3

I have a string of email addresses like the one below:

"test@test.comtest.test1@test.comtest@yahoo.co.intest1.test2@support.yahoo.com"

I want to convert this to an array of valid email addresses. I've been trying to solve this by using regex.

Todd A. Jacobs
  • Well, you could first split those emails and then loop through them with the validation. –  May 30 '12 at 13:01
  • "trying to solve this by using regex." What have you tried? – kevlar1818 May 30 '12 at 13:01
  • I think that, just for clarity (especially for what seems to be the third/fourth email), you should post what sort of list you are actually expecting. – npinti May 30 '12 at 13:01
  • I think regex isn't powerful enough to guess where to split. Wait for the next Google AI `:)` – sp00m May 30 '12 at 13:04
  • I agree, I believe there is an error in that string to parse. Are the emails separated by periods or not? – kevlar1818 May 30 '12 at 13:05
  • 2
    Only a Regex won't help here. How do you decide whether it's ".com", "test" or ".co", "mtest" etc? If you have no clear delimiters to work with at all, it very much depends on the actual data whether it's salvageable automatically or not. – deceze May 30 '12 at 13:05
  • @sp00m The power of RegEx lies in heavy use of backtracking. If backtracking can't solve it it doesn't mean that RegEx isn't powerful enough, it's just a typical PEBKAC problem which can be solved with a different tool. – Mihai Stancu May 30 '12 at 13:17
  • You can't parse that unless you make some assumptions about the domain names. There are very likely some ambiguous combinations possible. You should kick it back to whoever supplied that list. – Hot Licks Jun 05 '13 at 02:14

2 Answers

2

To sum up what everyone's been commenting,

You really need to delimit your data better. For example you might do:

test@test.com;test.test1@test.com;test@yahoo.co.in;test1.test2@support.yahoo.com

Doing this would let you split your string on ; to get a list of possible email addresses. However, look at this accepted SO answer about the problem with validating email addresses using regex. There are so many formats and possibilities for email addresses that they are hard to validate with just a regex.

Here is an example of delimiting using the above string.
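
The same idea can be sketched in Ruby. This is only a rough illustration, assuming the data really is separated by ; as in the string above. The helper name split_emails is made up for this example, and URI::MailTo::EMAIL_REGEXP from Ruby's standard library is used as a loose syntax check, not as proof that an address exists.

```ruby
require "uri"

# Hypothetical helper: split on a delimiter, then keep only the entries
# that pass a loose syntax check. This does not verify deliverability.
def split_emails(raw, delimiter = ";")
  raw.split(delimiter).map(&:strip).reject(&:empty?).select do |candidate|
    candidate.match?(URI::MailTo::EMAIL_REGEXP)
  end
end

delimited = "test@test.com;test.test1@test.com;test@yahoo.co.in;test1.test2@support.yahoo.com"
addresses = split_emails(delimited)
puts addresses
```

Anything that fails the check is silently dropped here; in real code you would probably want to log or report those entries instead.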

kevlar1818
  • 1
    I think that this should at most be a comment, not an actual answer. – npinti May 30 '12 at 13:10
  • @npinti This solves the problem in (IMO) the best way. How is that not worthy of an answer? Also, I'm going to add a jsfiddle in a moment to clarify the process. – kevlar1818 May 30 '12 at 13:12
  • Because it does not answer the question; it assumes extra conditions and then answers a different question that does respect those extra conditions. You should've asked the OP to clarify and, if the case was favorable for your answer, then you could post the answer. – Mihai Stancu May 30 '12 at 13:19
  • 1
    The OP was not responding to anyone's comments, so I figured I'd post this. – kevlar1818 May 30 '12 at 13:22
0

You may be able to do this if you can guarantee that:

  1. all emails start with "test" or some other known string, or
  2. all the possible domains in your data set are known.

If you can make some guarantees, then you can do something like this in Ruby:

emails = "test@test.comtest.test1@test.comtest@yahoo.co.intest1.test2@support.yahoo.com"

# Test for a known string ending in a known domain.
# The capture group makes scan return one-element arrays, so flatten them.
emails.scan(/(test.*?[.](?:com|in))/).flatten

# Test for known domains with positive lookbehind.
emails.scan /(?<=^|com|in).*?(?:com|in)/
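
If the second guarantee holds, a slightly stricter variant is to match against the full list of known domains instead of just the endings. The following is a sketch under that assumption; KNOWN_DOMAINS and split_by_known_domains are names invented for this example, and the list must really be exhaustive for your data.

```ruby
# Assumption: every domain that can occur in the data is listed here.
KNOWN_DOMAINS = %w[test.com yahoo.co.in support.yahoo.com].freeze

# Hypothetical helper: a mailbox (anything without "@"), then "@",
# then one of the known domains. Longer domains are tried first so a
# domain that is a prefix of another one cannot win by accident.
def split_by_known_domains(text, domains = KNOWN_DOMAINS)
  alternatives = Regexp.union(domains.sort_by { |domain| -domain.length })
  text.scan(/[^@]+@#{alternatives}/)
end

emails = "test@test.comtest.test1@test.comtest@yahoo.co.intest1.test2@support.yahoo.com"
puts split_by_known_domains(emails)
```

Regexp.union escapes the dots in the domain names, so "test.com" only matches a literal dot. This still breaks down if a known domain can be followed by text that extends it into another valid domain.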

In other words, if it's fixture data, fix your fixtures to have a sensible delimiter. That will take less time and be less error-prone.

On the other hand, if it's real data then it's unlikely you can separate them. Distinguishing an arbitrary domain name from an arbitrary trailing mailbox name is impractical.
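
To see why, here is a small sketch that lists every way a string can be cut into pieces that each look like an email address. The pattern LOOSE_EMAIL, the helper all_splits and the sample string are all made up for this illustration; the pattern is deliberately simplistic and only covers lowercase letters, digits and dots.

```ruby
# Deliberately loose, illustrative pattern: dot-separated words, "@",
# then at least two dot-separated words. Not a real email validator.
LOOSE_EMAIL = /\A[a-z0-9]+(?:\.[a-z0-9]+)*@[a-z0-9]+(?:\.[a-z0-9]+)+\z/

# Hypothetical helper: return every way to cut text into consecutive
# pieces that all match LOOSE_EMAIL. Exponential in the worst case, so
# only suitable for short demonstration strings.
def all_splits(text)
  return [[]] if text.empty?

  (1..text.length).flat_map do |len|
    head = text[0, len]
    next [] unless head.match?(LOOSE_EMAIL)

    all_splits(text[len..-1]).map { |rest| [head] + rest }
  end
end

splits = all_splits("a@b.comtest@c.com")
splits.each { |pieces| puts pieces.join("  |  ") }
```

Even for this two-address string there are several candidates, including both "a@b.com" + "test@c.com" and "a@b.co" + "mtest@c.com", and nothing in the string itself says which one was intended.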

Todd A. Jacobs