
So, I have rows of text with plenty of unnecessary information in them (in a Google Sheet). I would like to match everything except the email address. To match the email itself, I'm using the following regex:

[a-zA-Z0-9_.+-]+@(?:[a-zA-Z0-9-]+\.)+(?!png|jpg|gif)[a-zA-Z0-9-]+

If I can manage to match everything except the email, then I can just find/replace and leave only the email in the row which is what I want. Having some trouble here. Help would be appreciated!

  • Regex behavior depends heavily on the flavor you are using. In which application or programming language do you intend to use the regex? Some regex flavors don't support negative lookbehind, for example. – Saleem Feb 29 '16 at 12:36
  • I was going to use JavaScript, but for now it's just a simple Google Sheets find/replace with the 'search using regular expressions' box checked. If you have other ideas, let me know. Thanks! – user2448617 Feb 29 '16 at 23:59

2 Answers


While it's not perfect this could be what you're after:

In the online demo, this pattern works: ^(?:.*?(\w[^@\s]*@[^@\s]{2,}).*?|.+)$

For Google Sheets, however, you need to remove the ^ and $ line start/end markers; the pattern should then do most of what you want:

(?:.*?(\w[^@\s]*@[^@\s]{2,}).*?|.+)

Replace this pattern with $1 to leave just the email address on each line.

This works per line. The search pattern consists of two alternatives inside a non-capturing group (?:...). The first alternative starts with .*?, which lazily matches characters from the start of the line up to group 1, the email pattern (\w[^@\s]*@[^@\s]{2,}), followed by a second .*? for anything after the email. The second alternative, .+, matches any line that contains no email. The replace pattern is simply group 1, $1. Group 1 is empty when no email address is found, so each line ends up either blank or containing only the email address.
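Since the asker mentions JavaScript as the eventual target, here is a minimal sketch of the same search/replace there. It uses the anchored form of the pattern with the `g` and `m` flags so that ^ and $ apply to each line. The function name `keepEmailPerLine` and the sample rows are made up for illustration, and the email sub-pattern is the loose one from this answer, not a full validator.

```javascript
// Anchored per-line pattern from the answer; "m" makes ^ and $ match at line
// boundaries, "g" repeats the replacement for every line.
const LINE_PATTERN = /^(?:.*?(\w[^@\s]*@[^@\s]{2,}).*?|.+)$/gm;

// Hypothetical helper: reduces each line to its first email-like token,
// or to an empty line when no such token is present.
function keepEmailPerLine(text) {
  // When the second alternative (.+) matches, group 1 did not participate,
  // and "$1" is replaced by an empty string.
  return text.replace(LINE_PATTERN, "$1");
}

// Made-up sample data: one row with an address, one without.
const rows = "id 42 contact jane@example.com phone 555\nno address on this row";
console.log(JSON.stringify(keepEmailPerLine(rows)));
```

Lines without an address come out empty rather than being removed, which mirrors the "blank or populated" behavior described above.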

This might not match all email addresses but should match most. See this question for a lengthy read about regex matching email addresses.

snoopen
  • Thanks for this, but as per my original request, I'm looking to match everything except the email addresses, so that when I find/replace, only the emails are left over. Thanks for your help! – user2448617 Feb 29 '16 at 23:50
  • Did you check the demo? This does exactly what you're after. What it does is finds everything but marks the email as a group. When doing the replace it fills each line with group 1. Thus leaves only email addresses. – snoopen Mar 01 '16 at 00:19
  • Thanks for that, close! But this is what happens: [gif example of regex](http://goo.gl/ajBe6H) – user2448617 Mar 01 '16 at 11:45
  • Oh right. I hadn't considered how regex works in Google Sheets. You can try $1 instead of \1. I think my pattern might work after some tailoring – snoopen Mar 01 '16 at 11:55
  • Just did a quick test. `$1` as the replace value works but as I suspected ^ and $ line start/end markers are causing multi-line cells to not match – snoopen Mar 01 '16 at 12:02
  • I've updated the answer. It was actually as simple as removing the line start/end markers and using $1. The only thing you might want to revise is the email pattern `\w[^@\s]*@[^@\s]{2,}` as I had what look like a few false positives in my test. The linked question should help you with that though. – snoopen Mar 01 '16 at 12:09
  • Actually, is there any way to replace everything but keep only one email? I'm finding a lot of fields with multi-line emails or joined emails, and it's messing things up a little (i.e. returning results like `thewodapalooza,nInfo@thewodapalooza.com,54402966,thewodapalooza,The`). – user2448617 Mar 01 '16 at 12:53
  • Yeah so turns out you can return just one email with `^(?:[\S\s]*?(\w[^@\s]*@[^@\s]{2,})[\S\s]*?|[\S\s]+)$`. For lines like your example you'll need to replace the email pattern. Try this one http://stackoverflow.com/a/8829363/1015849 – snoopen Mar 01 '16 at 21:19

You can't match everything but the email. But you can match everything and the email.

Match anything non-greedily, followed by either a captured email or the end of the string, and replace each match with the capture group, globally:

"BLAHBLAHemailBLAHBLAHemailBLAH".replace(/.*?(email|$)/g, "$1");
// => "emailemail"

(Insert your own email regexp.)
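As a rough sketch of what "insert your own" could look like, the snippet below plugs the email regex from the question into the same replace. Three things here are assumptions rather than part of the answer: the helper name `extractEmails`, the use of `[\s\S]` instead of `.` so that line breaks inside a cell are consumed too, and the single space appended to `$1` to keep consecutive addresses apart.

```javascript
// Email pattern copied from the question (written as a string, so backslashes
// are doubled). It is a pragmatic matcher, not an RFC-complete one.
const EMAIL = "[a-zA-Z0-9_.+-]+@(?:[a-zA-Z0-9-]+\\.)+(?!png|jpg|gif)[a-zA-Z0-9-]+";

// Lazily consume anything up to the next email (captured) or the end of input.
// Assumption: [\s\S] replaces "." so that newlines are skipped as well.
const STRIP = new RegExp("[\\s\\S]*?(" + EMAIL + "|$)", "g");

// Hypothetical helper: returns the emails found in `text`, space-separated.
function extractEmails(text) {
  // "$1 " keeps a separator between emails; the final matches capture an
  // empty string at end of input, so trailing spaces are trimmed away.
  return text.replace(STRIP, "$1 ").trim();
}

console.log(extractEmails("foo a@b.com bar\nc@d.org baz"));
```

Dropping the extra space and the `trim()` call gives back the answer's original behavior, where the surviving emails are concatenated directly.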

Amadan