8

With regex, how can I match everything in a string that isn't something? This may not make sense, but read on.

Take the word "baby", for instance: to match everything that isn't a b, you would use something like [^b], and this would match a and y. Simple enough! But in the string "Ben sits on a bench", how can I match everything that isn't "ben", so that I would be matching "sits on a ch"?

Better yet, how can I match everything that isn't a pattern? For example, in 1a2be3, match everything that isn't number, letter, number, so it would match every combination in the string except 1a2.

Donal Fellows
Srb1313711
  • 6
    It sounds like you can just do a regex replace of your blacklisted pattern with the empty string and see if anything remains? – Jon Dec 10 '13 at 10:56
  • Can you give an answer with an example? – Srb1313711 Dec 10 '13 at 11:00
  • 1
    @Srb1313711 Any programming language of choice? I'm not sure whether replacing can be done in a regex alone. – skiwi Dec 10 '13 at 11:02
  • 1
    I don't think there is much to add to what I already wrote... you would replace e.g. `\d[a-z]\d` (number-letter-number) with the empty string and then check if the result is non-empty. If it isn't then you have a "match". Alternatively you could *split* on that regex, so the input `xxx1a2yyy` would result in two tokens `xxx` and `yyy`. – Jon Dec 10 '13 at 11:03
  • @skiwi No language of choice. I am aware that regex features vary between languages, but for the sake of the question, a solution in any language is fine. – Srb1313711 Dec 10 '13 at 11:06
  • 2
    Could this be a dupe of [Regular Expressions and negating a whole character group](http://stackoverflow.com/q/977251/758831)? – wmorrison365 Dec 10 '13 at 11:31
  • What do you plan to do with this? Are you looking to get the ranges, or the string, or to embed in a larger regexp, or what? – Donal Fellows Dec 17 '13 at 10:09

6 Answers

1
(?:ben)|(.)

This regex matches either ben or any other single character; ben is not captured, but the other characters are. So you end up with a lot of matches, with nothing captured wherever ben occurred. You can then join all the captured characters together to get the string without any occurrence of ben.

Here is an example in Python.

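import re

thestr = "Ben sits on a bench"
# "ben" is matched but not captured; any other character is captured in group 1
regex = r'(?:ben)|(.)'

# findall returns group 1 for each match: '' where "ben" matched, else the character
matches = re.findall(regex, thestr, re.IGNORECASE)
print(''.join(matches))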

This will output:

 sits on a ch

Note the leading space. You can of course get rid of that by adding .strip().

Also note that it is probably faster to replace ben with an empty string using a regex, which gives the same result. But this technique could come in handy if you want to use it as part of a more complex regex.

And of course you can put a more complex regex in place of ben; for example, your number, letter, number example would be:

(?:[0-9][a-z][0-9])|(.)
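
A minimal sketch of the same technique applied to both examples from the question, reusing the Python approach shown earlier in this answer. The helper name `strip_pattern` is made up for illustration and is not part of the `re` module. It assumes the unwanted pattern contains no capturing groups of its own, since `re.findall` would then return tuples instead of strings.

```python
import re

def strip_pattern(text, unwanted, flags=0):
    """Return text with every match of `unwanted` dropped (hypothetical helper)."""
    # Wrap the unwanted pattern in a non-capturing group and capture
    # any other single character in group 1.
    regex = r'(?:' + unwanted + r')|(.)'
    # With exactly one capture group, findall returns that group per match:
    # '' where the unwanted pattern matched, otherwise the single character.
    return ''.join(re.findall(regex, text, flags))

print(strip_pattern("1a2be3", r'[0-9][a-z][0-9]'))                  # be3
print(strip_pattern("Ben sits on a bench", r'ben', re.IGNORECASE))  # " sits on a ch" (leading space)
```

Because `.` does not match a newline by default, multi-line input would also need `re.DOTALL` in `flags` to keep its line breaks.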
gitaarik
1

Short answer: You can't do what you're asking. Technically, the first part has an ugly answer, but the second part (as I understand it) has no answer.


For your first part, I have a pretty impractical (yet pure regex) answer; anything better would require code (like @rednaw's much cleaner answer above). I extended the test string to make it more comprehensive. (For simplicity, I'm using grep -Pio for PCRE, case insensitive, printing one match per line.)

$ echo "Ben sits on a bench better end" \
    |grep -Pio '(?=b(?!en)|(?<!b)en|e(?!n)|(?<!be)n|[^ben])\w+'
sits
on
a
ch
better
end

I'm basically making a special case for each letter in "ben" so that I include only occurrences that are not themselves part of the string "ben." As I said, not really practical, even if I am technically answering your question. I've also saved a blow-by-blow explanation of this regex if you want further detail.

If you're forced into using a pure regex rather than code, your best bet for items like this is to write code to generate the regex. That way you can keep a clean copy of it.


I'm not sure what you're asking for the remainder of your challenge; a regex is either greedy or lazy [1] [2], and I don't know of any implementations that can find "every combination" rather than merely the first combination by either method. If there were such a thing, it would be very very slow in real life (rather than quick examples); the slow speed of regex engines would be intolerable if they were forced to examine every possibility, which would basically be a ReDoS.

Examples:

# greedy evaluation (default)
$ echo 1a2be3 |grep -Pio '(?!\d[a-z]\d)\w+'
a2be3

# lazy evaluation
$ echo 1a2be3 |grep -Pio '(?!\d[a-z]\d)\w+?'
a
2
b
e
3

I assume you are looking for 1 1a a a2 a2b a2be a2be3 2 2b 2be 2be3 b be be3 e e3 3 but I don't think you can get that with a pure regex. You'd need some code to generate every substring and then you could use a regex to filter out the forbidden pattern (again, this is all about greedy vs lazy vs ReDoS).
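
The "generate every substring, then filter" idea could be sketched roughly as follows in Python. The function name `substrings_without` is hypothetical, and the sketch assumes that "every combination" means every contiguous substring that does not contain the forbidden pattern.

```python
import re

def substrings_without(text, forbidden):
    """List every contiguous substring of text that contains no match of forbidden."""
    pattern = re.compile(forbidden)
    kept = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            # Keep the piece only if the forbidden pattern occurs nowhere inside it.
            if pattern.search(piece) is None:
                kept.append(piece)
    return kept

print(' '.join(substrings_without("1a2be3", r'\d[a-z]\d')))
# 1 1a a a2 a2b a2be a2be3 2 2b 2be 2be3 b be be3 e e3 3
```

This enumerates a quadratic number of substrings, so it is only reasonable for short inputs.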

Adam Katz
  • +1 Thank you for a very detailed response. This clearly took time to write and, although it couldn't answer the question, it was still very helpful. – Srb1313711 Jan 28 '14 at 09:42
0

If you want to match all the words except one, you can use a negative lookahead: \b(?!ben\b)\w+\b. For an answer to your exact question, though, Jon's comment seems the simplest.
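
As a quick illustration, here is how the word-level lookahead behaves in Python on the sentence from the question. This sketch uses `\w+` (one or more word characters) so that no empty matches are reported, and adds `re.IGNORECASE` so that "Ben" is excluded too.

```python
import re

text = "Ben sits on a bench"

# \b(?!ben\b) refuses to start a match at a word that is exactly "ben";
# "bench" is still allowed because "ben" is not followed by a word boundary there.
words = re.findall(r'\b(?!ben\b)\w+\b', text, re.IGNORECASE)
print(words)  # ['sits', 'on', 'a', 'bench']
```

Note that "bench" is kept whole: this approach filters whole words and does not cut "ben" out of longer words.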

rzymek
hillel
  • This didn't work for me. I tested it at http://gskinner.com/RegExr/ with my ben example and it only matched the first ben. Also, can you explain the \b? – Srb1313711 Dec 10 '13 at 11:11
  • \b is a word boundary. Try it at http://regexpal.com/; it works fine for me (although it is not exactly what you requested, since it matches words). – hillel Dec 10 '13 at 11:21
0

Okay, the simplest thing to do is match everything:

(.*?)

Then, on the matched text, do another match for what you don't want (e.g. in Perl the matched text is in the variable $&).

If it matches, that's not what you want; otherwise you have your match.

It is a simple A-B, where A is everything, (.*?), and B is what you don't want. So you end up doing two matches, but I think that's fine.
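
One possible reading of this two-step idea, sketched in Python rather than Perl. The function name `keep_unwanted_free` is hypothetical, and the sketch assumes the candidates are whitespace-separated tokens (`\S+`) instead of the bare `(.*?)`, which on its own would only produce empty matches.

```python
import re

def keep_unwanted_free(text, candidate, unwanted):
    """Step A: find every candidate match. Step B: drop those containing `unwanted`."""
    kept = []
    for match in re.finditer(candidate, text):
        piece = match.group(0)
        # Second match: reject the candidate if the unwanted pattern occurs in it.
        if re.search(unwanted, piece) is None:
            kept.append(piece)
    return kept

tokens = keep_unwanted_free("xx 1a2 yy 7b9z be3", r'\S+', r'\d[a-z]\d')
print(tokens)  # ['xx', 'yy', 'be3']
```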

Ronin
0

Just replace everything that matches your pattern with a blank (to delete it).

You haven't indicated what language you are using, so generically:

s/ben//g

and your other example:

s/\d[a-zA-Z]\d//g
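
For a concrete language, the two substitutions might look like this in Python with `re.sub`. The case-insensitive flag on the first call is an assumption added so that "Ben" is removed as well as "ben".

```python
import re

# Equivalent of s/ben//g, made case-insensitive so "Ben" is removed too.
no_ben = re.sub(r'ben', '', "Ben sits on a bench", flags=re.IGNORECASE)

# Equivalent of s/\d[a-zA-Z]\d//g.
no_pattern = re.sub(r'\d[a-zA-Z]\d', '', "1a2be3")

print(repr(no_ben))      # ' sits on a ch'
print(repr(no_pattern))  # 'be3'

# If the result is non-empty, something other than the pattern was present.
print(bool(no_pattern))  # True
```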
Bohemian
0

If you want a list of strings, use "split on regexp" instead of "match on regexp".
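
A short sketch of the split approach in Python. The helper name `pieces_between` is hypothetical; it drops the empty strings that `re.split` returns when the pattern matches at the very start or end of the input, and it assumes the pattern has no capturing groups (`re.split` would otherwise include the captured text in the result).

```python
import re

def pieces_between(text, unwanted, flags=0):
    """Split text on the unwanted pattern and return the non-empty pieces."""
    return [piece for piece in re.split(unwanted, text, flags=flags) if piece]

print(pieces_between("xxx1a2yyy", r'\d[a-z]\d'))                     # ['xxx', 'yyy']
print(pieces_between("Ben sits on a bench", r'ben', re.IGNORECASE))  # [' sits on a ', 'ch']
```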

Yuriy Kovalev