2

Ok, so I have a phrase "foo bar" and I want to find everything BUT "foo bar".
Here's my text.

ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar

There's a way to do this just within regex, right? I don't have to go and use string functions etc., do I?

RESULT:

NOTE I can't do a nice highlighting but the bold gives you an idea (although the spaces that are before and after would also be selected but it breaks the bolding).

ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar

Assume PCRE nomenclature.


UPDATE 7/29/2013: it may be better to use a search and replace function in your language of choice to just 'remove' the phrases that you don't want so that you are then left with the info you do want.

Keng
  • Tell us what operating system. Tell us what programming language. Tell us the EXACT STRING you are starting with. Tell us the EXACT STRING you expect as an answer. Tell us what you are not allowed to use AND WHY. – tchrist Nov 17 '10 at 20:28
  • See question for 'exact string'; see question for expected result. I'm not allowed to use walruses, the colour blue, or cheeky phrases concerning Vogon poetry. I am allowed to use regex (with PCRE rules)....Why....who knows...but my guess has something to do with the walrus...or possibly his hat. – Keng Nov 17 '10 at 20:40

6 Answers

9

In general, if foobar matches itself, then (?s:(?!foobar).)* matches anything that is not foobar, including nothing at all.

You could use that to find lines that don’t have foobar in them, for example, using

^(?:(?!foobar).)*$

You could also use your language’s split() function to split on foobar, which will give you all the pieces that do not include the split pattern.
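As a rough sketch of the split() idea, here is how it might look with Python's re module. The choice of Python and the helper name pieces_without are assumptions for illustration, not something from the answer. The sketch uses the spaced phrase "foo bar" from the question and drops the empty piece that split() produces when the text starts or ends with the phrase.

```python
import re

def pieces_without(phrase, text):
    """Split text on a literal phrase and return the non-empty pieces."""
    # re.escape makes the phrase match literally, even if it contains
    # regex metacharacters.
    parts = re.split(re.escape(phrase), text)
    return [p for p in parts if p]

text = "ipsum dolor foo bar Lorem ipsum foo bar dolore foo bar"
print(pieces_without("foo bar", text))
```

The surrounding spaces stay attached to each piece, which matches the question's note that the spaces before and after the phrase would also be selected.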

Regarding the nasty little-known backtracking control verbs like (*FAIL) and (*COMMIT), I haven’t yet had much occasion to use them in ‘non-toy’ programs. I find that independent subexpressions via (?>...) and the possessive quantifiers *+, ++, ?+ etc. give me more than enough rope, so to speak.

That said, I do have one toy example of using (*FAIL) in this answer; it’s the very first regex solution. The reason for its being there was I wanted to force the regex engine to backtrack through all possible permutations; the real goal was merely to count how many ways it tried things.

Please understand that my two regexes there, along with the many, many incredibly creative answers from others, are all meant to be fun, tongue-in-cheek things. Still, one can learn a lot from them — once one recovers from shock. ☺

tchrist
  • @tchrist wait...isn't this searching for "foobar" and not "foo bar"? which may be why it's not matching in UltraEdit correctly. – Keng Nov 09 '10 at 03:27
  • @Keng: I don't know why he dropped the space, but you can put it back in; it will still work. I don't think he meant to imply that `foobar` would match `foo bar`. :P (Then again, in Perl 6 it *would* match, wouldn't it?) – Alan Moore Nov 16 '10 at 21:20
  • @Alan Moore and others: This does NOT work. It does not work as is or with the extra space inside the 'foobar' phrase. – Keng Nov 17 '10 at 15:07
  • @tchrist please see Alan and my response above. – Keng Nov 17 '10 at 15:07
  • @Keng: Then I have no idea what you actually want. Please supply the input string and the desired output. I cannot tell whether you’re talking about wanting the entire match to fail, or if you want something not to be included in a successful match. Can't guess, so tell me. – tchrist Nov 17 '10 at 19:09
  • @Keng: Barely. How does `@pieces = split /UNWANTED/, $FULL_TEXT` **not** solve your problem? I mentioned it early on. – tchrist Nov 17 '10 at 19:55
  • @tchrist oh...yeah, that would work but because I'm severely restricted in my environment, I can't use that method. 8.0( – Keng Nov 17 '10 at 20:00
  • @Keng: Then no one can help you because you have failed to adequately explain what constraints you must work under. – tchrist Nov 17 '10 at 20:06
  • @tchrist Well, I'm still looking for a regex pattern to do this regardless of whether I can or can't use C# C++ or PHP. Regex can find everything but a word; I'm looking for everything but a phrase. A phrase is after all just a word with an odd character somewhere in the middle that makes it look like two words to humans. – Keng Nov 17 '10 at 20:25
  • @tchrist why in the world do you think that I'm using a 'language'? Why do you think I'm even in a programming language? There's no 'split' in programming text editors. – Keng Nov 17 '10 at 20:32
  • @Keng of course there is. `:.,.+3!perl -ne 'print join " ", reverse split /foobar/'` – tchrist Nov 17 '10 at 20:40
  • @tchrist and what language is `:.,.+3!perl -ne 'print join " ", reverse split /foobar/'` from? – Keng Nov 17 '10 at 20:44
  • @Keng: It's not. Remember, you said we weren't allowed to use language. It is, of course, an editor command. – tchrist Nov 17 '10 at 22:14
4

try

^(?!.*foo bar).*$

This should select every line that does not contain "foo bar". The (?! opens a negative lookahead, which succeeds only if the rest of the line does not contain the phrase.
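A small sketch of why this pattern depends on multiline matching, written with Python's re module as an assumed stand-in for a PCRE-style engine. The sample text is the one from the question.

```python
import re

text = (
    "ipsum dolor foo bar Lorem ipsum dolor sit amet,\n"
    "consectetur adipisicing elit, sed do\n"
    "eiusmod tempor foo bar incididunt ut labore et\n"
    "dolore foo bar"
)

pattern = r"^(?!.*foo bar).*$"

# Without MULTILINE, ^ matches only at the very start of the text, where the
# lookahead sees "foo bar" on the first line and rejects the match.
single_line_mode = re.findall(pattern, text)

# With MULTILINE, ^ and $ match at every line boundary, so each line is
# tested on its own.
multi_line_mode = re.findall(pattern, text, re.MULTILINE)

print(single_line_mode)
print(multi_line_mode)
```

If an editor appears to find nothing with this pattern, a disabled multiline option is one possible cause worth checking.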

Stephan Schinkel
  • It's not pulling using UltraEdit. I'm not able to test it with RegexBuddy at the moment but I might be able to try it tonight. – Keng Nov 16 '10 at 16:33
  • it's not pulling with REB either. any ideas why? – Keng Nov 17 '10 at 03:52
  • the regular expression option multiline must be enabled for this to work. the tool i test with is: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx – Stephan Schinkel Nov 17 '10 at 12:29
  • I'm confused - this won't actually produce the response specified in the question. – SamStephens Nov 22 '10 at 04:24
2

"remove everything except foo bar" is equivalent to "find only foo bar", which PCRE allows quite easily. Conversely, "find everything except foo bar" is equivalent to "find and remove only foo bar". So, complementation is easily done from your tools.

Aside from that, PCRE has a nasty little feature known as (*FAIL), which immediately forces a backtrack when it's encountered. So, I suppose inserting something like (*COMMIT)foo bar(*FAIL) into your regular expression could help. It's neither friendly nor very safe, though.

Victor Nicollet
  • sorry, my ham-handed example led both you and me down the wrong path. I edited it to make it clearer....I hope. – Keng Nov 05 '10 at 19:28
1

To show everything except "foo bar" and "fad bad", this worked for me:

^(?!.*foo bar)(?!.*fad bad).*$
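The same idea extends to any number of phrases by chaining one negative lookahead per phrase. A minimal sketch in Python follows; the function name lines_without and the sample text are made up for illustration.

```python
import re

def lines_without(phrases, text):
    """Return the lines of text that contain none of the given phrases."""
    # One negative lookahead per phrase, all anchored at the line start.
    lookaheads = "".join(f"(?!.*{re.escape(p)})" for p in phrases)
    pattern = re.compile(f"^{lookaheads}.*$", re.MULTILINE)
    return pattern.findall(text)

text = "keep this line\nhas foo bar here\nhas fad bad here\nkeep this too"
print(lines_without(["foo bar", "fad bad"], text))
```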

Adam Johns
1

Okay, so you want to remove everything except foo bar using UltraEdit's "Advanced" (Perl-regex style) search feature. The easiest way to do that is to match everything, but only capture foo bar, like this:

(?:(?!foo bar).)+(foo bar|$)

...and replace it with $1 or \1 (whichever style UltraEdit accepts).

I don't use UltraEdit, but in EditPadPro it converts this:

ipsum dolor foo bar Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor foo bar incididunt ut labore et
dolore foo bar 

...to this:

foo bar

foo bar
foo bar

...which is the result you showed in your original message.
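For anyone who wants to try the replacement outside an editor, here is a sketch using Python's re.sub. Python and the MULTILINE flag are assumptions here: the flag mimics an editor in which $ matches at the end of every line.

```python
import re

text = (
    "ipsum dolor foo bar Lorem ipsum dolor sit amet,\n"
    "consectetur adipisicing elit, sed do\n"
    "eiusmod tempor foo bar incididunt ut labore et\n"
    "dolore foo bar"
)

# (?:(?!foo bar).)+ consumes text up to the next "foo bar" or the end of the
# line; group 1 then captures either the phrase or nothing at all.
pattern = re.compile(r"(?:(?!foo bar).)+(foo bar|$)", re.MULTILINE)

# Replacing each match with group 1 keeps only the captured phrase.
result = pattern.sub(r"\1", text)
print(result)
```

Newlines survive because the dot does not match them by default, so the second line is reduced to an empty line rather than removed.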

Alan Moore
1

Here: perl -pe 's{.*?(foo bar)?}{$1}g' <text

I want to find everything BUT "foo bar"

A match-only pattern without using substitution by $1 (that is usable with the empty replacement as in s{pattern}{})... not sure that is possible. You would have to gobble up chars up until foo bar, e.g. with .*?(?=foo bar). But then the matching algorithm continues on and sees "oo bar", and would match again as there is no f.

Continuing the quest, here is a piece of Perl code that gobbles up the requested parts, with the one drawback that empty captures may be returned, for example when foo bar sits at the start of the line:

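while (my $line = <>) {
        chomp $line;
        # Capture each stretch of text that ends at "foo bar" or at end of line.
        my @pieces = $line =~ m{(.*?)(?:foo bar|$)}gs;
        print "[[ $_ ]]\n" for @pieces;
}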

There is no substitution involved, and running this on the Lorem ipsum test file will show everything but the foo bar parts. It is PCRE compatible, but there is no guarantee that $EDITOR will do what you envision.
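The same capture-only approach can be sketched in Python, assuming re.findall as the matching call. The helper name kept_parts is hypothetical, and filtering out the empty captures is an addition that the Perl version does not do.

```python
import re

def kept_parts(line):
    """Return the non-empty stretches of line that lie outside "foo bar"."""
    # Group 1 lazily captures text up to the next "foo bar" or the end of line.
    captures = re.findall(r"(.*?)(?:foo bar|$)", line)
    # Drop the empty captures the answer warns about.
    return [c for c in captures if c]

for line in ["eiusmod tempor foo bar incididunt ut labore et", "dolore foo bar"]:
    print(kept_parts(line))
```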

user502515