In vim, I am trying to search for lines that contain any permutation of a group of words, case-insensitively and as whole words. For example, if I am interested in the words foo and bar, I want to match the first four lines, but not the last four, in the following file:

Foo and bar.
Bar and foo.
The foo and the bar.
The bar and the foo.
Foobar.
Barfoo.
The foobar.
The barfoo.

Having looked at this post, I realize I can construct something like this in perl:

perl -n -e 'print if (/\bfoo\b.*?\bbar\b/i || /\bbar\b.*?\bfoo\b/i)' file
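For readers who want to check this logic outside Perl, here is a minimal sketch of the same "try both orders" idea using Python's re module as a stand-in. The helper name matches_either_order is hypothetical and is not part of the original post.

```python
import re

# The sample lines from the question; only the first four should match.
LINES = [
    "Foo and bar.",
    "Bar and foo.",
    "The foo and the bar.",
    "The bar and the foo.",
    "Foobar.",
    "Barfoo.",
    "The foobar.",
    "The barfoo.",
]

# Same idea as the Perl one-liner: one pattern per word order, case-insensitive,
# with \b so that "foo" and "bar" must be whole words.
FOO_THEN_BAR = re.compile(r"\bfoo\b.*?\bbar\b", re.IGNORECASE)
BAR_THEN_FOO = re.compile(r"\bbar\b.*?\bfoo\b", re.IGNORECASE)

def matches_either_order(line: str) -> bool:
    """Hypothetical helper: True if both words occur as whole words, in either order."""
    return bool(FOO_THEN_BAR.search(line) or BAR_THEN_FOO.search(line))

matching = [line for line in LINES if matches_either_order(line)]
print(matching)
```

The drawback of this approach is that the number of alternatives grows with the number of orderings. That is what the look-ahead form avoids.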

which correctly matches only the first four lines. Alternatively, using a look-ahead construct as suggested by this post, the match can be made with slightly more concise code:

perl -n -e 'print if (/(?=.*\bfoo\b)(?=.*\bbar\b)/i)' file
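Because each look-ahead is zero-width and independent, this form extends to any number of words without listing permutations. Below is a minimal Python sketch of that generalization. The function name all_words_regex is hypothetical, and re.escape is used on the assumption that the words may contain regex metacharacters.

```python
import re

def all_words_regex(words, flags=re.IGNORECASE):
    r"""Hypothetical helper: require every word as a whole word, in any order.

    Each word gets its own zero-width look-ahead of the form (?=.*\bWORD\b),
    so the order in which the words appear in the line does not matter.
    """
    lookaheads = "".join(r"(?=.*\b" + re.escape(w) + r"\b)" for w in words)
    return re.compile(lookaheads, flags)

pattern = all_words_regex(["foo", "bar"])
print(bool(pattern.search("The bar and the foo.")))  # True
print(bool(pattern.search("The foobar.")))           # False
```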

I cannot, however, figure out how to write these in vim regex syntax, which I find far more byzantine than perl's. I have tried many different expressions with vim's search commands (/ and ?), but none of them produce a match. I understand that where perl uses (?=string), vim uses \(string\)\@= or string\&.

However, a variety of attempts, e.g.:

  • \c\(foo\)\@=\(bar\)@=
  • \c\(foo\)\@=\.*\(bar\)@=
  • \cfoo\&bar\&

(where \c is used for a case-insensitive match) have all been unsuccessful.

Could someone please demonstrate the correct vim syntax?

user001

2 Answers

Try the following, which should match the whole of each of the first four lines:

\c.*\<foo\>.*\&.*\<bar\>.*
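The pattern has one branch per word, joined with \&, so it extends to more words by adding one more .*\<word\>.* branch for each. A small Python sketch that builds such a Vim pattern string is shown below. The function name vim_all_words_pattern is hypothetical, and the sketch assumes the words are plain alphanumerics with no Vim regex metacharacters.

```python
def vim_all_words_pattern(words):
    r"""Hypothetical helper: build a Vim search pattern requiring every word.

    Follows the shape \c.*\<foo\>.*\&.*\<bar\>.*, with one branch per word
    joined by \& and a leading \c for case-insensitive matching.
    Assumes the words contain no Vim regex metacharacters.
    """
    branches = [r".*\<" + word + r"\>.*" for word in words]
    return r"\c" + r"\&".join(branches)

print(vim_all_words_pattern(["foo", "bar"]))
# \c.*\<foo\>.*\&.*\<bar\>.*
```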

You were closest with \c\(foo\)\@=\(bar\)@=. However, you don't want words such as foobar or barfoo to match, so you need the start-of-word and end-of-word anchors \< and \>.
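To see why the word anchors matter, the sketch below compares a look-ahead pattern with and without boundaries on the sample lines. Python's \b stands in for both of Vim's \< and \> here. This illustrates the idea and is not Vim itself.

```python
import re

LINES = [
    "Foo and bar.",
    "Bar and foo.",
    "The foo and the bar.",
    "The bar and the foo.",
    "Foobar.",
    "Barfoo.",
    "The foobar.",
    "The barfoo.",
]

# Without word boundaries, "foo" and "bar" are also found inside "Foobar"/"Barfoo".
loose = re.compile(r"(?=.*foo)(?=.*bar)", re.IGNORECASE)
# \b plays the role of Vim's \< (start of word) and \> (end of word).
strict = re.compile(r"(?=.*\bfoo\b)(?=.*\bbar\b)", re.IGNORECASE)

loose_hits = [line for line in LINES if loose.search(line)]
strict_hits = [line for line in LINES if strict.search(line)]
print(len(loose_hits), len(strict_hits))  # 8 4
```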

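Using \& simplifies the pattern a bit. Vim matches the last branch, but only if every preceding branch also matches at the same position, so each branch before \& behaves like a look-ahead.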

If you only need a hit on each matching line, rather than a match spanning the whole line, you can simplify the pattern further by dropping the trailing .* pieces:

\c.*\<foo\>\&.*\<bar\>
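One way to read \& is as "look-ahead for every branch but the last." The sketch below translates \c.*\<foo\>\&.*\<bar\> into a Python regex on that basis. It uses re.match to start at the beginning of each line, which is a simplification of Vim's search.

```python
import re

# Vim: \c.*\<foo\>\&.*\<bar\>
# The branch before \& becomes a zero-width look-ahead; the last branch is
# the part that actually consumes text and becomes the reported match.
pattern = re.compile(r"(?=.*\bfoo\b).*\bbar\b", re.IGNORECASE)

for line in ["Foo and bar.", "The bar and the foo.", "The foobar."]:
    m = pattern.match(line)  # match() anchors at the start of the string
    print(repr(line), "->", repr(m.group(0)) if m else None)
```

For "The bar and the foo." the reported match is just "The bar", the text covered by the last branch, even though the look-ahead checked the rest of the line for foo.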

pb2q

Try the following:

/^\c\(.*\<foo\>\)\@=\(.*\<bar\>\)\@=/

This is equivalent to the Perl look-ahead version: \@= makes the preceding atom or group a positive look-ahead. \< and \> are the vim equivalents of \b (they match the start and end of a word, respectively), and \c enables case-insensitive matching. I added the ^ anchor so that each line matches only once.
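The effect of the ^ anchor can be seen by counting matches. The sketch uses Python's re.finditer as a stand-in, and Vim's own match counting may differ in detail. Without an anchor, the zero-width look-aheads succeed at every position from which both words are still ahead on the same line. With ^ (and re.MULTILINE), they can succeed only at the start of a line.

```python
import re

TEXT = "\n".join([
    "Foo and bar.",
    "Bar and foo.",
    "The foo and the bar.",
    "The bar and the foo.",
    "Foobar.",
    "Barfoo.",
    "The foobar.",
    "The barfoo.",
])

# '.' does not match newlines, so each look-ahead only sees the rest of its own line.
unanchored = re.compile(r"(?=.*\bfoo\b)(?=.*\bbar\b)", re.IGNORECASE)
# ^ with re.MULTILINE restricts matches to line starts: one per qualifying line.
anchored = re.compile(r"^(?=.*\bfoo\b)(?=.*\bbar\b)", re.IGNORECASE | re.MULTILINE)

print(len(list(unanchored.finditer(TEXT))))  # 12
print(len(list(anchored.finditer(TEXT))))    # 4
```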

Andrew Clark
  • Thanks, the `^` anchor was very helpful because without it, each character in the line was counted as a unique match. Could you also tell me how to add an optional suffix? Say, I wanted to match barring and barred. I realize I could omit the word end boundary (`/^\c\(.*\\)\@=\(.*\`? – user001 Aug 07 '12 at 18:18
  • close, but more escaping: `\` – pb2q Aug 07 '12 at 18:39