
I want to search for two (or more) words (or groups, if you prefer) in files of a given type inside any given project/workspace.

I need efficient regular expressions for the following queries (in multi-line mode) to use in the Eclipse File Search dialog (you can open the search dialog by pressing Ctrl+H):

  1. Word 1 comes first, then Word 2.
  2. Word 1 is present, but Word 2 MUST NOT be present ANYWHERE.
  3. Both Word 1 and Word 2 MUST be present ANYWHERE in the file (order doesn't matter, i.e. group union).
  4. Word 1 MUST NOT be present ANYWHERE in the file (i.e. group negation).
  5. Word 1 OR Word 2 is present ANYWHERE in the file (order doesn't matter).

Edit

I found a solution for the first one:

  1. (?m)(?s)(Word 1).*(Word 2)

But not for the others.

Kara
Chandrayya G K
  • Post at least one attempt for each separately – Sully Feb 05 '14 at 12:30
  • This question looks more like a requirements doc. – anubhava Feb 05 '14 at 12:31
  • You said _efficient_? Using greedy `.*` certainly doesn't qualify. Think what happens when the second required word immediately follows the first word... This should all be very easy with minimal googling, the only minor trick will be on the negation - this requires a negative lookahead. – Boris the Spider Feb 05 '14 at 12:37
  • Is this theoretical regular expression, or are you allowed to use advanced features in Java (ir)regular expression? – nhahtdh Feb 05 '14 at 12:38
  • Basically I need this searching java files inside eclipse. i.e in **File Search dialog**. – Chandrayya G K Feb 05 '14 at 12:40

1 Answer

When you want to make the search efficient, you have to be aware of the different goals: the Eclipse search function looks for all occurrences, but you only want to check for the presence of a word.

For a single word you can just search for `word`, but since you want to search for combinations using unbounded quantifiers, this does not perform very well.

So the first thing to do is to stop Eclipse (the regex engine) from attempting a match at every character position of a file by adding the anchor `\A`, which stands for “beginning of the file”. Then skip as few characters as possible and search for a literal word match to check for its presence:

`(?s)\A.*?word` will find the first occurrence of `word` but no further ones.
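The effect of the anchor can be observed outside Eclipse with plain `java.util.regex`, on the assumption that the File Search dialog behaves like that engine. In this minimal sketch the class name, the helper `countMatches` and the sample text are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchoredSearchDemo {
    // Counts how many separate matches the engine reports for the regex.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "word one\nword two\nword three\n";
        // Unanchored: every occurrence is reported.
        System.out.println(countMatches("word", text));           // prints 3
        // Anchored with \A: a match can only start at offset 0,
        // so at most one match per input is reported.
        System.out.println(countMatches("(?s)\\A.*?word", text)); // prints 1
    }
}
```

Note that the doubled backslash is only Java string escaping; in the search dialog the pattern is typed with a single backslash.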

Expanding it to check for two words in order is easy:

(1) `(?s)\A.*?word1.*?word2` just checks for one occurrence of each word, in order, and nothing more.

To check for presence or absence regardless of order, you can use a look-ahead:

(2) `(?s)\A(?!.*?word2).*?word1` uses a negative look-ahead to require that `word2` is not present anywhere, then finds one match for `word1`.

(3) `(?s)\A(?=.*?word1).*?word2` uses a positive look-ahead: if one match for `word1` is present, find one match for `word2`; of course, `word1` and `word2` are interchangeable.

(4) `(?s)\A(?!.*?word1).?` uses only the negative look-ahead to check for the absence of `word1`; if it is absent, `.?` just matches a single optional character, as an empty regex won’t match anything in DOTALL mode.

(5) `(?s)\A.*?(word1|word2)` Stating that either `word1` or `word2` suffices is straightforward.
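The five patterns can be sanity-checked outside Eclipse with `java.util.regex`, which is assumed here to behave like the engine behind the File Search dialog. The class name, constant names and sample strings below are hypothetical; each string stands in for the content of one file:

```java
import java.util.regex.Pattern;

final class WordCombinationPatterns {
    static final String IN_ORDER   = "(?s)\\A.*?word1.*?word2";     // query 1
    static final String ONLY_FIRST = "(?s)\\A(?!.*?word2).*?word1"; // query 2
    static final String BOTH       = "(?s)\\A(?=.*?word1).*?word2"; // query 3
    static final String NO_FIRST   = "(?s)\\A(?!.*?word1).?";       // query 4
    static final String EITHER     = "(?s)\\A.*?(word1|word2)";     // query 5

    // True if the regex matches somewhere in the content; because of \A
    // the only possible match start is the beginning of the content.
    static boolean found(String regex, String content) {
        return Pattern.compile(regex).matcher(content).find();
    }

    public static void main(String[] args) {
        String ordered   = "a word1\nb word2\n";
        String reversed  = "a word2\nb word1\n";
        String firstOnly = "a word1\n";
        String neither   = "nothing here\n";

        System.out.println(found(IN_ORDER, ordered));      // true
        System.out.println(found(IN_ORDER, reversed));     // false
        System.out.println(found(ONLY_FIRST, firstOnly));  // true
        System.out.println(found(ONLY_FIRST, ordered));    // false
        System.out.println(found(BOTH, reversed));         // true
        System.out.println(found(BOTH, firstOnly));        // false
        System.out.println(found(NO_FIRST, neither));      // true
        System.out.println(found(NO_FIRST, firstOnly));    // false
        System.out.println(found(EITHER, firstOnly));      // true
        System.out.println(found(EITHER, neither));        // false
    }
}
```

Each call reports whether a "file" would show up in the search results, which is all these presence checks are meant to decide.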

Of course, if you are looking for whole words, the word placeholders above have to be replaced by `\bactualwordcharacters\b`.
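As a sketch of the whole-word variant, the ordered query can be assembled from two literal words wrapped in `\b`. The helper `firstThenSecond` is hypothetical, and `Pattern.quote` is used here only to keep the words literal; it is an addition of this example, not part of the patterns above:

```java
import java.util.regex.Pattern;

final class WholeWordPattern {
    // Builds the "first word, then second word" pattern for whole words only.
    static String firstThenSecond(String first, String second) {
        return "(?s)\\A.*?\\b" + Pattern.quote(first)
                + "\\b.*?\\b" + Pattern.quote(second) + "\\b";
    }

    static boolean found(String regex, String content) {
        return Pattern.compile(regex).matcher(content).find();
    }

    public static void main(String[] args) {
        String regex = firstThenSecond("List", "Map");
        System.out.println(found(regex, "List a; Map b;"));          // true
        // "List" and "Map" occur only as parts of longer identifiers here.
        System.out.println(found(regex, "ArrayList a; HashMap b;")); // false
    }
}
```

Without the `\b` boundaries the second sample would match as well, since `ArrayList` contains `List` and `HashMap` contains `Map`.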

Chandrayya G K
Holger
  • Thanks for your answer. It has a very good explanation. I edited my question and numbered my queries. To help others, please edit your answer and mark the answer to each query with its number. – Chandrayya G K Feb 06 '14 at 13:24
  • @Chandrayya G K: did it – Holger Feb 06 '14 at 13:48
  • Thank you very much. I tested all the points and they work fine. **FYI** I edited your answer slightly and re-ordered the points to put them in sequence. – Chandrayya G K Feb 07 '14 at 05:57