48

In general terms, I want to find a substring in a string, but only if it is actually present.

I had this expression:

^.*(\bpass\b)?.*$

And test string:

high pass h3 

When I test the string against the expression, I see that the whole string is matched (but the group "pass" is not):

match : true
groups count : 1  
group : high pass h3 

But what I need is for the match to have 2 groups: 1: high pass h3 2: pass

And when I test, for example, the string "high h3", I should still get 1 group: high h3

How can I do this?

Mat
baio
  • 1. What platform (not all regex implementations are the same): Perl, Python, Java, .NET, ...? 2. "only if it is contained there" is not clear. – Richard Feb 19 '12 at 10:14
  • Why do you want the whole string as a match? – Mat Feb 19 '12 at 10:15
  • It could be multi-line input, where he wants the complete lines that include the word to be found. – Mario Feb 19 '12 at 10:20
  • I need this because it is only one part of my regex, and there are other search patterns that should work even if "pass" is not found. – baio Feb 19 '12 at 11:35

3 Answers

97

Use this one:

^(.*?(\bpass\b)[^$]*)$
  1. First capture for the entire line.
  2. Second capture for the expected word.

Check the demo.

More explanation:

          ┌ first capture
          |
 ⧽------------------⧼
^(.*?(\bpass\b)[^$]*)$
  ⧽-⧼          ⧽---⧼
   | ⧽--------⧼  |
   |     |       └ all characters that are not a literal $ (inside [...] the $ is not an end-of-string anchor)
   |     |
   |     └ second capture
   |
   └ optional begin characters
piouPiouM
  • Thanks! But the problem is that I need the match (the whole text) to be found even if "pass" isn't in the tested string; please see your demo. Is that possible? – baio Feb 19 '12 at 19:00
  • Check this one: http://www.myregextester.com/?r=aa94f52d `^(.*?(\bpass\b)[^$]*|[^$]*)$` – piouPiouM Feb 19 '12 at 19:21
  • How do I deal with case sensitivity? – Gem Jul 17 '19 at 16:14
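For the follow-up requirement in the comments (match the whole line whether or not "pass" is present), one alternative to the `|` pattern above is to wrap the lazy prefix and the word in a single optional group. This is a minimal sketch using Python's `re` module, which is an assumption here since the asker later mentions C#; the helper name `find_pass` is hypothetical and the input is assumed to be a single line:

```python
import re

# Optional non-capturing group: a lazy prefix followed by the whole word "pass".
# If the word is absent, the group is skipped and the trailing .* takes the line.
PATTERN = re.compile(r"^(?:.*?(\bpass\b))?.*$")

def find_pass(line):
    """Return (whole_match, word); word is None when "pass" is absent.

    Assumes `line` contains no newline characters.
    """
    m = PATTERN.match(line)
    return m.group(0), m.group(1)

print(find_pass("high pass h3"))  # ('high pass h3', 'pass')
print(find_pass("high h3"))       # ('high h3', None)
```

Passing `re.IGNORECASE` as a second argument to `re.compile` would make the word match case-insensitively in Python.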
7

You're just missing a bit for it to work (plus that ? is at the wrong position).

If you want to match the first occurrence: `^(.*?)(\bpass\b)(.*)$`. If you want to match the last occurrence: `^(.*)(\bpass\b)(.*?)$`.

This will result in 3 capture groups: Everything before, the exact match and everything following.
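The difference between the two patterns only shows up when the word occurs more than once. A small sketch, using Python's `re` module as an assumed platform and a made-up test string:

```python
import re

text = "low pass then high pass h3"  # illustrative input with two occurrences

# Lazy prefix: stops at the first whole-word "pass".
first = re.match(r"^(.*?)(\bpass\b)(.*)$", text)
# Greedy prefix: backtracks from the end, so it lands on the last "pass".
last = re.match(r"^(.*)(\bpass\b)(.*?)$", text)

print(first.groups())  # ('low ', 'pass', ' then high pass h3')
print(last.groups())   # ('low pass then high ', 'pass', ' h3')
```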

`.` will match (almost, depending on your settings) any character, but only a single one. `?` makes the preceding element optional, i.e. it appears not at all or exactly once. `*` matches the preceding element repeatedly, i.e. not at all or an unlimited number of times, and it matches as many characters as possible. If you combine both into `*?` you get an ungreedy match, which matches as few characters as possible (down to 0).
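These rules explain why the pattern from the question leaves its group empty, and why switching to an ungreedy `.*?` alone is not enough while the group is still optional. A sketch in Python's `re` (an assumed platform; other engines with the same backtracking semantics should behave alike):

```python
import re

text = "high pass h3"

# Greedy .* takes the whole line first; the optional group then matches
# nothing, which is still a valid overall match.
greedy_optional = re.match(r"^.*(\bpass\b)?.*$", text)
print(greedy_optional.group(1))  # None

# Lazy .*? starts empty; the optional group fails at position 0 and is
# skipped, so the trailing .* takes the line and the group stays empty.
lazy_optional = re.match(r"^.*?(\bpass\b)?.*$", text)
print(lazy_optional.group(1))  # None

# Making the word mandatory forces the engine to find it.
lazy_required = re.match(r"^.*?(\bpass\b).*$", text)
print(lazy_required.group(1))  # pass
```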

Edit: As I read it, you only want "pass" and the complete string. Depending on your implementation/language, the following should be enough: `^.*(\bpass\b).*?$` (again, the ungreedy match might be swapped with the greedy one). You'll get the whole expression/match as group 0 and the first defined match as group 1.

Mario
  • Unfortunately this solution doesn't work in C# regex: for the string "high h3" no matches are found at all, but I expected that if "pass" is not found, the match should still return the whole string as the result. I need this because it is only one part of my regex, and there are other search patterns that should work even if "pass" is not found. – baio Feb 19 '12 at 11:34
  • Ah? "pass" should be optional? Did you think about defining alternative sequences using `|`? E.g. something like `^.*?(\b(?:passed|failed)\b).*?$` will match both alternatives. Why do you even have to match the whole line, considering it's what you might pass? You can make any sequence optional by adding `?`, but this might have an unexpected result when using wild card matches that could include your "keywords". – Mario Feb 20 '12 at 11:00
5

A period only matches a single character, so your

^.(\bpass\b)?.$

is matching:

  • Start of input
  • A single character
  • Optionally
    • Word boundary
    • "pass"
    • Word boundary
  • Single char
  • End of input

which I would not expect to match "high pass h3" at all.

The regular expression:

pass

(no metacharacters) will match any string containing "pass" (but then so would a "find string in string" function, and this would probably be quicker without the complexities of a regex).
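One difference worth keeping in mind is that a plain substring search also hits longer words that contain "pass", whereas the `\b` anchors from the question do not. A short sketch in Python (an assumed platform; the sample strings are made up):

```python
import re

line = "high pass h3"
other = "passport check"

# Plain substring test: no regex needed, but it also hits "passport".
print("pass" in line)   # True
print("pass" in other)  # True

# Whole-word test: \b rejects "pass" inside "passport".
word = re.compile(r"\bpass\b")
print(bool(word.search(line)))   # True
print(bool(word.search(other)))  # False
```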

Richard