
I have the following string

huile contains rgbgrbrb9gr && huile contains fcecec

I use this regex in order to capture a block of condition:

(.+) (contains) (.+)

It works with one block ("huile contains rgbgrbrb9gr"), but if I add another condition with an && or || operator, the operator ends up inside the capture. What I expect to capture is the two blocks separately, excluding the && and || operators.

Does anyone have an idea how to achieve this?

Yanis600
  • What is the actual output you want here? – Tim Biegeleisen Dec 24 '21 at 10:59
  • First output: huile contains rgbgrbrb9gr Second output: huile contains fcecec – Yanis600 Dec 24 '21 at 11:00
  • **Side note**: regex is only suitable for matching lexical tokens. If you have a context-free grammar that you need to parse, you need to look for tools such as yacc or bison. – DannyNiu Dec 24 '21 at 11:10
  • Additionally you might want to indicate which dialect of regex you're using. JavaScript RegExp? Perl-compatible? POSIX? – DannyNiu Dec 24 '21 at 11:52
  • i'm working on Qt so i'm using qregularexpression – Yanis600 Dec 24 '21 at 11:55
  • @Yanis600 Do you want to prevent matching `&&` and `||` or do you not want to match a `&` char and a `|` char? Can you add to your question an example what should match and what should not match? Do you want 3 capture groups in the result, or only matches? – The fourth bird Dec 24 '21 at 12:33
  • I have the following filter string as an example: 'oil & blah'blah' contains 'oil&blah'blah' && 'oil & blah'blah' contains 'oil blah's'. What I want to catch is the pattern 'string' contains 'substring'; the ' && ' or ' || ' must be excluded, and the pattern mentioned above has three matches. The & and | must only be captured in the string I want to search for and the string in which to search. – Yanis600 Dec 24 '21 at 12:42
  • @Yanis600 Perhaps like this? https://regex101.com/r/taFZUP/1 – The fourth bird Dec 24 '21 at 13:17
  • Wonderful, thank you so much :) I assume that regexes are really hard to build. – Yanis600 Dec 24 '21 at 13:32

3 Answers


Regex quantifiers are greedy by default, so .+ matches as much input as it can, including the && or || and the next condition.

You need to exclude & and | from your input, like this:

([^&|]+) (contains) ([^&|]+)
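
A quick way to see what this pattern captures, sketched in Python's `re` module rather than the QRegularExpression the asker uses (the helper name `find_blocks` is made up for illustration). The character classes stop at the operator but still pick up the surrounding spaces, so the groups are stripped afterwards.

```python
import re

# The pattern from this answer: operands may not contain & or |.
BLOCK_RE = re.compile(r"([^&|]+) (contains) ([^&|]+)")

def find_blocks(text):
    """Return (left, keyword, right) tuples with surrounding spaces removed."""
    return [tuple(g.strip() for g in m.groups()) for m in BLOCK_RE.finditer(text)]

print(find_blocks("huile contains rgbgrbrb9gr && huile contains fcecec"))
# [('huile', 'contains', 'rgbgrbrb9gr'), ('huile', 'contains', 'fcecec')]
```

Because a single & is excluded too, an operand such as 'oil & other' gets cut at the &, which is the limitation raised in the comment under this answer.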

If you instead want to exclude only the double-character && and ||, I suggest splitting your string on those delimiters first and then matching each piece with the regex, as complex parsing is really beyond the realm of regex (that is a job for grammars).
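
The split-first idea could look roughly like this in Python (a sketch, not Qt code; `parse_blocks` and the two compiled patterns are names invented here). It assumes that && and || never occur inside an operand, since any such occurrence would be treated as a delimiter.

```python
import re

# Delimiters: a doubled & or |, with optional surrounding whitespace.
SPLIT_RE = re.compile(r"\s*(?:&&|\|\|)\s*")
# The asker's original pattern, now applied to one condition at a time.
BLOCK_RE = re.compile(r"(.+) (contains) (.+)")

def parse_blocks(filter_string):
    """Split on && / || first, then match each piece on its own."""
    blocks = []
    for part in SPLIT_RE.split(filter_string):
        match = BLOCK_RE.fullmatch(part.strip())
        if match:  # pieces that are not "<x> contains <y>" are skipped
            blocks.append(match.groups())
    return blocks

print(parse_blocks("huile contains rgbgrbrb9gr && huile contains fcecec"))
# [('huile', 'contains', 'rgbgrbrb9gr'), ('huile', 'contains', 'fcecec')]
```

Single & and | characters survive inside the operands because only the doubled forms are used as delimiters.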

But a regex solution is nonetheless possible.

The rough idea is that you want to match an operand containing single & or | characters, made up of:

  1. an optional prefix containing no & or |
  2. a single & or | followed by a non-empty string containing no & or |
  3. item 2 repeated one or more times.

The subpattern would be something like this:

(([^&|]+)?([&|][^&|]+)+)

Additionally, you'll want something like egrep's -x flag (or ^ and $ anchors) to match the entire string; otherwise an empty string may turn up.

The full regex would look something like this (capture groups are renumbered):

(([^&|]+)?([&|][^&|]+)+) (contains) (([^&|]+)?([&|][^&|]+)+)
DannyNiu
  • It works, but if I need to search for a & or | character, I get a wrong capture. What I need to capture is, for example: 'oil & other' contains 'oil &' – Yanis600 Dec 24 '21 at 11:13

After reading the comments on the post, the desired result became clearer.

This one could work too:

(?<=^|(?:&&|\|\|) )(.+?) (contains) (.+?)(?= (?:&&|\|\|)|$)

https://regex101.com/r/YDFpN9/2
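
For anyone trying this outside a PCRE-style engine: Python's `re` module only accepts fixed-width lookbehind, so the pattern above does not compile there as written. The sketch below is an adaptation, not the answer's exact regex: the lookbehind is replaced by a non-capturing prefix that consumes the operator, and `find_blocks` is a name chosen for illustration.

```python
import re

# Adapted pattern: "(?<=...)" became "(?:...)", so the leading "&& " or "|| "
# is consumed by the match but still kept out of the three capture groups.
BLOCK_RE = re.compile(
    r"(?:^|(?:&&|\|\|) )(.+?) (contains) (.+?)(?= (?:&&|\|\|)|$)"
)

def find_blocks(text):
    """Return one (left, keyword, right) tuple per condition."""
    return BLOCK_RE.findall(text)

print(find_blocks("huile contains rgbgrbrb9gr && huile contains fcecec"))
# [('huile', 'contains', 'rgbgrbrb9gr'), ('huile', 'contains', 'fcecec')]
```

The lazy `.+?` operands stop at the first " &&", " ||", or end of string, so single & and | characters inside an operand are kept.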

Jean Will

If you want 3 capture groups, you could first match what you don't want, and then capture what you want to keep in groups. A tempered greedy token keeps each operand from running across && or || or the word contains.

\|{2,}|&{2,}|((?:(?!&&|\|\||\bcontains\b).)*) (contains) ((?:(?!&&|\|\||\bcontains\b).)*)

The pattern matches:

  • \|{2,}|&{2,} Match either 2 or more pipe chars or ampersands (what you don't want to keep)
  • | Or
  • ( Capture group 1
    • (?:(?!&&|\|\||\bcontains\b).)* Match any char except a newline if what is directly to the right is not && || or contains
  • ) Close group 1
  • (contains) Match the word contains in group 2 between spaces
  • ( Capture group 3
    • (?:(?!&&|\|\||\bcontains\b).)* Same approach as above
  • ) Close group 3
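
Since the first alternatives match the operators without filling any group, the caller has to skip those matches. A minimal sketch of that filtering in Python (the asker uses QRegularExpression; `find_blocks` is a hypothetical helper name). The tempered token also picks up the space next to an operator, so the operands are stripped here.

```python
import re

# Same pattern as above, split over several lines for readability.
PATTERN = re.compile(
    r"\|{2,}|&{2,}|"
    r"((?:(?!&&|\|\||\bcontains\b).)*) (contains) "
    r"((?:(?!&&|\|\||\bcontains\b).)*)"
)

def find_blocks(text):
    """Keep only matches where the capture groups took part."""
    blocks = []
    for m in PATTERN.finditer(text):
        if m.group(2) is None:
            continue  # this match was a bare && or ||
        blocks.append((m.group(1).strip(), m.group(2), m.group(3).strip()))
    return blocks

print(find_blocks("huile contains rgbgrbrb9gr && huile contains fcecec"))
# [('huile', 'contains', 'rgbgrbrb9gr'), ('huile', 'contains', 'fcecec')]
```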

Regex demo

The fourth bird