3

I need to match strings like:

RHS numberXnumberXnumber

contained in strings like these (any of the numbers may or may not have a decimal part):

foo RHS 100x100x10 foo  
foo RHS 100.0x100x10 foo  
foo RHS 100.0x100.0x10.0 foo
foo RHS 100x100.0x100x10 foo  
foo RHS 10.0x100.0x10.0x10.0 foo  

I've written this:

RHS \d+.?\d?x\d+.?\d?x\d+.?\d

but this regex also matches the first groups of numbers in the following string: foo RHS 100x100x100x10 foo

How can I avoid that? Basically, I don't want any match if there are four groups of numbers.
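
The problem can be reproduced with a short Python sketch. It assumes Python's built-in `re` module, and the variable names are only illustrative:

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"RHS \d+.?\d?x\d+.?\d?x\d+.?\d")

# A string with four number groups, which should NOT produce a match.
unwanted = "foo RHS 100x100x100x10 foo"

match = pattern.search(unwanted)

# The search succeeds even though the string has four groups.
print(match.group(0) if match else None)
```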

TuoCuggino
  • 4
    [Add `(?!x\d)` at the end](https://regex101.com/r/7y7F6i/1). – Wiktor Stribiżew Dec 16 '17 at 23:02
  • 2
    You need to add a `negative lookahead`, which @WiktorStribiżew provided. It looks at the string directly _after_ your match, and the whole match is discarded if the negative lookahead finds the pattern you defined. Have a look at this: https://stackoverflow.com/questions/4736/learning-regular-expressions/2759417#2759417 – Patrick Artner Dec 16 '17 at 23:10
  • 1
    and this : [reference-what-does-this-regex-mean](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075#22944075) – Patrick Artner Dec 16 '17 at 23:15
  • 1
    It looks like a couple of word boundaries can solve this, try [`r'\bRHS \d+(?:\.\d+)?(?:x\d+(?:\.\d+)?){2}\b'`](https://regex101.com/r/7y7F6i/4), but it looks a bit more complex upon a closer look. – Wiktor Stribiżew Dec 16 '17 at 23:42
  • 1
    If you just want to use *your* regex and avoid matching the substring in the last sample input you provided, try [`RHS \b(?!\d+(?:x\d+){3})\d+\.?\d?x\d+\.?\d?x\d+\.?\d`](https://regex101.com/r/7y7F6i/5). With the negative lookahead, any RHS followed by 4 integer parts will fail to match. – Wiktor Stribiżew Dec 16 '17 at 23:49

2 Answers

2

The examples provided are incorrect. Most of them match sample line 4, and all of them match sample line 5 (the two lines with four number groups). They are right that you need word boundaries and a negative lookahead, but they are missing a crucial piece.

A word boundary (\b) also matches immediately before and after a dot. So if the third number has a decimal part, the match can end just before the dot: the engine never reaches the x that follows, and the string still matches.
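
To illustrate the point about dots, here is a small Python sketch that runs the word-boundary pattern suggested in the comments against the fifth sample string. It assumes Python's `re` module; the variable names are hypothetical:

```python
import re

# Word-boundary pattern from the comments, unchanged.
pattern = re.compile(r"\bRHS \d+(?:\.\d+)?(?:x\d+(?:\.\d+)?){2}\b")

# Fifth sample: four number groups, so no match is wanted.
sample = "foo RHS 10.0x100.0x10.0x10.0 foo"

match = pattern.search(sample)

# The trailing \b is satisfied between "10" and ".", so the match
# stops before the decimal part and never sees the fourth group.
print(match.group(0) if match else None)  # RHS 10.0x100.0x10
```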

The solution is a negative lookahead that also looks past an optional decimal part for an x: (?!\.?\d?x)

together with word boundaries around the whole pattern: \b...\b

I tested the regex below and it works.

\bRHS \d+\.?\d?x\d+\.?\d?x\d+\.?\d?(?!\.?\d?x)\b

Example: https://regex101.com/r/0nzHkN/2/
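
The same check can be sketched in Python, assuming the built-in `re` module. The helper name `find` and the two lists are made up for illustration; the strings are the samples from the question:

```python
import re

# Pattern from this answer, unchanged.
PATTERN = re.compile(r"\bRHS \d+\.?\d?x\d+\.?\d?x\d+\.?\d?(?!\.?\d?x)\b")

# Samples with exactly three number groups.
should_match = [
    "foo RHS 100x100x10 foo",
    "foo RHS 100.0x100x10 foo",
    "foo RHS 100.0x100.0x10.0 foo",
]

# Samples with four number groups.
should_not_match = [
    "foo RHS 100x100.0x100x10 foo",
    "foo RHS 10.0x100.0x10.0x10.0 foo",
    "foo RHS 100x100x100x10 foo",
]


def find(text):
    """Return the matched substring, or None if the pattern is not found."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


for s in should_match + should_not_match:
    print(repr(s), "->", find(s))
```

Note that the pattern allows only one digit after the decimal point, as in the samples; a value such as 100.25 would need `\d*` or `\d+` in place of `\d?`.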

Josh Bradley
0

Use this regular expression instead:

RHS \d+.?\d?x\d+.?\d?x\d+.?\d?(?!x)

Or a compact version of it:

RHS (\d+.?\d?){3}(?!x)
Josh Withee