
I want to get all numbers in the format xx.xx except the ones preceded by the text "capacity of ". For instance, given the text "A car has 34.5 gallons, mileage capacity of 60.7 and a cost of 2000.00.", I want to get 34.5 and 2000.00, but not 60.7, because it is preceded by "capacity of ".

I tried (?<!capacity of )\d+\.\d+, but it does not work: it still returns 0.7.
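To see why the attempt fails, here is a quick reproduction. It assumes Python's standard `re` module, since the question does not name a language; `re` supports this fixed-width lookbehind.

```python
import re

text = "A car has 34.5 gallons, mileage capacity of 60.7 and a cost of 2000.00."

# The negative lookbehind only rejects a match that *starts* right after
# "capacity of ". It blocks the attempt at the "6" of "60.7", but the engine
# then retries one character later at the "0", which is preceded by
# "apacity of 6", so "0.7" is matched.
attempt = re.findall(r"(?<!capacity of )\d+\.\d+", text)
print(attempt)  # ['34.5', '0.7', '2000.00']
```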

Any hints? Thank you.

Fabio Correa

2 Answers


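Just move the space out of the parentheses, so that the match itself has to start with a space and can no longer begin in the middle of 60.7: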

(?<!capacity of) \d+\.\d+
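A quick check of this pattern, again assuming Python's standard `re` module as the host language (the answer itself is language-neutral):

```python
import re

text = "A car has 34.5 gallons, mileage capacity of 60.7 and a cost of 2000.00."

# The space is now part of the match, so every match must begin with a space
# that is not preceded by "capacity of". A match can no longer start at the
# "0" inside "60.7", because that "0" is preceded by "6", not a space.
with_space = re.findall(r"(?<!capacity of) \d+\.\d+", text)
print(with_space)  # [' 34.5', ' 2000.00']

# The leading space is included in each match; a capture group keeps only
# the number itself.
numbers = re.findall(r"(?<!capacity of) (\d+\.\d+)", text)
print(numbers)  # ['34.5', '2000.00']
```

One consequence of this design: a number that is not preceded by a space (for example, one at the very start of the string) is not matched either.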
YOGO
OK, sorry. I copy-pasted the regex and yet I somehow messed it up, I suppose, since it matched when I tried it. I probably failed to press Ctrl+C properly and had the one from the OP copied. – VLAZ Apr 01 '20 at 20:06

For PCRE flavours only!

If your language of choice uses PCRE or a compatible engine (for example Perl, PHP, or R with perl = TRUE), then you can use backtracking control verbs to make the engine discard something it has already matched. The expression you need is

capacity of \d+\.\d+(*SKIP)(*FAIL)|\d+\.\d+

See on Regex101
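Python's standard `re` module does not support (*SKIP)(*FAIL). As a sketch for that case, a swapped-in equivalent technique, matching what you don't want without a capture group and capturing what you do want, gives the same result for this input:

```python
import re

text = "A car has 34.5 gallons, mileage capacity of 60.7 and a cost of 2000.00."

# Python's built-in re module has no backtracking control verbs, so instead
# the unwanted text is matched *without* a capture group, and the wanted
# numbers are captured in group 1.
pattern = r"capacity of \d+\.\d+|(\d+\.\d+)"

# With one group, findall returns group 1 for every match: an empty string
# when the "capacity of" branch matched, the number otherwise.
raw = re.findall(pattern, text)
numbers = [m for m in raw if m]
print(raw)      # ['34.5', '', '2000.00']
print(numbers)  # ['34.5', '2000.00']
```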

This works the following way:

  1. We set up the pattern to first try to match what we don't want: the full pattern capacity of \d+\.\d+, which here matches the text "capacity of 60.7".
  2. Then the control verbs (*SKIP)(*FAIL) make the engine throw that match away and resume searching right after it, instead of retrying one character later.
  • (*SKIP) marks this point: if the match fails later, the engine may not retry from any starting position before it
  • (*FAIL) makes the pattern always fail at that point
  3. Lastly, since we do want to match \d+\.\d+, that is set up as an alternation via |.
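The three steps above can be traced by hand. The following is a minimal sketch in Python, with a hypothetical helper `skip_fail_scan` (not part of any library) that walks the text position by position the way the engine does with discard(*SKIP)(*FAIL)|keep; it assumes neither pattern can match the empty string.

```python
import re

def skip_fail_scan(discard, keep, text):
    """Hypothetical helper: emulate `discard(*SKIP)(*FAIL)|keep` step by step.

    Assumes neither pattern can match the empty string.
    """
    discard_re = re.compile(discard)
    keep_re = re.compile(keep)
    found = []
    pos = 0
    while pos < len(text):
        d = discard_re.match(text, pos)
        if d:
            # Steps 1-2: the unwanted text matched here. (*FAIL) rejects it and
            # (*SKIP) makes the search resume at the end of that text, so
            # nothing inside it (such as "0.7") gets another chance.
            pos = d.end()
            continue
        k = keep_re.match(text, pos)
        if k:
            # Step 3: the alternative we actually want matched.
            found.append(k.group())
            pos = k.end()
            continue
        # Nothing matched at this position: advance one character.
        pos += 1
    return found

text = "A car has 34.5 gallons, mileage capacity of 60.7 and a cost of 2000.00."
print(skip_fail_scan(r"capacity of \d+\.\d+", r"\d+\.\d+", text))
# ['34.5', '2000.00']
```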

The order of alternations matters whenever more than one of them can match at the same position: the engine tries them left to right and takes the first one that succeeds, so a \d+\.\d+ placed first would win there and the discard pattern would never be checked. Therefore, you want the pattern to discard to come first.
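The effect of ordering can be checked with the capture-group emulation in Python's `re` (a sketch; the rule carries over to the (*SKIP)(*FAIL) form, because alternatives are tried left to right at each position in both cases). Here the goal is digits not followed by "a":

```python
import re

text = "1a2"

# Discard branch first: at position 0, "1a" is consumed without a capture,
# so only the "2" is kept.
discard_first = [m for m in re.findall(r"\da|(\d)", text) if m]

# Keep branch first: at position 0, (\d) succeeds immediately, the engine
# never tries \da there, and the unwanted "1" slips through.
keep_first = [m for m in re.findall(r"(\d)|\da", text) if m]

print(discard_first)  # ['2']
print(keep_first)     # ['1', '2']
```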

One simple way to read the regex is "don't pay attention to anything matched first, only match the last thing".

If you want to discard more items, set each one up as its own alternation in the same fashion. For example, to match digits except those followed by a, b, or c, you can do

\da(*SKIP)(*FAIL)|\db(*SKIP)(*FAIL)|\dc(*SKIP)(*FAIL)|\d

See on Regex101
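The same pattern can be assembled from a list of things to discard. This is a minimal Python sketch with a hypothetical helper `match_except`, built on the capture-group equivalent rather than (*SKIP)(*FAIL), since Python's `re` lacks the verbs:

```python
import re

def match_except(keep, discards, text):
    """Hypothetical helper: return the matches of `keep`, ignoring any text
    matched by one of the `discards` patterns.

    Capture-group equivalent of d1(*SKIP)(*FAIL)|d2(*SKIP)(*FAIL)|...|keep.
    Assumes the patterns use only non-capturing groups (?:...) and cannot
    match the empty string.
    """
    # Discard alternatives come first, uncaptured; `keep` is group 1, last.
    pattern = "|".join([*discards, f"({keep})"])
    return [m for m in re.findall(pattern, text) if m]

# Digits, except those followed by a, b, or c.
print(match_except(r"\d", [r"\da", r"\db", r"\dc"], "1a2b3c4d5"))  # ['4', '5']
```

Because the three discard patterns share a shape, the single alternative \d[abc] would behave the same here.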

VLAZ