This regex question is kind of an extension of this question

Input

String input = "first number <start number>123.45<end number> "
             + "and second number 678.90.";

Desired output

String output = "first number <start number>123.45<end number> "
              + "and second number <start number>678.90<end number>.";

What I tried

I have a negative lookbehind for <number start> and a negative lookahead for <number end>:

String regex="(?<!(<number start>))\\d+(\\.\\d+)?(?!(<number end>))";
//             ^^^^^^^^^^^^^^^^^^^^              ^^^^^^^^^^^^^^^^^
//            negative lookbehind                  negative lookahead
//                                 ^^^^^^^^^^^^^
//                                  float match

But the problem is that for a String <number start>12.34<number end> it will match on 2.3.
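
A minimal sketch that reproduces this, using the regex exactly as written above. The class name FirstAttemptDemo and the helper findAll are made up for illustration only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstAttemptDemo {
    // The first attempt from the question, unchanged.
    static final Pattern FIRST_ATTEMPT =
            Pattern.compile("(?<!(<number start>))\\d+(\\.\\d+)?(?!(<number end>))");

    // Hypothetical helper: collects every match of the pattern in the text.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = FIRST_ATTEMPT.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // The engine skips the leading 1 (lookbehind) and backtracks off the
        // trailing 4 (lookahead), so the middle of the number still matches.
        System.out.println(findAll("<number start>12.34<number end>")); // prints [2.3]
    }
}
```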

When I include quantifiers in the lookbehind, I get an error:

String regex="(?<!(<number start>\\d+))\\d+(\\.\\d+)?(?!(\\d+<number end>))";
//             ^^^^^^^^^^^^^^^^^^^^^^^               ^^^^^^^^^^^^^^^^^
//            negative lookbehind                    negative lookahead
//                                     ^^^^^^^^^^^^^
//                                     float match

Thanks for the help!

tenticon
  • Just [`"(?<!<start number>)\\b\\d+(?:\\.\\d+)?\\b(?!<end number>)"`](https://regex101.com/r/6xTKBh/1) will work. – Wiktor Stribiżew Dec 15 '17 at 11:47
  • First, this is not a duplicate question. Second, the proposed regex doesn't help. – l33t Dec 15 '17 at 12:03
  • @l33t Explain why my suggestion does not help. I provided a regex demo that works correctly: the regex does not match a number inside the tags. – Wiktor Stribiżew Dec 15 '17 at 12:09
  • He obviously swapped the tokens. – l33t Dec 15 '17 at 12:12
  • @tenticon, try regex-replace (only with group 2) with this: `(<start number>[-+]?\d*\.?\d+<end number>)|([-+]?\d*\.?\d+)` – l33t Dec 15 '17 at 12:12
  • @l33t, the regex proposed by @WiktorStribiżew does work. The problem is that the asker of this question mixed the order of the words: `start number` vs. `number start`. However Wiktor's regex makes the assumption that the number is enclosed by `\b`. – Marcono1234 Oct 28 '18 at 13:27

2 Answers

It's a limitation of the (incredibly slow) lookbehind feature: the expression inside a lookbehind cannot match text of arbitrary length, and \d+ has no upper bound. That is what the error message tells us.

You could try something like this:

(<start number>[-+]?\d*\.?\d+<end number>)|([-+]?\d*\.?\d+)
  • $1: Matches including the tags.
  • $2: Matches excluding the tags.

Then replace text accordingly.
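
A rough sketch of what "replace accordingly" could look like in Java. The class and method names (TagNumbers, tagUntagged) are made up for illustration, and the tags follow the input from the question (<start number> / <end number>):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagNumbers {
    // Group 1: a number that is already tagged. Group 2: a bare number.
    private static final Pattern NUMBER = Pattern.compile(
            "(<start number>[-+]?\\d*\\.?\\d+<end number>)|([-+]?\\d*\\.?\\d+)");

    // Hypothetical helper: wraps every bare number in tags and
    // leaves already tagged numbers exactly as they are.
    static String tagUntagged(String input) {
        Matcher m = NUMBER.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = (m.group(1) != null)
                    ? m.group(1)                                       // already tagged
                    : "<start number>" + m.group(2) + "<end number>"; // bare number
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "first number <start number>123.45<end number> "
                     + "and second number 678.90.";
        System.out.println(tagUntagged(input));
    }
}
```

Because the tagged alternative comes first, an already tagged number is consumed as a whole and the second alternative never gets a chance to match inside it.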

l33t

Instead of putting the \d+ inside the existing lookbehind alternative, you can add a single \d as a separate alternative:

(?<!<number start>|\d)\d+(?:\.\d+)?(?!\d|<number end>)

The pipe character (|) in the lookbehind / lookahead is a boolean "or". This solution is similar to what you tried, but does not cause an exception because every alternative in the lookbehind has a fixed length.

To explain it a little bit more in detail: Since the regex is supposed to match a decimal number, there must not be leading or trailing digits because they should be part of the match. Therefore they are forbidden (using the negative lookbehind / lookahead) as well.

Live demo: https://regex101.com/r/MdS7rF/1
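
A minimal Java sketch of the replacement with this approach. Note that it is an assumption on my side to use the tag names from the question's input (<start number> / <end number>) instead of <number start> / <number end>, and the class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

public class LookaroundTagger {
    // Same structure as the regex above, with the tag names taken from the
    // question's input. Every lookbehind alternative has a fixed length.
    private static final Pattern UNTAGGED = Pattern.compile(
            "(?<!<start number>|\\d)\\d+(?:\\.\\d+)?(?!\\d|<end number>)");

    // Hypothetical helper: wraps every number that is not already tagged.
    static String tagUntagged(String input) {
        // $0 refers to the whole match, i.e. the bare number.
        return UNTAGGED.matcher(input).replaceAll("<start number>$0<end number>");
    }

    public static void main(String[] args) {
        String input = "first number <start number>123.45<end number> "
                     + "and second number 678.90.";
        System.out.println(tagUntagged(input));
    }
}
```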

Marcono1234