
The goal is to have a regex match every newline character that is not preceded by a 2-decimal number. Here's some example text:

This line ends with text
this line ends with a number: 55
this line ends with a 2-decimal number: 5.00
here's 22.22, not at the end of the line

The regex should match the end of lines 1, 2, and 4 (assuming a newline after the 4th line). I thought negative lookahead was the answer, so I tried

(?!\d*\.\d\d)\n

without success as seen in this regex101 snippet: https://regex101.com/r/qbrKlt/4

Edit: I later discovered the reason this didn't work is because Python's Regex doesn't support variable length negative lookahead - it only supports fixed-length negative lookahead.

Unfortunately, fixed-length lookahead still didn't work:

(?!\.\d\d)\n
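
A quick check suggests why both lookahead attempts fail. It assumes the sample text above with a trailing newline; `TEXT` and `count_matches` are illustrative names, not part of the original post. The lookahead is tested at the position of the newline itself and looks forward, so it never sees the digits that come before the newline.

```python
import re

# Sample text from the question, assuming a newline after the 4th line.
TEXT = (
    "This line ends with text\n"
    "this line ends with a number: 55\n"
    "this line ends with a 2-decimal number: 5.00\n"
    "here's 22.22, not at the end of the line\n"
)


def count_matches(pattern, text):
    """Return how many non-overlapping matches of pattern occur in text."""
    return len(re.findall(pattern, text))


# Both patterns compile without error. At each newline the lookahead
# inspects the newline character (not the preceding text), so the
# negative assertion always succeeds and every newline is matched.
for pattern in (r"(?!\d*\.\d\d)\n", r"(?!\.\d\d)\n"):
    print(pattern, count_matches(pattern, TEXT))
```

Both patterns report 4 matches here, one per newline, which is consistent with the lookahead direction being the problem rather than its width.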

Instead, I worked around it by running two regexes and subtracting one result from the other:

  1. find all indices of newline characters: \n
  2. find all indices of newline characters preceded by 2-decimal numbers: \d*\.\d\d\n
  3. remove indices found in step 2 from those found in step 1 for the answer
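
The three steps above can be sketched roughly as follows. This is a minimal sketch, not the exact code used; `TEXT`, `newline_indices_two_pass`, and `line_numbers` are hypothetical names, and the sample text is assumed to end with a newline.

```python
import re

# Sample text from the question, assuming a newline after the 4th line.
TEXT = (
    "This line ends with text\n"
    "this line ends with a number: 55\n"
    "this line ends with a 2-decimal number: 5.00\n"
    "here's 22.22, not at the end of the line\n"
)


def newline_indices_two_pass(text):
    """Indices of newlines that do not directly follow a 2-decimal number."""
    # Step 1: indices of all newline characters.
    all_newlines = {m.start() for m in re.finditer(r"\n", text)}
    # Step 2: indices of newlines preceded by a 2-decimal number.
    # The newline is the last character of each match, hence end() - 1.
    after_decimal = {m.end() - 1 for m in re.finditer(r"\d*\.\d\d\n", text)}
    # Step 3: remove the step 2 indices from the step 1 indices.
    return sorted(all_newlines - after_decimal)


indices = newline_indices_two_pass(TEXT)
# Convert each index to a 1-based line number for readability.
line_numbers = [TEXT.count("\n", 0, i) + 1 for i in indices]
print(line_numbers)  # [1, 2, 4]
```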

But I'm sure there's a way to do this in 1 go and I'd be grateful to anyone out there that can help in discovering the solution :)

anubhava
HyperActive
  • Why should it match the second line? – Eraklon Feb 19 '20 at 16:28
  • *Python's Regex doesn't support variable length negative lookahead* - wrong, `re` supports variable-width lookaheads. It does not support unknown-width **lookbehind** patterns. – Wiktor Stribiżew Feb 19 '20 at 20:13
  • Incorrect dupe: This question is not just about the use of lookbehind; it is about handling a very specific case of an unknown-width lookbehind pattern in Python. – anubhava Jan 14 '22 at 07:13

2 Answers


You need to use a negative lookbehind instead of a negative lookahead:

(?<!\.\d\d)\n

Updated RegEx Demo

This matches \n only when it is not immediately preceded by a dot and two digits. The lookbehind is fixed-width (three characters), which Python's `re` module accepts.
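
A short check of the lookbehind pattern in Python, assuming the question's sample text with a trailing newline (`TEXT` and `matched_lines` are illustrative names):

```python
import re

# Sample text from the question, assuming a newline after the 4th line.
TEXT = (
    "This line ends with text\n"
    "this line ends with a number: 55\n"
    "this line ends with a 2-decimal number: 5.00\n"
    "here's 22.22, not at the end of the line\n"
)

# Negative lookbehind: the three characters before the newline
# must not be a dot followed by two digits.
pattern = re.compile(r"(?<!\.\d\d)\n")

# Report each matched newline as the 1-based number of the line it ends.
matched_lines = [
    TEXT.count("\n", 0, m.start()) + 1 for m in pattern.finditer(TEXT)
]
print(matched_lines)  # [1, 2, 4]
```

Note that the lookbehind only inspects the last three characters before the newline, so a line ending in something like `abc.12` would also be skipped.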

anubhava

Why get esoteric with regexes, when you can just capture the final word using string.split()[-1] and test that for the form you need? Python isn't Perl (fortunately).
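
One possible reading of this suggestion, as a rough sketch: split each line on whitespace and test only the last word. The names `TEXT`, `ends_with_two_decimal_number`, and `lines_to_keep` are hypothetical, and the small `re.fullmatch` check is just one way to "test for the form"; it assumes a 2-decimal number means optional digits, a dot, and exactly two digits.

```python
import re

# Sample text from the question, assuming a newline after the 4th line.
TEXT = (
    "This line ends with text\n"
    "this line ends with a number: 55\n"
    "this line ends with a 2-decimal number: 5.00\n"
    "here's 22.22, not at the end of the line\n"
)


def ends_with_two_decimal_number(line):
    """True if the last whitespace-separated word looks like 5.00 or .25."""
    words = line.split()
    # An empty or whitespace-only line has no last word to test.
    return bool(words) and re.fullmatch(r"\d*\.\d\d", words[-1]) is not None


# 1-based numbers of the lines that do NOT end with a 2-decimal number.
lines_to_keep = [
    number
    for number, line in enumerate(TEXT.splitlines(), start=1)
    if not ends_with_two_decimal_number(line)
]
print(lines_to_keep)  # [1, 2, 4]
```

Unlike the lookbehind, this requires the whole final word to be the number, so it yields line numbers rather than newline positions.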

dstromberg