I am trying to match everything up to the last "Saving*" line before "ModelFinish". I can almost do this with a negative lookaround (described in "Regular expression to match a line that doesn't contain a word"), but I can't get it working when the string I'm trying to match contains newlines. I'm using Notepad++, which has a ". matches newline" checkbox.

Input:

Begin: model 17
Epoch 15800, loss 4051304.017, val_PMAE 6.9
Saving at epoch 15828 with loss: 3974847.290
Saving at epoch 15889 with loss: 3968749.471
ModelFinish: Stop training
Begin: model 18
Saving at epoch 15889 with loss: 3968749.223
Saving at epoch 15889 with loss: 3968749.200
Epoch 15800, loss 4051304.017
ModelFinish: Stop training
Begin: model 19

Desired first match:

Begin: model 17
Epoch 15800, loss 4051304.017, val_PMAE 6.9
Saving at epoch 15828 with loss: 3974847.290

Desired second match:

Begin: model 18
Saving at epoch 15889 with loss: 3968749.223

My attempt (with ". matches newline" checked):

^Begin:(?:(?!Saving.*Model).)*$

My plan is to use Notepad++ to find and replace the text I don't want with "", so that I'm left with just the final "loss" from each model (i.e. model 17 loss: 3968749.471, model 18 loss: 3968749.200, etc.).
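
For comparison, the end goal (the final saved loss per model) can also be sketched outside Notepad++ with a plain line scan instead of a regex. This is only an illustration under stated assumptions: the function name `final_losses` and the variable `SAMPLE` are hypothetical, and the log is assumed to follow exactly the format of the input above.

```python
# Hypothetical helper, not part of the original question: collect the last
# "Saving" loss seen between each "Begin:" line and its "ModelFinish" line.
SAMPLE = """Begin: model 17
Epoch 15800, loss 4051304.017, val_PMAE 6.9
Saving at epoch 15828 with loss: 3974847.290
Saving at epoch 15889 with loss: 3968749.471
ModelFinish: Stop training
Begin: model 18
Saving at epoch 15889 with loss: 3968749.223
Saving at epoch 15889 with loss: 3968749.200
Epoch 15800, loss 4051304.017
ModelFinish: Stop training
Begin: model 19
"""


def final_losses(log):
    """Map model number -> last saved loss (kept as text to preserve digits)."""
    losses = {}
    model = None
    last_loss = None
    for line in log.splitlines():
        if line.startswith("Begin:"):
            # Assumes the model number is the last token on the line.
            model = int(line.split()[-1])
            last_loss = None
        elif line.startswith("Saving"):
            # Assumes the loss is the last space-separated token.
            last_loss = line.rsplit(" ", 1)[-1]
        elif line.startswith("ModelFinish"):
            if model is not None and last_loss is not None:
                losses[model] = last_loss
    return losses


for number, loss in final_losses(SAMPLE).items():
    print(f"model {number} loss: {loss}")
```

Model 19 is skipped in this sketch because its block has no "ModelFinish" line yet.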

maurera
  • To get everything until the last line before ModelFinish you could use https://regex101.com/r/c3RnbS/1, but your desired result only matches the first line after Begin or Epoch. To get those matches you could use https://regex101.com/r/NJckKI/1 – The fourth bird Nov 20 '19 at 17:59
  • How about: `^Begin:(?:(?!ModelFinish).)*(?=^Saving)`? – Toto Nov 20 '19 at 18:06
  • Your desired matches are not in line with the requirements; they show you want to match until the first line starting with `Saving` before `ModelFinish` – Wiktor Stribiżew Nov 20 '19 at 18:11
  • @WiktorStribiżew - the desired matches as written are correct (it just so happens that in the two examples 'everything up to the last "Saving*" line before "ModelFinish"' is equivalent to 'the first line starting with Saving before ModelFinish') – maurera Nov 20 '19 at 20:54

1 Answer

You don't have to enable ". matches newline" if you match the newlines explicitly with \R, which matches any Unicode newline sequence.

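To match everything before the last Saving line that precedes ModelFinish, you could match all following lines that don't start with ModelFinish and then use a positive lookahead (?=...) that asserts what follows is a newline and Saving. The repeated group first consumes every line up to ModelFinish, then backtracks one line at a time until the lookahead succeeds.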

^Begin:.*(?:\R(?!ModelFinish).*)*(?=\RSaving)
  • ^ Start of a line (Notepad++ anchors ^ at the start of each line)
  • Begin:.* Match Begin: and any char except a newline 0+ times
  • (?: Non capturing group
    • \R(?!ModelFinish) Match a newline and assert that the line does not start with ModelFinish
    • .* Match any char except a newline 0+ times
  • )* Close non capturing group and repeat 0+ times
  • (?=\RSaving) Positive lookahead, assert what is on the right is a newline followed by Saving
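
The same pattern can be checked outside Notepad++ with a short script. This is a minimal sketch, not the demo linked from the answer: Python's `re` module does not support `\R`, so the sketch swaps in `(?:\r?\n)` as an assumed stand-in, and it needs `re.MULTILINE` so that `^` matches at the start of every line. The names `LOG`, `NL` and `PATTERN` are illustrative only.

```python
import re

# Sample input copied from the question.
LOG = """Begin: model 17
Epoch 15800, loss 4051304.017, val_PMAE 6.9
Saving at epoch 15828 with loss: 3974847.290
Saving at epoch 15889 with loss: 3968749.471
ModelFinish: Stop training
Begin: model 18
Saving at epoch 15889 with loss: 3968749.223
Saving at epoch 15889 with loss: 3968749.200
Epoch 15800, loss 4051304.017
ModelFinish: Stop training
Begin: model 19
"""

# Assumption: \R is replaced by (?:\r?\n) because Python's re lacks \R.
NL = r"(?:\r?\n)"
PATTERN = re.compile(
    rf"^Begin:.*(?:{NL}(?!ModelFinish).*)*(?={NL}Saving)",
    re.MULTILINE,  # make ^ match at the start of each line
)

matches = [m.group(0) for m in PATTERN.finditer(LOG)]
for text in matches:
    print(text)
    print("---")
```

With the sample input, this finds two matches, one for model 17 and one for model 18. The trailing "Begin: model 19" line is not matched because no Saving line follows it.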

Regex demo

The fourth bird