I'm trying to extract multiple groups of sentences based on the following logic:

  • the first sentence must contain a certain word with alternatives
  • keep collecting the following sentences until you reach a sentence that contains a specific word (with alternatives)

Input (made up example):

There is a finding on T2 of spine. The finding is most likely fracture. Additionally, patient seems tired. In L2, patient has there is a circumferential disc bulge with Central disc herniation. In L5, patient seems to have another fracture. In the cervical spine, patient has any degeneration. Patient is may also have fever. L3, endplate edema is also found. In L5, patient may have bruise.

Regex:

[^.]*(cervi(c|x)|C[1-7]|T[1-6]).*\.(?=[^.]*L[1-5][^.]*\.)

Expected Output:

  1. There is a finding on T2 of spine. The finding is most likely fracture. Additionally, patient seems tired.

  2. In the cervical spine, patient has any degeneration. Patient is may also have fever.

Actual Output:

There is a finding on T2 of spine. The finding is most likely fracture. Additionally, patient seems tired. In L2, patient has there is a circumferential disc bulge with Central disc herniation. In L5, patient seems to have another fracture. In the cervical spine, patient has any degeneration. Patient is may also have fever. L3, endplate edema is also found.

Wiktor Stribiżew
chethanjjj
  • Hi, perhaps use the non-greedy quantifier `?` – IronMan Sep 21 '20 at 17:16
  • I was trying that. I was putting it at the end of the lookahead group, but no change – chethanjjj Sep 21 '20 at 17:18
  • Make the `.*` lazy, and have the dot match all characters so it spans lines: `[^.]*(cervi(?:c|x)|C[1-7]|T[1-6]).*?\.(?=[^.]*L[1-5][^.]*\.)` https://regex101.com/r/x3PmQP/1 or `[^.]*(cervi(?:c(?:al)?|x)|C[1-7]|T[1-6]).*?\.(?=[^.]*L[1-5][^.]*\.)` https://regex101.com/r/hif9VA/1 –  Sep 21 '20 at 17:33
  • Interesting, that does work. So without `.*?`, the regex will just find the longest combination of any characters + `.` + a sentence with `L[1-5]` – chethanjjj Sep 21 '20 at 17:48

1 Answer

You were almost there :)

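Just replace the `.*` in the middle (which is greedy) with `.*?` (which is lazy), and you'll get the desired output. The greedy `.*` runs to the end of the text and then backtracks only as far as the last period where the lookahead still succeeds, which is why a single match swallowed both groups.

A greedy quantifier matches as much as it can. A lazy quantifier matches as little as it can while still letting the rest of the pattern match.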
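
To see the difference in isolation, here is a minimal sketch in Python (an assumption, since the question does not say which regex flavor is used). The toy string and variable names are made up for illustration:

```python
import re

sample = "<a><b>"  # toy input, not from the question

# Greedy: .* grabs as much as possible, so the match runs to the LAST ">".
greedy = re.search(r"<.*>", sample).group(0)

# Lazy: .*? grabs as little as possible, so the match stops at the FIRST ">".
lazy = re.search(r"<.*?>", sample).group(0)

print(greedy)  # <a><b>
print(lazy)    # <a>
```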

Demo: https://regex101.com/r/TA1WbO/1
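
If you'd rather check it locally, the following sketch runs both patterns against the input from the question. It assumes Python's `re` module; `extract_groups` is a hypothetical helper name, and the `.strip()` call is my addition because the second match starts with the space that follows the previous sentence.

```python
import re

# Input copied from the question.
text = (
    "There is a finding on T2 of spine. The finding is most likely fracture. "
    "Additionally, patient seems tired. In L2, patient has there is a "
    "circumferential disc bulge with Central disc herniation. In L5, patient "
    "seems to have another fracture. In the cervical spine, patient has any "
    "degeneration. Patient is may also have fever. L3, endplate edema is also "
    "found. In L5, patient may have bruise."
)

# Original pattern: the middle .* is greedy.
GREEDY = r"[^.]*(cervi(c|x)|C[1-7]|T[1-6]).*\.(?=[^.]*L[1-5][^.]*\.)"
# Fixed pattern: the only change is .* -> .*?
LAZY = r"[^.]*(cervi(c|x)|C[1-7]|T[1-6]).*?\.(?=[^.]*L[1-5][^.]*\.)"


def extract_groups(pattern, source):
    """Return every full match of `pattern` in `source`, trimmed of outer spaces."""
    return [m.group(0).strip() for m in re.finditer(pattern, source)]


greedy_groups = extract_groups(GREEDY, text)
lazy_groups = extract_groups(LAZY, text)

print(len(greedy_groups))  # one long match, as in "Actual Output"
for number, group in enumerate(lazy_groups, start=1):
    print(number, group)  # the two groups from "Expected Output"
```

Note that `.` does not match a newline by default in Python, so if the real text contains line breaks you would also need to pass `re.DOTALL`.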

Vincent