
Imagine I have a text file full of cat poems and need to find all poems that end with the word dog. The poems all start with the word cat. How do I match only poems that start with cat and end with dog?

Cat poem: 
My feline is very furry
I like furry felines
This is why I do not have a dog

Cat poem:
Littly furry paws
this is what i like
I don't care if it's a feline or a canine

Cat poem:
The little felines
playing in the field
sitting on the side watching is a dog

In my example, I want the first and last poems to be matched, while the middle one should not be. If all poems ended with dog, (?=cat).*?(?<=dog) would be an easy solution (thanks to this answer). However, this pattern first matches the first poem and then the second and third poems together (as there is no dog in the second poem). Every extension of that regex I tried gave the same result, e.g. (?=cat).*?(?!cat).*?(?<=dog).
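
To reproduce the failure described above outside Notepad++, here is a minimal sketch using Python's `re` module (an assumption; the question itself targets Notepad++). `re.DOTALL` plays the role of the ". matches newline" option, and `re.IGNORECASE` is added because the pattern spells `cat` in lowercase while the poems start with `Cat`. The names `POEMS` and `naive` are illustrative only.

```python
import re

# Sample text copied from the question (a blank line separates the poems).
POEMS = """Cat poem:
My feline is very furry
I like furry felines
This is why I do not have a dog

Cat poem:
Littly furry paws
this is what i like
I don't care if it's a feline or a canine

Cat poem:
The little felines
playing in the field
sitting on the side watching is a dog"""

# The lazy pattern from the question: from a "cat" to the nearest following "dog".
naive = re.findall(r"(?=cat).*?(?<=dog)", POEMS, re.IGNORECASE | re.DOTALL)

for match in naive:
    print(repr(match))
    print("---")
```

The second match starts at the second poem's header and only stops at the dog that ends the third poem, because nothing in `.*?` stops it from running past a new Cat line.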

I am using Notepad++ (v6.5.2), so any answer should include a solution for that. If another environment allows a more elegant solution, feel free to add that, too.

iraserd


You can use a tempered greedy token regex that matches substrings from Cat up to dog that contain no other line-initial Cat:

^Cat\b(?:(?!^Cat\b).)*\bdog\b(?=\R+Cat\b|\z)

The ". matches newline" option must be checked. See the regex demo here.
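
Outside Notepad++, the same tempered greedy token can be tried in Python's `re` module. This is a sketch under stated assumptions: Python does not support `\R`, so `(?:\r?\n)` stands in for it here (covering `\n` and `\r\n` line breaks, but not a lone `\r`), and Python's `\Z` matches only at the very end of the string, so it plays the role of `\z`. `re.MULTILINE` makes `^` match at every line start, as in Notepad++, and `re.DOTALL` corresponds to ". matches newline". The constant names are hypothetical.

```python
import re

# Sample text copied from the question (a blank line separates the poems).
POEMS = """Cat poem:
My feline is very furry
I like furry felines
This is why I do not have a dog

Cat poem:
Littly furry paws
this is what i like
I don't care if it's a feline or a canine

Cat poem:
The little felines
playing in the field
sitting on the side watching is a dog"""

# From a line-initial "Cat" up to a whole-word "dog" that is followed either by
# line break(s) plus the next "Cat", or by the end of the string.
# Assumption: (?:\r?\n)+ approximates \R+ and Python's \Z acts like \z.
POEM_ENDING_IN_DOG = re.compile(
    r"^Cat\b(?:(?!^Cat\b).)*\bdog\b(?=(?:\r?\n)+Cat\b|\Z)",
    re.MULTILINE | re.DOTALL,
)

matches = POEM_ENDING_IN_DOG.findall(POEMS)
for poem in matches:
    print(poem)
    print("---")
```

Only the first and third poems are returned: from the second poem's header, the token cannot step over the next line-initial Cat, so no dog is reachable before that poem ends.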

Pattern breakdown:

  • ^ - start of a line
  • Cat\b - whole word Cat
  • (?:(?!^Cat\b).)* - the tempered greedy token: matches any character, zero or more times, as long as that character does not start a whole word Cat at the beginning of a line
  • \bdog\b - a whole word dog...
  • (?=\R+Cat\b|\z) - that is either followed by one or more line break sequences (\R+) and then a whole word Cat, or at the end of the file (\z matches only at the very end of the string, while \Z also allows trailing newline(s) before that end).
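
The final lookahead is what stops a dog at the end of an inner line from completing a match. A small Python sketch, using a hypothetical poem invented for illustration and the same assumed substitutions as before (`(?:\r?\n)+` for `\R+`, Python's `\Z` for `\z`):

```python
import re

# Hypothetical poem: "dog" ends an inner line, but the poem itself ends with "bird".
TRICKY = "Cat poem:\nI once saw a dog\nbut today I saw a bird"

TOKEN = r"^Cat\b(?:(?!^Cat\b).)*\bdog\b"
FLAGS = re.MULTILINE | re.DOTALL

# Without the lookahead, any whole-word "dog" inside the poem is enough to match.
without_lookahead = re.search(TOKEN, TRICKY, FLAGS)

# With it, "dog" must be followed by line break(s) plus the next "Cat", or the end.
with_lookahead = re.search(TOKEN + r"(?=(?:\r?\n)+Cat\b|\Z)", TRICKY, FLAGS)

print(without_lookahead is not None)  # True
print(with_lookahead is not None)     # False
```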


Wiktor Stribiżew
  • Well, I don't know how anyone could improve on that answer, including the regex breakdown, a reference link and a demo, so I'll accept this one right away! :) Thanks! – iraserd Mar 15 '16 at 10:04
  • Maybe you can also help me with a follow-up question: I would like to copy those two poems out of the file (or erase the middle poem, whichever is easier). I tried `bookmark line`, but that only bookmarks the first line. Any ideas? – iraserd Mar 15 '16 at 10:28
  • I just found out that my answer is not precise, you need [`^Cat\b(?:(?!^Cat\b).)*\bdog\b(?=\R+Cat\b|\z)`](https://regex101.com/r/qU0vS5/2) if `dog` can be at the end of a poem line inside the poem. – Wiktor Stribiżew Mar 15 '16 at 10:34
  • You can use the following regex to remove poems like the 2nd one: `^Cat\b(?:(?!^Cat\b).)*\b(?!dog\b)\w+\b(?=\R+Cat\b|\z)`. – Wiktor Stribiżew Mar 15 '16 at 10:39
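
In a script, the removal pattern from the last comment can be applied with `re.sub`. As before, this is a Python sketch that assumes `(?:\r?\n)+` in place of `\R+` and Python's `\Z` in place of `\z`. Deleting the second poem leaves its surrounding blank lines behind, so the sketch also collapses three or more consecutive newlines into a single blank line; that tidy-up step assumes poems are separated by blank lines and is not part of the original answer.

```python
import re

# Sample text copied from the question (a blank line separates the poems).
POEMS = """Cat poem:
My feline is very furry
I like furry felines
This is why I do not have a dog

Cat poem:
Littly furry paws
this is what i like
I don't care if it's a feline or a canine

Cat poem:
The little felines
playing in the field
sitting on the side watching is a dog"""

# A poem whose last word (before line break(s) + "Cat", or the end) is not "dog".
POEM_NOT_ENDING_IN_DOG = re.compile(
    r"^Cat\b(?:(?!^Cat\b).)*\b(?!dog\b)\w+\b(?=(?:\r?\n)+Cat\b|\Z)",
    re.MULTILINE | re.DOTALL,
)

# Delete those poems, then squeeze the leftover empty lines into one blank line.
cleaned = POEM_NOT_ENDING_IN_DOG.sub("", POEMS)
cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)
print(cleaned)
```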