Regex: Match strings between two keywords mixed with useless data across multiple lines

Question

I'm trying to use NP++ regex to parse data from a file with input:

badstring ---
useless data
keyword1 ---
usless data
string 1 ---
random number of useless lines of characters across newlines
string 2 ---
more useless stuff
keyword2 ---
useless data
dumb badstring keyword2 ---

output:

string 1, string 2

For the sake of example, string 1, string 2, and the badstrings all have the same format, which is why I want to find string 1 and string 2 ONLY between keyword1 and keyword2.

The closest I was able to get is:

keyword1\r\n((.|\r\n)+?)\r\n(.+) ---\r\n((.|\r\n)+?)\r\n(.+) ---\r\n((.|\r\n)+?)keyword2

The problem is that I don't know the number of strings I need to capture, so I need to search recursively, starting from the largest possible number of strings. Because I am using ((.|\r\n)+?) to match anything, it always matches beyond the keyword. So when I run keyword1 ---((.|\r\n)+?)(.+) ---((.|\r\n)+?)(.+) ---((.|\r\n)+?)(.+) ---((.|\r\n)+?)keyword2 --- to find 3 strings, it selects beyond keyword2, because the next section also contains keyword2, instead of returning no matches. Similarly, if I search for too many strings, it will loop around and select the entire file. Any ideas?

I feel like a rudimentary parser would be more appropriate for handling this. I do understand that this may not be possible if you are limited to using NPP. — Tim Biegeleisen, Nov 03 '19 at 11:12
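
As a rough illustration of the parser idea in the comment above, here is a minimal Python sketch, assuming the data can be processed outside Notepad++. The function name `strings_between` and its parameters are made up for this example. It also assumes that the keywords start their lines and that every wanted string ends in ` ---`, as in the sample input.

```python
def strings_between(text, start="keyword1", stop="keyword2", marker=" ---"):
    """Collect marker-terminated lines found between the start and stop lines."""
    found = []
    inside = False
    for line in text.splitlines():
        line = line.rstrip()
        if not inside:
            # Only a line that begins with the start keyword opens the block.
            inside = line.startswith(start)
            continue
        if line.startswith(stop):
            break  # stop at the first line that begins with the stop keyword
        if line.endswith(marker):
            found.append(line[: -len(marker)])
    return found


# Sample input copied from the question.
sample = """badstring ---
useless data
keyword1 ---
usless data
string 1 ---
random number of useless lines of characters across newlines
string 2 ---
more useless stuff
keyword2 ---
useless data
dumb badstring keyword2 ---
"""

print(", ".join(strings_between(sample)))  # prints: string 1, string 2
```

Because the loop tracks whether it is inside the block, the number of strings does not need to be known in advance.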

bobble bubble · Accepted Answer · 2019-11-03T12:04:57.653

1

How about using (*SKIP)(*F) to skip everything from the start to keyword1 and everything from keyword2 to the end of the string? The question doesn't sound like recursion is needed.

(?s:\A.*?^keyword1|^keyword2.*)(*SKIP)(*F)|^.*?(?=\h---)

See this demo at regex101

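(?s: opens a non-capturing group with the dotall flag, which makes the dot match newlines too
\A matches the start of the string, ^ matches the start of a line
(*SKIP)(*F) discards whatever the preceding alternative matched and fails there, so the search resumes after the skipped text
\h matches a horizontal whitespace character
.*? lazily matches any amount of any characters
(?= opens a lookahead, which only checks a condition without consuming characters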
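
For comparison outside Notepad++: Python's built-in `re` module does not support `(*SKIP)(*F)` or `\h`, so the sketch below only approximates the same idea in two steps, assuming the sample layout from the question. It first cuts out the block between the keyword lines, then collects the text before ` ---` on each line. The function name `between_keywords` is hypothetical, and `[ \t]` stands in for `\h`.

```python
import re


def between_keywords(text):
    # Step 1: capture everything after the keyword1 line up to the first
    # following line that starts with keyword2 (re.S lets . cross newlines).
    block = re.search(r"^keyword1[^\n]*\n(.*?)^keyword2", text, re.M | re.S)
    if block is None:
        return []
    # Step 2: within that block, take the text before " ---" on each line.
    # Without re.S the dot stays on one line, so noise lines cannot match.
    return re.findall(r"^(.*?)[ \t]---", block.group(1), re.M)


# Sample input copied from the question.
sample = (
    "badstring ---\n"
    "useless data\n"
    "keyword1 ---\n"
    "usless data\n"
    "string 1 ---\n"
    "random number of useless lines of characters across newlines\n"
    "string 2 ---\n"
    "more useless stuff\n"
    "keyword2 ---\n"
    "useless data\n"
    "dumb badstring keyword2 ---\n"
)

print(", ".join(between_keywords(sample)))  # prints: string 1, string 2
```

Splitting the work in two steps plays the role of the skip alternative: text outside the block is never offered to the second pattern.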

edited Nov 03 '19 at 12:04

answered Nov 03 '19 at 11:42

bobble bubble


@Toto thanks :) Well, not sure yet if that's what the OP is after – bobble bubble Nov 03 '19 at 12:02
Thank you so much for the eagerness to help, but NP++ and Atom both don't think that is a valid expression. I tried playing around with it but couldn't figure out the problem. The input I am trying to process is at https://pastebin.com/VKSw5D0Q and I am specifically trying to capture every (network adapter) --+ between NETWORK --+ and PORTS --+. Because I don't know how many network adapters there will be, I need to search for, say, 4 first, then 3, then 2, then 1, but only the correct number of network adapters should match and then be replaced. – fatguy1121 Nov 03 '19 at 16:28
@fatguy1121 It seems to work as desired, see [this regex101 demo](https://regex101.com/r/MM8MFW/4), which gets two matches. I also tested this in NP++ 7.7.1, where I got the same results as in the demo. Be sure to check the regex checkbox in search/replace, and to uncheck both the `.` *matches newline* checkbox and the *extended* checkbox. – bobble bubble Nov 03 '19 at 21:09
YAY! This is so much better than my previous method, skips about 30 steps in one regex. Thank You! – fatguy1121 Nov 03 '19 at 23:12
So, could I take the matches into capture groups and replace them onto a new line like NETADAPTERS:\1, \2, \3 (and so on until all matches are replaced)? I'm almost certain that regex doesn't offer a way to capture to groups recursively without numbering each group directly, right? – fatguy1121 Nov 04 '19 at 02:00
@fatguy1121 In the replacement you can refer to the full match with `$0`, so if you want to prepend `NETADAPTERS` to each match you'd [replace with `NETADAPTERS: $0`](https://regex101.com/r/MM8MFW/5). Just open a new question for a new problem, as it's difficult to describe in a short comment :) – bobble bubble Nov 04 '19 at 09:41
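
To make the two replacement styles from these comments concrete, here is a small Python sketch. It assumes the lines between the keywords have already been cut out, and the adapter names in `block` are invented placeholders, not data from the pastebin. In Python's `re` module the whole match is written `\g<0>` rather than `$0`.

```python
import re

# Hypothetical lines already cut out from between the two keywords.
block = "adapter A ---\nnoise\nadapter B ---\n"

# Same per-line pattern as in the answer, with [ \t] standing in for \h.
pattern = r"^.*?(?=[ \t]---)"

# Per-match replacement: prefix every match with the label.
prefixed = re.sub(pattern, r"NETADAPTERS: \g<0>", block, flags=re.M)

# One combined line: collect all matches first, then join them, so no
# numbered capture groups are needed however many matches there are.
names = re.findall(pattern, block, flags=re.M)
combined = "NETADAPTERS: " + ", ".join(names)

print(prefixed)
print(combined)
```

Collecting matches and joining them afterwards sidesteps the need to know the number of groups up front, which a single search-and-replace in an editor cannot easily do.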

score 0 · Answer 2 · answered Nov 03 '19 at 11:44


Maybe I'm missing something, but can't you just use the straightforward pattern below?

keyword1[\s\S]*(string1)[\s\S]*(string2)[\s\S]*keyword2

This should do what you describe.

answered Nov 03 '19 at 11:44

Pavel Lint

