regex101 - match all occurrences of a string only in the first line where it is found

Question

On the regex101 site I have this text:

text3 text3 text3 text3 text3 text3

text1 text2 text1 text2 text1 text2

text1 text2 text1 text2 text1 text2

text1 text2 text1 text2 text1 text2

I want to match all text2 but only in the first line where it is found:

text3 text3 text3 text3 text3 text3

text1 **text2** text1 **text2** text1 **text2**

text1 text2 text1 text2 text1 text2

text1 text2 text1 text2 text1 text2

How do I get this?

Loop over the lines. Test if the line matches `text2`. If it does, return all the matches. Then break out of the loop. — Barmar, May 17 '23 at 17:43
Sorry, I have fixed the question, I have need to do this in regex101 site. — Mario Palumbo, May 17 '23 at 17:55
I don't think you can do this in a single regexp. You need a negative lookbehind that matches a previous row containing `text2`. But this requires a variable-length lookbehind, and most engines don't allow this. — Barmar, May 17 '23 at 17:56
It doesn't matter why I want to do it there for the purpose of solving my question. — Mario Palumbo, May 17 '23 at 18:00
This site is for discussing programming. Programming is done using programming languages. And this is easy to do in most programming languages, even if you can't do it in a single regexp on the testing site. — Barmar, May 17 '23 at 18:01
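
The line-by-line approach Barmar describes in the comments can be sketched in Python as follows. This is a minimal illustration, not code from the thread: the function name `matches_on_first_line` and the sample text are made up for the example, and `text2` is treated as a literal string.

```python
import re

def matches_on_first_line(text, target="text2"):
    """Return (line_index, spans) for the first line that contains target.

    Hypothetical helper for illustration; returns (None, []) if no line matches.
    """
    pattern = re.compile(re.escape(target))  # treat target as a literal
    for index, line in enumerate(text.splitlines()):
        spans = [m.span() for m in pattern.finditer(line)]
        if spans:
            # First line with at least one hit: report all its hits and stop.
            return index, spans
    return None, []

sample = (
    "text3 text3 text3 text3 text3 text3\n"
    "text1 text2 text1 text2 text1 text2\n"
    "text1 text2 text1 text2 text1 text2\n"
)
line_index, spans = matches_on_first_line(sample)
```

The spans are offsets within the matching line, so a caller that needs positions in the whole text would have to add the offset of that line.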

oriberu · Answer 1 · 2023-05-20T10:03:47.570

3

You can do this with regex flavours like PCRE, which support the \G anchor (it matches at the position where the previous match ended) and can be set to single-line matching (/s). The idea is to match the first target by anchoring to the beginning of the string while consuming as few characters as possible (^.*?), and then to allow further matches only where the previous match ended, without crossing a line break to reach them (\G[^\r\n]*?). See regex demo.

An expression to do that could look like this:

/(?:^.*?|\G[^\r\n]*?)\Ktext2/gs

\K simply cuts the preceding part of each match out of the result, so that no capturing group is needed to single out text2.
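
To see how the two alternatives cooperate, the chaining can be imitated in Python. The standard `re` module has neither \G nor \K, so this sketch is an emulation and not the PCRE expression itself: `Pattern.match(text, pos)` stands in for \G by only matching at the end of the previous match, and a capturing group stands in for \K. The function name `chained_matches` and the sample text are assumptions made for the example.

```python
import re

def chained_matches(text, target="text2"):
    # Hypothetical helper: imitates /(?:^.*?|\G[^\r\n]*?)\Ktext2/gs step by step.
    needle = re.escape(target)
    # First alternative, ^.*? with /s: from the start of the text, lazily skip
    # anything (line breaks included) up to the first target.
    first = re.compile(r".*?(" + needle + r")", re.DOTALL)
    # Second alternative, \G[^\r\n]*?: continue where the previous match ended
    # and lazily skip only characters that are not line breaks.
    following = re.compile(r"[^\r\n]*?(" + needle + r")")

    spans = []
    match = first.match(text)  # match() anchors at position 0, like ^
    while match:
        spans.append(match.span(1))  # group 1 plays the role of \K
        # match(text, pos) must succeed starting exactly at pos, as \G requires.
        match = following.match(text, match.end())
    return spans

sample = (
    "text3 text3 text3\n"
    "text1 text2 text1 text2\n"
    "text1 text2 text1 text2\n"
)
spans = chained_matches(sample)
```

On this sample only the two occurrences on the second line are returned: once a line break sits between the previous match and the next text2, the continuation pattern fails and the loop ends.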

To cover other aspects of line break/position matching, if you want to drop the single-line modifier (/s), in which case . ceases to match new-line characters, you can use a class that also matches line breaks, like [\s\S]*?, instead of .*? to get the initial match. See demo.

/(?:^[\s\S]*?|\G[^\r\n]*?)\Ktext2/g

If you want to use the multi-line modifier /m specifically, in which case the caret ^ now matches at the beginning of every line, you'll have to use the anchor for the beginning of string \A instead to match the initial target. See demo.

/(?:\A[\s\S]*?|\G[^\r\n]*?)\Ktext2/gm

edited May 20 '23 at 10:03

answered May 17 '23 at 20:13

oriberu


With PCRE, in singleline mode, it's possible to avoid `[^\r\n]` with `\N`: https://regex101.com/r/s742A5/1 – Casimir et Hippolyte May 21 '23 at 17:05
@CasimiretHippolyte I'm actually not sure about the behaviour of `\N`. Presumably, when the line ending is defined as `CR`, `\N` would not match it. If, however, the line ending was defined as `CRLF` - as in Windows - wouldn't `\N` match `CR` and only leave out `LF`? See [FULL STOP (PERIOD, DOT) AND \N](https://www.pcre.org/original/doc/html/pcrepattern.html#SEC7). – oriberu May 21 '23 at 17:48

If you are unsure about dots, newline sequences, etc. , play with that: https://3v4l.org/vWQtc#v8.2.6 – Casimir et Hippolyte May 21 '23 at 20:26
@CasimiretHippolyte Thanks for the demo; very nifty, I'll probably steal that. With that in mind, I would not replace `[^\r\n]` with `\N`, since `\N` matches `\r` if the line type is `LF` (which might be the default) and no line setting modifier is used to explicitly accept `CR` or `CRLF` as newline. – oriberu May 22 '23 at 09:11

score 2 · Answer 2 · answered May 17 '23 at 19:53


You can do it using a trick with capturing groups inside a lookahead (with the global flag disabled, of course).

Your regex would be this:

text2(?=(?:.*?(text2))*)

Demo here

Note that if the line contains more than two occurrences, you'll need to select the .NET flavor, as only it keeps multiple captures for the same group.
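
The remark about multiple captures can be checked with a small experiment. The sketch below uses Python's `re` module as a stand-in for an engine without capture collections (it is neither .NET nor PCRE, and the sample text is made up): a group that repeats inside the lookahead keeps only its last capture, so with three occurrences on the line the middle one is not reported.

```python
import re

text = "text1 text2 text1 text2 text1 text2\ntext1 text2\n"

# Same expression as above; without DOTALL, "." stops at the line break,
# so the lookahead never leaves the first matching line.
match = re.search(r"text2(?=(?:.*?(text2))*)", text)

first_hit = match.span()      # the overall match: first occurrence on the line
last_capture = match.span(1)  # group 1 was overwritten on every repetition
```

Here `first_hit` is the first text2 and `last_capture` is the third one; the second occurrence was captured during the first repetition and then replaced.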

answered May 17 '23 at 19:53

markalex


What is the difference between () and (?=)? – Mario Palumbo May 18 '23 at 14:35
`()` - capturing group, matches the inner pattern and keeps the matched string in a group with the corresponding number, more [here](https://stackoverflow.com/questions/21880127). `(?=)` - positive lookahead, checks that the current position is followed by something that matches the content of the lookahead. More [here](https://stackoverflow.com/a/1570916). – markalex May 18 '23 at 14:45
@MarioPalumbo, `(?=)` checks, but doesn't capture what it has checked. But when `()` is used inside it, whatever that group matches is captured. – markalex May 18 '23 at 19:29
I apologize for not specifying. I need it to be PCRE2 compatible and all "text2" on the line to be matches and not group captured. – Mario Palumbo May 18 '23 at 21:10
