
Given the following text:

EXCLUDE this entire line
include this line
and this as single match
and EXCLUDE this line

I want to return a single match consisting of two lines:

include this line
and this as single match

I want to use EXCLUDE as the string identifying a line that should be left out entirely.

Edit: if I can get just the first match, up to the line with "EXCLUDE" (or the end of the document, whichever comes first), that would work too.
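To pin down the expected result without any regex, here is a minimal Ruby sketch (Ruby being one of the languages used in the answers) that groups consecutive lines lacking the word EXCLUDE. It assumes newline-separated lines; the helper names included_blocks and first_block are hypothetical, chosen for illustration only.

```ruby
# Group consecutive lines that do not contain the word EXCLUDE.
# Enumerable#chunk drops any element whose block returns :_separator
# and closes the current chunk there, so each run of kept lines ends up
# in its own chunk.
def included_blocks(text)
  text.each_line(chomp: true)
      .chunk { |line| line.match?(/\bEXCLUDE\b/) ? :_separator : true }
      .map { |_key, lines| lines.join("\n") }
end

# The variant from the edit: only the first block, which runs up to the
# next EXCLUDE line or to the end of the text.
def first_block(text)
  included_blocks(text).first
end

text = <<~TEXT.chomp
  EXCLUDE this entire line
  include this line
  and this as single match
  and EXCLUDE this line
TEXT

p included_blocks(text) # => ["include this line\nand this as single match"]
p first_block(text)     # => "include this line\nand this as single match"
```

Any regex-based answer can be checked against this reference on the same input.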

Chebz
  • Please add a tag that identifies the language you are using as different languages have regex engines with different features. – Cary Swoveland Jun 20 '22 at 02:11
  • Suppose the line `"First include this line"` were added before the first line in your example and `"Lastly include this line"` were added after the last line in the example. What would be your desired result? – Cary Swoveland Jun 20 '22 at 02:15
  • Yeah, any line(s) that don't contain EXCLUDE should be included; if they are consecutive, they should form a single match. I've been trying different things for 2 days and I'm not sure it's possible at all. – Chebz Jun 20 '22 at 02:25
  • if I can get just the first match up to the line with "EXCLUDE" (or end of document), that would work too – Chebz Jun 20 '22 at 02:28
    This question differs from the earlier one cited as the basis for closing the question. I initially posted an answer with a regex that matched lines that didn't include the specified word and was quickly admonished, reminded that the OP stated, "I want to return a single match consisting for two lines:...". I've voted to reopen. Interestingly, my vote reopened the question. – Cary Swoveland Jun 20 '22 at 18:01

3 Answers


You can split the string on matches of the regular expression

^.*\bEXCLUDE\b.*\R

with global and multiline flags set (in Ruby, ^ always matches at the beginning of any line, so no flag is needed there).

In Ruby, for example, if the variable str held the string

Firstly include this line
EXCLUDE this entire line
include this line
and this as single match
and EXCLUDE this line
Lastly include this line

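then the method String#split, followed by String#chomp to drop the line terminator that split leaves at the end of each piece (the match starts at the beginning of the EXCLUDE line, so the preceding line break is not consumed), could be used to produce an array containing three strings.

str.split(/^.*\bEXCLUDE\b.*\R/).map(&:chomp)
  #=> ["Firstly include this line",
  #    "include this line\nand this as single match",
  #    "Lastly include this line"]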

Many languages have a method or function that is comparable to Ruby's split.


The regular expression can be broken down as follows.

^        # match the beginning of a line
.*       # match zero or more characters other than line
         # terminators, as many as possible
\b       # match word boundary
EXCLUDE  # match literal
\b       # match word boundary
.*       # match zero or more characters other than line
         # terminators, as many as possible
\R       # match line terminator
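A quick way to see exactly what split removes is to list the separator matches with String#scan. This is a minimal sketch using the same example string; the constant name SEPARATOR is only for illustration.

```ruby
# Input taken from the example above.
str = "Firstly include this line\n" \
      "EXCLUDE this entire line\n" \
      "include this line\n" \
      "and this as single match\n" \
      "and EXCLUDE this line\n" \
      "Lastly include this line"

# The separator pattern from the answer. In Ruby, ^ always matches at the
# start of any line, so no flag is needed.
SEPARATOR = /^.*\bEXCLUDE\b.*\R/

# Each element is one whole EXCLUDE line, including its line terminator.
p str.scan(SEPARATOR)
# => ["EXCLUDE this entire line\n", "and EXCLUDE this line\n"]

# The \b anchors keep a longer word such as NOEXCLUDE from matching.
p "NOEXCLUDE here\n".match?(SEPARATOR) # => false
```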
 
Cary Swoveland
  • Oh, we came out to the same answer! Nice explanation! I also think that splitting is easier than doing it with the opposite matching pattern. – Patrick Janser Jun 20 '22 at 06:56

With PCRE you can use \K to forget what has been matched so far, and first match the line containing EXCLUDE:

^.*\bEXCLUDE\b.*\K(?:\R(?!.*\bEXCLUDE\b).*)+

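Ruby's regex engine (Onigmo) also supports \K and \R, so the first pattern can be tried there unchanged. A minimal sketch on the question's input follows. Note that \K only drops the EXCLUDE line itself: the line break that follows it is still part of the match, so it is stripped afterwards. The constant name AFTER_EXCLUDE is hypothetical.

```ruby
# The question's input; the last line has no trailing newline.
text = "EXCLUDE this entire line\n" \
       "include this line\n" \
       "and this as single match\n" \
       "and EXCLUDE this line"

# The \K pattern from the answer; in Ruby, ^ already matches at every line start.
AFTER_EXCLUDE = /^.*\bEXCLUDE\b.*\K(?:\R(?!.*\bEXCLUDE\b).*)+/

# String#[] with a regex returns the first match only.
block = text[AFTER_EXCLUDE]
p block # => "\ninclude this line\nand this as single match"

# Remove the leading line break kept from the end of the EXCLUDE line.
p block.sub(/\A\R/, "") # => "include this line\nand this as single match"
```

Because the pattern must start on an EXCLUDE line, it only finds blocks that follow such a line.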

If you want to match all lines that do not contain EXCLUDE, with consecutive lines forming a single match:

(?:(?:^|\R)(?!.*\bEXCLUDE\b).*)+

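In Ruby, ^ always matches at the start of any line, so one way to express "every run of consecutive lines without EXCLUDE" is to anchor each run at a line start rather than using the (?:^|\R) alternation. This adapted pattern is my own variant, not the one in the answer, and has the side benefit that no leading line break ends up in the result. A minimal sketch:

```ruby
text = "Firstly include this line\n" \
       "EXCLUDE this entire line\n" \
       "include this line\n" \
       "and this as single match\n" \
       "and EXCLUDE this line\n" \
       "Lastly include this line"

# A run starts at a line that lacks EXCLUDE, then keeps absorbing
# "line break + next line" for as long as the next line also lacks it.
RUN = /^(?!.*\bEXCLUDE\b).*(?:\R(?!.*\bEXCLUDE\b).*)*/

p text.scan(RUN)
# => ["Firstly include this line",
#     "include this line\nand this as single match",
#     "Lastly include this line"]
```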

Or, using a (*SKIP)(*F) approach:

^.*\bEXCLUDE\b.*\R(*SKIP)(*F)|.+(?:\R(?!.*\bEXCLUDE\b).*)*

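(*SKIP)(*F) are PCRE backtracking-control verbs that Ruby's Onigmo engine does not provide. A common substitute, swapped in here, is to match the unwanted lines in one alternative and capture the wanted text in the other, keeping only the captures. I also end the first alternative with (?:\R|\z) instead of \R (my change), so an EXCLUDE line at the very end of a text without a trailing newline is consumed rather than picked up by the second alternative. A minimal sketch:

```ruby
text = "EXCLUDE this entire line\n" \
       "include this line\n" \
       "and this as single match\n" \
       "and EXCLUDE this line"

# Alternative 1 consumes a whole EXCLUDE line (plus its line break, or the
# end of the text) without capturing it. Alternative 2 captures a block of
# lines that do not contain EXCLUDE.
PATTERN = /^.*\bEXCLUDE\b.*(?:\R|\z)|(.+(?:\R(?!.*\bEXCLUDE\b).*)*)/

# With a capture group, scan yields one array per match; matches of the
# first alternative carry a nil capture, which compact removes.
blocks = text.scan(PATTERN).flatten.compact
p blocks # => ["include this line\nand this as single match"]
```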

The fourth bird
  • Very nice solution! Great for the `\K` and `\R`. It just needs a correction to work if there's a line without the `EXCLUDE` word at the beginning: https://regex101.com/r/tszZLF/1 – Patrick Janser Jun 20 '22 at 07:14
  • @PatrickJanser Thanks for your comment; for that case I have added 2 patterns that will match those lines. – The fourth bird Jun 20 '22 at 07:59
  • I can't tell from the wording of the question if the string necessarily contains a line that contains `"EXCLUDE"`. If there were not such a line the entire string presumably would be returned. You have a typo at `answer[29]`. – Cary Swoveland Jun 20 '22 at 18:09

You could also match the lines containing EXCLUDE and use them to split your text into the blocks you are looking for:

<?php

$input = 'First include this line
EXCLUDE this entire line
include this line
and this as single match
and EXCLUDE this line
Lastly include this line';

// ^ matches the beginning of a line.
// .* matches anything (except new lines) zero or multiple times.
// \b matches a word boundary (to avoid matching NOEXCLUDE).
// $ matches the end of a line.
$pattern = '/^.*\bEXCLUDE\b.*$/m';

// Split the text with all lines containing the EXCLUDE word.
$desired_blocks = preg_split($pattern, $input);

// Get rid of the new lines around the matched blocks.
array_walk(
    $desired_blocks,
    function (&$block) {
        // \R matches any Unicode newline sequence.
        // ^ matches the beginning of the string.
        // $ matches the end of the string.
        // | = or
        $block = preg_replace('/^\R+|\R+$/', '', $block);
    }
);

var_export($desired_blocks);

Demo here: https://onlinephp.io/c/4216a

Output:

array (
  0 => 'First include this line',
  1 => 'include this line
and this as single match',
  2 => 'Lastly include this line',
)
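For comparison, here is a minimal Ruby sketch of the same split-then-trim idea (a translation, not part of the original answer). Ruby's ^ and $ are line anchors by default, so the pattern needs no m flag; Ruby's /m flag means something different (it lets . match newlines) and must not be added. \A and \z serve as the string anchors for trimming.

```ruby
input = "First include this line\n" \
        "EXCLUDE this entire line\n" \
        "include this line\n" \
        "and this as single match\n" \
        "and EXCLUDE this line\n" \
        "Lastly include this line"

# Split on every line containing the word EXCLUDE (its line break is not part of the match).
blocks = input.split(/^.*\bEXCLUDE\b.*$/)

# Trim the line breaks left at the edges of each block; \R also covers \r\n.
blocks = blocks.map { |block| block.gsub(/\A\R+|\R+\z/, "") }

p blocks
# => ["First include this line",
#     "include this line\nand this as single match",
#     "Lastly include this line"]
```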
Patrick Janser