Sample text:

heading1
foo=bar
baz=qux

heading2
spam=ham

heading3
this=that

I want the text from heading1 up to, but not including, heading2.

So I tried a non-greedy match from "heading1" to "\n\n", but it is still greedy and overmatches:

var result = Regex.Match(text, "heading1\n(.*)\n{2}?", RegexOptions.Singleline)
  .Groups[1]
  .Value;

(I already have a solution by searching between "heading1" and "heading2", but I prefer to use the above approach as a learning exercise for non-greedy matching.)

UPDATE:
My question is simpler, has source text, is for C# specifically. It's easier to understand the problem. The linked question is technically relevant, but harder to apply to this case.

lonix
  • Maybe even [`^heading1(?:\n.+)*`](https://regex101.com/r/2XiVlQ/1) using `RegexOptions.Multiline` would do for the task. – bobble bubble Aug 19 '22 at 12:47
  • @bobblebubble In fact that is very helpful, if you add as answer I'll upvote. Seems there are, as usual, multiple ways to do it. – lonix Aug 19 '22 at 12:55
  • Yes, there are certainly many ways. This is almost the same as anubhava's answer, which inspired my comment, so I will not post another answer, but I'm glad it helps. – bobble bubble Aug 19 '22 at 13:00
  • @lonix: If you have reason to believe that dupe does not have any answer solving your problem then you may vote to reopen your question. – anubhava Aug 20 '22 at 07:51
  • @anubhava I thought about it, my regex questions are routinely closed. There are some mods who are aggressive closers/deleters in the regex tag. Right now they consider every question to be a dupe of 20 (say) canonical regex questions and answers, and that's it. I don't get it, that isn't how SO works, and it isn't how other tags work, it's just in regex that I see this behaviour. – lonix Aug 20 '22 at 09:06
  • @anubhava BTW, thank you for answering so many regex questions and treating each one on its own merit - the way SO was supposed to be. I've learned a lot from your posts. – lonix Aug 20 '22 at 09:07
  • @lonix The regex tag has indeed become very annoying lately, exactly as you mention. Those poopyheads ([one especially](https://stackoverflow.com/users/3832970/wiktor-stribi%c5%bcew)) who became our "regex sheriff" love to answer dupes themselves, e.g. [this question](https://stackoverflow.com/questions/29771901/why-is-this-regex-allowing-a-caret/29771926) is clearly a dupe of [that one](https://stackoverflow.com/questions/4923380/difference-between-regex-a-z-and-a-za-z). It's a cheap tactic to annoy other users who are simply here to get tasks solved and have some fun. – bobble bubble Aug 20 '22 at 17:16
  • @bobblebubble I didn't want to mention names but yes, him in particular. I was going to open a thread on meta to ask if it's "just me" and whether it's my questions that are bad. Glad to know it's not just me. The regex tag is truly aggravating and almost unusable. With so little rep there's nothing I can do about it, except hope my question is "good enough" for their (it's more than just him) standards. – lonix Aug 21 '22 at 01:49
  • Ah the deleters are here - trying to hide the discussions above. – lonix Aug 24 '22 at 01:55
  • @bobblebubble I am thinking of posting to meta.stackoverflow.com for guidance, before this question is ALSO deleted, more evidence lost. But I'm unsure what to post while avoiding drama. I see we [are not](https://meta.stackoverflow.com/questions/413166/attempts-to-reopen-a-basic-regex-question) the only ones [frustrated](https://meta.stackoverflow.com/questions/412611/a-plea-against-regex-dogmatism) by the regex tag, it's a well known problem, and it's due to a small set of people. What can we do? I come to SO for help, and when it's a regex problem all I get is frustration. – lonix Aug 24 '22 at 04:03
  • @lonix I have a feeling something is about to change for the better in the regex tag. I have had [a meta-discussion](https://meta.stackoverflow.com/questions/419923/whats-wrong-with-my-answer) going for several days regarding a recently posted answer. The overall positive reactions from the Stack Overflow community made me feel better. However, for now I'm keeping a bit of distance. Imho meta is the place to discuss such things, but nobody can predict the outcome. Certainly many users here in the regex tag share similar perceptions, and I know some have already left. – bobble bubble Aug 24 '22 at 04:52
  • @bobblebubble Thanks for that post, very interesting. [Another](https://meta.stackoverflow.com/questions/405460/what-should-we-do-when-one-person-tries-to-delete-every-duplicate) one that is very relevant. – lonix Aug 24 '22 at 07:10

2 Answers

You may use this regex to match:

^heading1\n((?:.+\n)*)

Or if heading2 must appear after the match then:

^heading1\n((?:.+\n)*)(?=\n+heading2\n)

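RegEx Details:

  • ^: Start of a line (requires RegexOptions.Multiline)
  • heading1\n: Match heading1 followed by \n
  • (: Start capture group #1
    • (?:.+\n)*: Non-capturing group that matches 1+ characters other than a line break, followed by \n. The group repeats 0 or more times, so it stops at the first empty line
  • ): End capture group #1
  • (?=\n+heading2\n): Positive lookahead to assert that we have 1+ line breaks followed by heading2 and line break ahead

Code:

using System;
using System.Text.RegularExpressions;

// input is assumed to hold the sample text from the question
string pattern = @"^heading1\n((?:.+\n)*)";
RegexOptions options = RegexOptions.Multiline;

foreach (Match m in Regex.Matches(input, pattern, options)) {
   // m.Value is the whole match; m.Groups[1].Value is the text after heading1
   Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}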
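
As a rough sketch of how the two patterns above behave on the question's sample text, the snippet below reads capture group 1 for each. It assumes the text uses "\n" line endings (with "\r\n" the `.+` would also consume the "\r" of a blank line). The class name `SectionDemo` and the helper `TryBody` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SectionDemo
{
    // Sample text from the question, assumed to use "\n" line endings.
    public const string Text =
        "heading1\nfoo=bar\nbaz=qux\n\nheading2\nspam=ham\n\nheading3\nthis=that\n";

    // Hypothetical helper: puts capture group 1 in body and reports whether the pattern matched.
    public static bool TryBody(string pattern, out string body)
    {
        Match m = Regex.Match(Text, pattern, RegexOptions.Multiline);
        body = m.Success ? m.Groups[1].Value : "";
        return m.Success;
    }

    public static void Main()
    {
        string body;

        // Without the lookahead: everything after heading1 up to the first empty line.
        TryBody(@"^heading1\n((?:.+\n)*)", out body);
        Console.WriteLine(body.Replace("\n", "\\n")); // foo=bar\nbaz=qux\n

        // With the lookahead: same capture, but only if heading2 follows the blank line(s).
        TryBody(@"^heading1\n((?:.+\n)*)(?=\n+heading2\n)", out body);
        Console.WriteLine(body.Replace("\n", "\\n")); // foo=bar\nbaz=qux\n

        // The heading2 section is followed by heading3, so the lookahead rejects it.
        bool matched = TryBody(@"^heading2\n((?:.+\n)*)(?=\n+heading2\n)", out body);
        Console.WriteLine(matched); // False
    }
}
```

Note that group 1 keeps the trailing line break of the last line; call `TrimEnd('\n')` on it if that is not wanted.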
anubhava
  • Thanks for the alternative solution with such detailed explanation. – lonix Aug 19 '22 at 09:21
  • You asked for `I want the text from heading1 to heading2`. I could simplify it by using `^heading1\n((?:.+\n)*)` but it doesn't meet your stated requirement as there could be `Heading3` on next line. – anubhava Aug 19 '22 at 09:23
  • Sorry for the miscommunication - as you see above I already stated I have a solution from the two headers (like you did)... I wanted to specifically use non-greedy match as a learning exercise. As you can see the non-greedy approach is `heading1\n(.*?)\n{2}` – lonix Aug 19 '22 at 09:25
  • ok you can see my updated answer. `^heading1\n((?:.+\n)*)` will still be far more efficient (since it doesn't do backtracking) as compared to `heading1\n(.*?)\n{2}` with single line or DOTALL mode enabled. – anubhava Aug 19 '22 at 09:29
  • I changed between the two answers twice, but it seems unfair as even though your answer is better, without doubt (and thanks!), his answer was specifically for the question. I feel bad, I want to give you both the solution. :) :( – lonix Aug 19 '22 at 09:31
You're applying non-greediness in the wrong place: \n{2} is a fixed quantifier that always matches exactly two newlines, so the ? after it has no effect.

Instead, you should make .* non-greedy by using .*? so that it avoids matching double newlines itself:

heading1\n(.*?)\n{2}
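
To make the difference concrete, here is a small sketch that runs the original pattern and the corrected one against the question's sample text with `RegexOptions.Singleline`. The class name `LazyDemo` and the helper `Capture` are invented for this illustration, and the sample text is assumed to use "\n" line endings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyDemo
{
    // Sample text from the question, assumed to use "\n" line endings.
    public const string Text =
        "heading1\nfoo=bar\nbaz=qux\n\nheading2\nspam=ham\n\nheading3\nthis=that\n";

    // Hypothetical helper: returns capture group 1 for the given pattern.
    public static string Capture(string pattern)
    {
        return Regex.Match(Text, pattern, RegexOptions.Singleline).Groups[1].Value;
    }

    public static void Main()
    {
        // Greedy .* runs to the end of the text, then backtracks to the LAST "\n\n".
        string greedy = Capture("heading1\n(.*)\n{2}?");
        Console.WriteLine(greedy.Replace("\n", "\\n"));
        // foo=bar\nbaz=qux\n\nheading2\nspam=ham

        // Lazy .*? grows one character at a time and stops at the FIRST "\n\n".
        string lazy = Capture("heading1\n(.*?)\n{2}");
        Console.WriteLine(lazy.Replace("\n", "\\n"));
        // foo=bar\nbaz=qux
    }
}
```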
blhsing