Regex matching between lines

Question

I have the following text block:

## 8.6.0
- **Upload date:** November 19, 2020
- **Release date:** TBC
- **Internal version:** 1171

### Feature
- dsfdsfds
- sdfdsf
- dsfdsf

### Bug fixes
- sdfsaf
- sdfsad
- sdfsdfdsf

### Internal
- sadfsdfsda
- fsdfgsadfasd
- sdfsda

## 8.5.1
- **Upload date:** November 09, 2020
- **Release date:** November 12, 2020
- **Internal version:** 1170

I would like to extract just the first entry, i.e. all the text from the first character of ## 8.6.0 up to just before the first character of ## 8.5.1.

I have tried the following expression:

([#].*[0-9])(.*?)([#].*[0-9])

But it doesn't return the right result. How would I write this expression?

You didn't really explain all the conditions but try something like `\A\#\#\s*\d[\s\S]+?(?=\#\#\s*\d|\Z)`. Here's a [demo](https://regex101.com/r/NNGLNL/1). — 41686d6564 stands w. Palestine, Nov 25 '20 at 10:12

Wiktor Stribiżew · Accepted Answer · 2020-11-25T10:39:51.963

Use

^##(?!#).*(?:\n(?!##(?!#)).*)*

See the regex demo.

Details

^ - start of a string (if you use it in an environment with the multiline flag enabled by default, try prepending it with (?-m) or use ^(?<![\s\S]))
##(?!#) - a ## substring not followed by another #
.* - the rest of the line
(?:\n(?!##(?!#)).*)* - zero or more further lines, none of which starts with a ## that is not followed by another #.
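
The breakdown above can be tried out with a minimal Python sketch. The question does not name a language, so the use of Python's `re` module is an assumption, as are the shortened sample text and the names `changelog`, `FIRST_SECTION`, and `first_section`.

```python
import re

# Hypothetical changelog text, shortened from the sample in the question.
changelog = (
    "## 8.6.0\n"
    "- **Upload date:** November 19, 2020\n"
    "\n"
    "### Feature\n"
    "- dsfdsfds\n"
    "\n"
    "### Internal\n"
    "- sdfsda\n"
    "\n"
    "## 8.5.1\n"
    "- **Upload date:** November 09, 2020\n"
)

# No re.MULTILINE: ^ matches only at the very start of the string.
# No re.DOTALL: . stops at line breaks, so the text is consumed line by line.
FIRST_SECTION = re.compile(r"^##(?!#).*(?:\n(?!##(?!#)).*)*")

match = FIRST_SECTION.search(changelog)
first_section = match.group(0) if match else ""
print(first_section)
```

In Python, `^` anchors to the start of the string unless `re.MULTILINE` is passed, so the `(?-m)` workaround mentioned above should not be needed here. The `###` sub-headings stay inside the match because the lookahead only rejects lines that start with exactly two hashes.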

edited Nov 25 '20 at 10:39

answered Nov 25 '20 at 10:33


This has the advantage that the final newline characters are not included, assuming that is actually desirable. – Booboo Nov 25 '20 at 14:01

Peritract · Answer 2 · 2020-11-25T11:30:07.023

If I've understood the problem correctly, then

^## (\d+\.?){3}.+?(?=## \d)

should work. Here's a demo.

The pattern does the following:

Looks for two hashes followed by a version number, right at the start of the string - ^## (\d+\.?){3}.
Looks ahead for another two hashes followed by a number (start of the next set of notes) - (?=## \d)
Grabs all the characters in between the two, aiming for as few characters as possible - .+?

To make this work, you need the dotall flag enabled, so . can match newline characters.
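
As a rough illustration of the dotall requirement, here is a small Python sketch. Python is an assumption (the question does not name a language), and the sample text and variable names are made up for the example.

```python
import re

# Hypothetical changelog text, shortened from the sample in the question.
changelog = (
    "## 8.6.0\n"
    "- **Release date:** TBC\n"
    "\n"
    "### Bug fixes\n"
    "- sdfsaf\n"
    "\n"
    "## 8.5.1\n"
    "- **Release date:** November 12, 2020\n"
)

# re.DOTALL lets . match newlines, so .+? can span several lines.
pattern = re.compile(r"^## (\d+\.?){3}.+?(?=## \d)", re.DOTALL)

match = pattern.search(changelog)
first_section = match.group(0) if match else None
print(repr(first_section))

# With a single entry there is no later "## <digit>" for the lookahead to find.
only_entry = "## 8.6.0\n- only entry\n"
no_match = pattern.search(only_entry)
print(no_match)
```

Two things are worth noting under these assumptions: the match keeps the blank line before the next heading, and the pattern finds nothing when the text holds only one entry, because the lookahead needs a following `## ` plus digit.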

edited Nov 25 '20 at 11:30

answered Nov 25 '20 at 11:03


Note your regex [will match](https://regexr.com/5h0oc) across more sections if there are any. [My solution](https://regex101.com/r/IleRL8/3) will still correctly match the first section. – Wiktor Stribiżew Nov 25 '20 at 11:24
Good catch, thank you - I've updated mine to be less greedy. – Peritract Nov 25 '20 at 11:30
There are still two things: 1) `## \d` matches anywhere on a line, not only at the start, which again might be unwelcome, 2) [your regex](https://regex101.com/r/2xaPVg/1) (499) takes about four times as many steps as [mine](https://regex101.com/r/IleRL8/2) (123), because my solution uses the [unroll-the-loop principle](https://stackoverflow.com/a/38018490/3832970). – Wiktor Stribiżew Nov 25 '20 at 11:36
Thank you again - you're absolutely right. I can fix the line-start thing easily, but that makes mine even less efficient. I'll read through the unrolling principle link. Thanks once more. – Peritract Nov 25 '20 at 11:42
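
The line-start point from the comments can be reproduced with a short Python sketch. Everything here is an assumption made for illustration: the choice of Python, the sample text, and the `anchored` variant, which is one possible fix and not necessarily the one Peritract had in mind.

```python
import re

# Hypothetical entry whose bullet mentions "## 8" in the middle of a line.
changelog = (
    "## 8.6.0\n"
    "- fixes issue ## 8 in parser\n"
    "- other\n"
    "\n"
    "## 8.5.1\n"
    "- old\n"
)

# Pattern from the answer: the lookahead may fire in the middle of a line.
unanchored = re.compile(r"^## (\d+\.?){3}.+?(?=## \d)", re.DOTALL)

# Assumed variant: \A pins the start of the string, while ^ under
# re.MULTILINE restricts the lookahead to line starts; \Z covers the
# case where no later entry exists.
anchored = re.compile(
    r"\A## (\d+\.?){3}.+?(?=^## \d|\Z)", re.DOTALL | re.MULTILINE
)

cut_short = unanchored.search(changelog).group(0)
whole_entry = anchored.search(changelog).group(0)
print(repr(cut_short))
print(repr(whole_entry))
```

The unanchored pattern stops at the `## 8` inside the bullet, while the anchored variant runs to the next heading. As the comments point out, the lazy `.+?` still tests the lookahead at every position, so this does nothing for the step count.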
