I have the following text block:

## 8.6.0
- **Upload date:** November 19, 2020
- **Release date:** TBC
- **Internal version:** 1171

### Feature
- dsfdsfds
- sdfdsf
- dsfdsf

### Bug fixes
- sdfsaf
- sdfsad
- sdfsdfdsf

### Internal
- sadfsdfsda
- fsdfgsadfasd
- sdfsda

## 8.5.1
- **Upload date:** November 09, 2020
- **Release date:** November 12, 2020
- **Internal version:** 1170

I would like to extract just the first entry, i.e. all the text from the first character of ## 8.6.0 up to just before the first character of ## 8.5.1.

I have tried the following expression:

([#].*[0-9])(.*?)([#].*[0-9])

But it doesn't return the right result. How would I write this expression?

Kex

2 Answers


Use

^##(?!#).*(?:\n(?!##(?!#)).*)*

See the regex demo.

Details

  • ^ - start of a string (if you use it in an environment with the multiline flag enabled by default, try prepending it with (?-m) or use ^(?<![\s\S]))
  • ##(?!#) - a ## substring not followed with another #
  • .* - the rest of the line
  • (?:\n(?!##(?!#)).*)* - zero or more subsequent lines that do not start with exactly two # characters (lines starting with ### are still matched).
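
For illustration, here is a minimal Python sketch of the pattern above. The question does not name a language, so Python is an assumption, and the names `CHANGELOG`, `FIRST_ENTRY`, and `first_entry` are made up for this example. The sample text is a shortened copy of the block from the question.

```python
import re

# Shortened copy of the changelog from the question (illustrative data).
CHANGELOG = """## 8.6.0
- **Upload date:** November 19, 2020
- **Release date:** TBC
- **Internal version:** 1171

### Feature
- dsfdsfds

### Bug fixes
- sdfsaf

## 8.5.1
- **Upload date:** November 09, 2020
"""

# No flags are passed: in Python, ^ then matches only at the very start
# of the string and . does not match a newline, which is what the
# pattern relies on.
FIRST_ENTRY = re.compile(r"^##(?!#).*(?:\n(?!##(?!#)).*)*")

match = FIRST_ENTRY.search(CHANGELOG)
first_entry = match.group(0) if match else None
print(first_entry)
```

In this sketch the match keeps the `###` subsections and stops before the newline that immediately precedes `## 8.5.1`.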
Wiktor Stribiżew
  • This has the advantage that the final newline characters are not included, assuming that is actually desirable. – Booboo Nov 25 '20 at 14:01

If I've understood the problem correctly, then

^## (\d+\.?){3}.+?(?=## \d)

should work. Here's a demo.

The pattern does the following:

  • Looks for two hashes followed by a version number, right at the start of the string - ^## (\d+\.?){3}.
  • Looks ahead for another two hashes followed by a number (start of the next set of notes) - (?=## \d)
  • Grabs all the characters in between the two, aiming for as few characters as possible - .+?

To make this work, you need the dotall flag enabled, so . can match newline characters.
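
As a rough sketch of how that could look in Python (an assumption, since the question does not say which language is in use), the dotall flag is `re.DOTALL`. The names `NOTES`, `entry`, and `no_flag` are invented for this example.

```python
import re

# Small stand-in for the release notes in the question (illustrative data).
NOTES = (
    "## 8.6.0\n"
    "- **Internal version:** 1171\n"
    "\n"
    "### Feature\n"
    "- dsfdsfds\n"
    "\n"
    "## 8.5.1\n"
    "- **Internal version:** 1170\n"
)

PATTERN = r"^## (\d+\.?){3}.+?(?=## \d)"

# With re.DOTALL, . also matches newlines, so .+? can span several lines.
with_flag = re.search(PATTERN, NOTES, re.DOTALL)
entry = with_flag.group(0) if with_flag else None

# Without the flag, .+? cannot cross the first newline, so nothing matches.
no_flag = re.search(PATTERN, NOTES)

print(repr(entry))
print(no_flag)
```

Here the match runs right up to the next `## ` heading, so the blank line before `## 8.5.1` is part of the result.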

Peritract
  • Note your regex [will match](https://regexr.com/5h0oc) across more sections if there are any. [My solution](https://regex101.com/r/IleRL8/3) will still correctly match the first section. – Wiktor Stribiżew Nov 25 '20 at 11:24
  • Good catch, thank you - I've updated mine to be less greedy. – Peritract Nov 25 '20 at 11:30
  • There are still two things: 1) `## \d` matches anywhere on a line, not only at the start, which again might be unwelcome, 2) [your regex](https://regex101.com/r/2xaPVg/1) takes far more steps (499) than [mine](https://regex101.com/r/IleRL8/2) (123), thanks to the [unroll-the-loop principle](https://stackoverflow.com/a/38018490/3832970) used in my solution. – Wiktor Stribiżew Nov 25 '20 at 11:36
  • Thank you again - you're absolutely right. I can fix the line-start thing easily, but that makes mine even less efficient. I'll read through the unrolling principle link. Thanks once more. – Peritract Nov 25 '20 at 11:42
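
The first point raised in the comments (that `## \d` can match in the middle of a line) can be reproduced with a short Python sketch. The sample text and the names `loose` and `anchored` are invented for this example. The `\n` added to the lookahead is only one possible way to anchor it and is an assumption, not the fix the answer's author actually applied.

```python
import re

# Illustrative notes where "## 42" appears in the middle of a bullet line.
TEXT = (
    "## 8.6.0\n"
    "- fixes issue ## 42 in the parser\n"
    "- another note\n"
    "\n"
    "## 8.5.1\n"
    "- older\n"
)

# Lookahead as posted in the answer: "## <digit>" anywhere ends the match.
loose = re.compile(r"^## (\d+\.?){3}.+?(?=## \d)", re.DOTALL)

# Hypothetical variant: require a newline directly before the next heading.
anchored = re.compile(r"^## (\d+\.?){3}.+?(?=\n## \d)", re.DOTALL)

loose_entry = loose.search(TEXT).group(0)
anchored_entry = anchored.search(TEXT).group(0)

print(repr(loose_entry))
print(repr(anchored_entry))
```

With this input the first pattern stops inside the bullet, just before `## 42`, while the anchored variant continues to the line break before `## 8.5.1`.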