
I have strings like "sometext1 §§ 12 Abs. 5, 13a, 14 Satz 1 Nr. 3, 9, 8 sometext2". I want to find the §§ substring and everything that directly follows it, as long as it consists of consecutive occurrences of Abs., und, Satz and Nr. as well as numbers, optionally followed by a single letter (like 13a).

Examples:

"Die Anzahl der §§ 12 Abs. 5, 13a, 14 Satz 1 und 8 kann variieren. Für die §§ 15a, 18 Abs. 5, 21 und 23 Satz 3 trifft dies nicht zu."

Here I want to get "12 Abs. 5, 13a, 14 Satz 1 und 8" and "15a, 18 Abs. 5, 21 und 23 Satz 3".

I used the following regex: r'§§ (.*)? ^(?!Satz|Abs.|Nr.|\d+[a-z]| |,)'

Wiktor Stribiżew
Mazze
  • `sometext2` can vary and is not always `sometext2`. It might be `dfsdf` – Mazze Feb 18 '22 at 08:00
  • You are right. It finds too many matches – Mazze Feb 18 '22 at 08:14
    While you are still thinking about the requirements, try `§§\s*((?:Satz|Abs\.|Nr\.|\d+[a-z]?|und|[\s,])+)(?<=\w)`, see [this regex demo](https://regex101.com/r/3DxveT/1). Note I included `und` here and made the letter after the number optional. If this logic of "whitelisting" words in the match works for you, this can be a solution. – Wiktor Stribiżew Feb 18 '22 at 08:17
    Thanks a lot, that seems to solve my problem and meet all the requirements – Mazze Feb 18 '22 at 08:21

1 Answer


You can use

§§\s*((?:Satz|Abs\.|Nr\.|\d+[a-z]?|und|[\s,])+)(?<=\w)

See the regex demo. Details:

  • §§ - a literal text
  • \s* - zero or more whitespaces
  • ((?:Satz|Abs\.|Nr\.|\d+[a-z]?|und|[\s,])+) - Group 1 capturing one or more occurrences of Satz, Abs., Nr., one or more digits optionally followed by a lowercase ASCII letter, und, whitespace or comma.
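  • (?<=\w) - the character immediately to the left must be a word character, so the match cannot end with a trailing space or comma (the greedy group gives those back by backtracking).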
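A minimal Python sketch of applying this pattern, assuming Python is the target language (the question writes its pattern as a raw string literal, r'...'). `SECTION_RE` and `extract_sections` are hypothetical names used only for illustration. Because the pattern has exactly one capturing group, `re.findall` returns just the Group 1 text of each match, i.e. the part after `§§`.

```python
import re

# Pattern from the answer. Group 1 "whitelists" the tokens allowed after "§§":
# Satz, Abs., Nr., a number with an optional lowercase suffix (13a), und,
# whitespace and commas. The lookbehind (?<=\w) forces the match to end on a
# word character, so a trailing space or comma is dropped by backtracking.
SECTION_RE = re.compile(r"§§\s*((?:Satz|Abs\.|Nr\.|\d+[a-z]?|und|[\s,])+)(?<=\w)")


def extract_sections(text: str) -> list[str]:
    """Return the text following each "§§" run (hypothetical helper)."""
    # With a single capturing group, findall returns that group's text per match.
    return SECTION_RE.findall(text)


if __name__ == "__main__":
    sample = (
        "Die Anzahl der §§ 12 Abs. 5, 13a, 14 Satz 1 und 8 kann variieren. "
        "Für die §§ 15a, 18 Abs. 5, 21 und 23 Satz 3 trifft dies nicht zu."
    )
    for section in extract_sections(sample):
        print(section)
```

Run as a script, this should print `12 Abs. 5, 13a, 14 Satz 1 und 8` and `15a, 18 Abs. 5, 21 und 23 Satz 3` on separate lines. Since the pattern is case-sensitive, a following word such as `sometext2` or `kann` stops the match, while `Satz` (capital S) is still accepted.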
Wiktor Stribiżew