Is there a regexp I can use to get, from the input:

Nikolai, Gogol, Johann Wolfgang von Goethe und Fritze Hanschmann

the expected output:

Nikolai, Gogol
Johann Wolfgang von Goethe
Fritze Hanschmann

The goal is to eliminate the word "und" and to get an array of the names.

Edit: The first two words, separated by the comma, always belong together.

Edit2: This is my current approach:

/((\w+),(\s)(\w)+\,\s)(.*)/g

How can I detect and split out the remaining groups and also drop the word "und"?

Stefan
  • Your expected output is a bit unclear. Is `Nikolai, Gogol` supposed to be one element of the array? If so, how can its comma be distinguished from the commas that separate names? Also, have you attempted to write a regular expression of your own? – El_Vanja Jan 05 '21 at 13:38
  • Edit: The first two words with the comma always belong together. I have attempted to write a regexp on my own, but I'm very bad at such things. – Stefan Jan 05 '21 at 13:45
  • So `Nikolai, Gogol, Werner, Werther und Hans Fritz` should return `Nikolai, Gogol`, `Werner, Werther` and `Hans Fritz`? – El_Vanja Jan 05 '21 at 13:54
  • Yes, that is correct! – Stefan Jan 05 '21 at 14:00
  • It looks like you are looking to create a regex, but do not know where to get started. Please check [Reference - What does this regex mean](https://stackoverflow.com/questions/22937618) resource, it has plenty of hints. Also, refer to [Learning Regular Expressions](https://stackoverflow.com/questions/4736) post for some basic regex info. Once you get some expression ready and still have issues with the solution, please edit the question with the latest details and we'll be glad to help you fix the problem. – Wiktor Stribiżew Jan 05 '21 at 14:05
  • I've added my current approach. – Stefan Jan 05 '21 at 14:41
  • If input is always formatted like your sample, eg try: [`\w+,\s*\w+|\b(?!und)\w.*?(?=\sund|$)`](https://regex101.com/r/up9xpY/2) – bobble bubble Jan 05 '21 at 19:49
  • @Stefan Is the format always like this? Or can there be multiple occurrences of "und", and can there also be multiple names with a comma in between that are not at the start? You could, for example, split on `\w+,\h*\w+\K,\h*|\h*\bund\b\h*`; see https://regex101.com/r/cSxM5j/1 – The fourth bird Jan 14 '21 at 21:57

1 Answer

If you are guaranteed to have at least two items separated by "und", this works:

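(\w+, \w+)(?:, ([^,]+))* und (.*)

The first capture group matches the two comma-separated words that belong together. After that, there can be zero or more further strings, each introduced by a comma. Finally, "und" is followed by one more string. Note that in most regex engines a repeated capture group keeps only its last match, so if several names sit between the first pair and "und", only the last of them ends up in the second group.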
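To collect every name into an array no matter how many there are, a find-all approach along the lines of the pattern suggested in the comments may be easier than fixed capture groups. The following is only a minimal sketch in Python (an assumption, since the question does not name a language); `NAME_PATTERN` and `split_names` are hypothetical names, and it assumes the input is formatted like the samples in the question and comments.

```python
import re

# First alternative: a "word, word" pair that belongs together.
# Second alternative: any other name, which must not start with the word "und"
# and runs lazily up to the next " und", the next comma, or the end of the input.
NAME_PATTERN = re.compile(r"\w+,\s*\w+|\b(?!und\b)\w.*?(?=\s+und\b|,|$)")


def split_names(text: str) -> list:
    """Return the names found in text, skipping the separators and "und"."""
    # The pattern has no capture groups, so findall returns the full matches.
    return NAME_PATTERN.findall(text)


if __name__ == "__main__":
    for name in split_names(
        "Nikolai, Gogol, Johann Wolfgang von Goethe und Fritze Hanschmann"
    ):
        print(name)
```

Because the pair alternative comes first, any "word, word" sequence is kept together, which is what the comments ask for; a list of single-word names separated by commas would therefore be paired up as well, so this only fits the format described in the question.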

Byron