I have this regex in Python:

import re
regex = '(?P<name>[\w\-+\.&\s]+)[\s\-]+(?P<form>GmbH & Co\. KG|KG)'
print(re.match(regex, "Test GmbH & Co. KG").groupdict())

This returns:

{'name': 'Test GmbH & Co.', 'form': 'KG'}

But I'd like:

{'name': 'Test', 'form': 'GmbH & Co. KG'}

I am thinking about stopping the first capture group from consuming text that the second group could match. Another idea was to somehow instruct the regex engine to start at the end. I also fiddled around with greedy and lazy modifiers. But I am a regex noob and could really use a hint.

Jabb
  • That's not how regexes work though. You have a greedy match, so it takes the *longest valid match*. Use a non-greedy match if you want the regex to match the *minimum* number of characters instead. – Martijn Pieters Feb 02 '20 at 21:10
  • You might want to read the [Python regex HOWTO](https://docs.python.org/3/howto/regex.html), which has a [section on greedy vs. non-greedy](https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy). You also want to read the [*backslash plague* section](https://docs.python.org/3/howto/regex.html#the-backslash-plague), as your `regex` string value doesn't have as many backslashes as you think it has. – Martijn Pieters Feb 02 '20 at 21:13
  • `(?P<name>[\w\-+\.&\s]+)` => `(?P<name>[\w\-+\.&\s]+?)`, [demo](https://regex101.com/r/iEk8ie/1) – Wiktor Stribiżew Feb 02 '20 at 21:13
  • Try out [regex101 with the corrected regex](https://regex101.com/r/hUBWd5/1) to see how the non-greedy `?` modifier fixes your pattern. – Martijn Pieters Feb 02 '20 at 21:15
  • Wiktor & Martijn thanks a lot! – Jabb Feb 02 '20 at 21:24
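
For anyone landing here later, the change suggested in the comments can be sketched roughly like this. It is only a minimal illustration: the variable names `greedy`, `lazy`, `greedy_result` and `lazy_result` are made up for the example, and the patterns are written as raw strings (`r'...'`) so the backslashes reach the regex engine unchanged. The only difference between the two patterns is the `+?` in the `name` group.

```python
import re

# Original pattern: the name group is greedy, so it grabs as much as it can
# and only backtracks far enough for the shortest alternative ("KG") to fit.
greedy = r'(?P<name>[\w\-+\.&\s]+)[\s\-]+(?P<form>GmbH & Co\. KG|KG)'

# Same pattern with a lazy quantifier (+?): the name group now stops at the
# first position where the rest of the pattern can match.
lazy = r'(?P<name>[\w\-+\.&\s]+?)[\s\-]+(?P<form>GmbH & Co\. KG|KG)'

text = "Test GmbH & Co. KG"

greedy_result = re.match(greedy, text).groupdict()
lazy_result = re.match(lazy, text).groupdict()

print(greedy_result)  # {'name': 'Test GmbH & Co.', 'form': 'KG'}
print(lazy_result)    # {'name': 'Test', 'form': 'GmbH & Co. KG'}
```

Note that the separator `[\s\-]+` consumes the space between the name and the legal form, so `form` comes back without a leading space.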
