I am implementing a method that takes a regex pattern such as r'(\w+/)+end' and a string such as 'ab/cd/ef/end'. Note that I cannot ask the caller of the method to change their pattern format. Within the method, I need to perform an operation that requires extracting every match of the first capturing group, i.e. ab/, cd/, and ef/.

How do I accomplish this in Python? The call below returns a tuple holding only the last match of each capturing group. There is just one group in this example, so it returns ('ef/',).

re.match(r'(\w+/)+end', 'ab/cd/ef/end').groups()

By the way, in C# every capturing group can record multiple captures, e.g. Regex.Match("ab/cd/ef/end", @"(\w+/)+end").Groups[1].Captures returns all three captures for the first capturing group (\w+/)+.
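
To make the Python behavior described above concrete, here is a minimal sketch using only the standard re module. It shows that a repeated group keeps just its final capture, and that the span reported for group 1 also covers only that last repetition.

```python
import re

m = re.match(r'(\w+/)+end', 'ab/cd/ef/end')

print(m.group(0))  # the whole match: ab/cd/ef/end
print(m.groups())  # ('ef/',) -- only the last repetition of group 1 survives
print(m.span(1))   # (6, 9) -- start/end offsets of that last repetition
```

The earlier repetitions ab/ and cd/ are consumed by the match but are not recorded anywhere on the match object.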

Jatin Sanghvi
  • I cannot accept these answers. As I already mentioned, changing the regex is not possible, and extracting the pattern text inside a capturing group's parentheses could be error-prone. I will go ahead with the regex project (https://pypi.org/project/regex/), which solves my problem nicely: `regex.match(r'(\w+/)+end', 'ab/cd/ef/end').captures(1)` returns `['ab/', 'cd/', 'ef/']`. – Jatin Sanghvi Aug 23 '19 at 02:29
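
The approach from the comment above can be sketched as a small helper. This assumes the third-party regex package is installed (pip install regex); the helper name all_captures is hypothetical and only for illustration. The import is guarded so the snippet still loads when the package is missing.

```python
try:
    import regex  # third-party package, not part of the standard library
except ImportError:
    regex = None


def all_captures(pattern, text, group=1):
    """Hypothetical helper: return every capture of `group`, or None if no match."""
    if regex is None:
        raise RuntimeError("the third-party 'regex' package is required")
    m = regex.match(pattern, text)
    # Unlike re, regex match objects keep every repetition of a group
    return m.captures(group) if m else None


if regex is not None:
    print(all_captures(r'(\w+/)+end', 'ab/cd/ef/end'))  # ['ab/', 'cd/', 'ef/']
```

The caller's pattern is passed through unchanged, which is the constraint stated in the question.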

2 Answers

If you just want to capture all path names which are followed by a separator, then use the pattern \w+/ with re.findall:

import re

inp = "ab/cd/ef/end"
matches = re.findall(r'\w+/', inp)
print(matches)

['ab/', 'cd/', 'ef/']
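
If the full pattern should still be honored, one possible sketch is to check it first and then run re.findall over the matched text only. This assumes the group's inner pattern (here \w+/) is known separately, which the question says may not hold in general. The function name repeated_group_matches and the parameter inner_pattern are hypothetical.

```python
import re


def repeated_group_matches(full_pattern, inner_pattern, text):
    """Hypothetical helper: validate with full_pattern, then collect inner matches."""
    m = re.match(full_pattern, text)
    if m is None:
        return []  # the overall pattern did not match, so report nothing
    # Search only inside the text that the full pattern actually consumed
    return re.findall(inner_pattern, m.group(0))


print(repeated_group_matches(r'(\w+/)+end', r'\w+/', 'ab/cd/ef/end'))
# ['ab/', 'cd/', 'ef/']
```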

If instead you want all path components, whether or not they are followed by a path separator, then you can try:

import re
inp = "ab/cd/ef/end"
matches = re.findall(r'[^/]+', inp)
print(matches)  # ['ab', 'cd', 'ef', 'end']

Tim Biegeleisen
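
import re

# The lookbehind (?<!end) is redundant: every match ends in "/", so it always succeeds.
r = r"(\w+/)(?<!end)"
s = "ab/cd/ef/end"

# finditer yields one match object per non-overlapping match
m = re.finditer(r, s)

for g in m:
    print(g.group())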

Example:

https://regex101.com/r/VJ6knI/1

l'L'l