
I have a text file consisting of space-separated text values:

a: b c d e f g
h: i j k
l:
m: n

I do not know a priori how many of these values (to the right of the colon) I'll have.

I want to use Python groups within a regular expression to be able to refer to each capture.

GnuATgtRE = re.compile(br'^\r\n(?P<target>.+): (?P<deps>.*)\r\n# Implicit rule search has', re.MULTILINE)

Currently, <target> references the item to the left of the colon and <deps> references everything to the right, as one string.

I do not know a priori how many deps each target will have.

The syntax (?P<text>) is used to create a group which can be used to reference a specific captured sub-regex.

For example, for line 1

match_obj.group('target') = a
match_obj.group('deps') = b c d e f g

Line 2:

match_obj.group('target') = h
match_obj.group('deps') = i j k
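To make the current behaviour concrete, here is a minimal sketch using a simplified pattern. `LINE_RE` and `text` are hypothetical stand-ins for illustration only; the real pattern above also anchors on `make`'s "# Implicit rule search has" line and works on bytes.

```python
import re

# Simplified, hypothetical pattern: one "target: deps" pair per line.
LINE_RE = re.compile(r'^(?P<target>[^:\n]+):[ \t]*(?P<deps>.*)$', re.MULTILINE)

# The sample input from the top of the question.
text = "a: b c d e f g\nh: i j k\nl:\nm: n\n"

# Each match exposes exactly two named groups, however many deps there are.
pairs = [(m.group('target'), m.group('deps')) for m in LINE_RE.finditer(text)]
print(pairs)
```

Every dep list comes back as a single string, which is the limitation described here.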

Question

After I execute match = GnuATgtRE.search(string), I want to be able to reference each space-separated dep via match.group('some_text').

The problem is that I don't know if there is a way to create an arbitrary number of unnamed groups.

For line 1, I'd like to be able to say match.group('<5>') and have that return d.

For line 2, match.group('<5>') should return an empty string, since that line has only four values.

Bob
  • Is it really an arbitrary number, or is it up to some limit (e.g., 10 or 100)? Also, you may be better off with [`pyparsing`](http://pyparsing.wikispaces.com/) or another more powerful parser here. – cxw Jul 22 '16 at 18:13
  • Why can't you just split group 2 on space? – Rohit Jain Jul 22 '16 at 18:14
  • @cxw it's pretty arbitrary. This is from the output of `make`. To be clear, I know I could just do `matchObj.group('deps').split()` . . . hmmm. I think I just solved my problem. – Bob Jul 22 '16 at 18:14
  • I would just go with `target, deplist = line.strip().split(':')` followed by `deps = deplist.strip().split()`... No reason to unnecessarily complicate it with regexps... – twalberg Jul 22 '16 at 18:46
  • ...Why the heck are you explicitly using a binary string for text data? – jpmc26 Jul 22 '16 at 19:44
  • I'm using `mmap`, which can only do regex searches with byte array strings. – Bob Jul 23 '16 at 00:58

1 Answer


See this answer.

Most or all regular expression engines in common use, including in particular those based on the PCRE syntax (like Python's), label their capturing groups according to the numerical index of the opening parenthesis, as the regex is written. So no, you cannot use capturing groups alone to extract an arbitrary, variable number of subsequences from a string.
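A small sketch of what that means in practice with Python's `re`: placing a named group inside a repetition does not create more groups, and the group only keeps the last thing it matched. The pattern and the group name `dep` are made up for this illustration.

```python
import re

# One named group inside a repeated non-capturing group.
m = re.match(r'(?P<target>\w+):(?: (?P<dep>\w+))*$', 'a: b c d e f g')

print(m.group('target'))  # the target to the left of the colon
print(m.group('dep'))     # only the LAST repetition is retained
print(m.re.groups)        # number of capturing groups is fixed by the pattern
```

Here `dep` ends up holding `g`, and the pattern still has exactly two capturing groups regardless of how many deps the line contains.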

A better solution is to just call line.split() on everything after the x: on a line.
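As a rough sketch of that approach, assuming bytes input with `\r\n` line endings as in the question (the pattern is simplified, and `LINE_RE`, `parse_targets` and `nth_dep` are hypothetical names, not part of any library):

```python
import re

# Simplified bytes pattern: target before the colon, the rest of the line as deps.
LINE_RE = re.compile(br'^(?P<target>[^:\r\n]+):[ \t]*(?P<deps>[^\r\n]*)', re.MULTILINE)

def parse_targets(data):
    """Map each target to the list of its whitespace-separated deps."""
    return {m.group('target'): m.group('deps').split()
            for m in LINE_RE.finditer(data)}

def nth_dep(deps, n, default=b''):
    """Return the n-th dep (1-based), or `default` if there are fewer than n."""
    return deps[n - 1] if 1 <= n <= len(deps) else default

sample = b"a: b c d e f g\r\nh: i j k\r\nl:\r\nm: n\r\n"
targets = parse_targets(sample)

print(targets[b'a'])              # all deps of target a, as a list
print(nth_dep(targets[b'a'], 3))  # third dep of a
print(nth_dep(targets[b'h'], 5))  # h has only three deps, so the default
```

Since the pattern is a bytes pattern, `finditer` should also accept an `mmap` object in place of `sample`. Indexing the resulting list stands in for the numbered groups asked about, with `nth_dep` supplying the empty value when a line has too few deps.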

Trevor Merrifield