I'm developing a sort of "parser for a custom script" in Python using regexps. Please don't answer about whether regexps are a good solution for this kind of task or not... Explaining why I chose regexps would be long (and off-topic), even though I know the problems of using regexps for parsing.
Now for the question. Consider this scenario:
This is a line I read from a file and need to parse with my regexp:
something = { call _ "string to ""capture"" " } #non consumed
Now I can do something like this:
import re
regex1 = re.compile(r'^([^"]*?)(_?)\s*"((?:""|[^"])*)"')
mystr = r'something = { call _ "string to ""capture"" " } #non consumed'
mymatch = regex1.search(mystr)
This gives me the following capture groups:
- 0: the whole match, from the start of mystr up to and including the closing quote
- 1: everything before the quoted string (I need this group to verify something later)
- 2: '_' or '' (depending on whether there is an underscore right before the quoted string; there may be spaces between the underscore and the opening quote)
- 3: the contents of the quoted string (where "" is treated as an escaped quote character, not as the closing quote)
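To make the group list concrete, here is a minimal sketch that runs the pattern on the sample line and prints each group. Note that group 1 keeps the space before the underscore, while the spaces between the underscore and the opening quote are consumed by \s* and not captured; the doubled quotes stay doubled in group 3.

```python
import re

# Pattern and input copied from the question.
regex1 = re.compile(r'^([^"]*?)(_?)\s*"((?:""|[^"])*)"')
mystr = r'something = { call _ "string to ""capture"" " } #non consumed'

# A compiled pattern has its own search() method; re.search(regex1, mystr)
# gives the same result.
mymatch = regex1.search(mystr)

print(repr(mymatch.group(0)))  # 'something = { call _ "string to ""capture"" "'
print(repr(mymatch.group(1)))  # 'something = { call '
print(repr(mymatch.group(2)))  # '_'
print(repr(mymatch.group(3)))  # 'string to ""capture"" '
```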
I need those groups, so using re.search is fine (because I can use mymatch.group(n) to check the value of each captured group).
But after I have used groups 1 to 3, I need to shorten mystr so that it contains only the part not consumed by the successful match.
I could do this with:
mystr = mystr[len(mymatch.group(0)):]
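Since the pattern is anchored with ^, the match always starts at index 0, so the same slice can also be written with the match object's end() method, which returns the index just past the whole match. A minimal sketch on the sample line:

```python
import re

regex1 = re.compile(r'^([^"]*?)(_?)\s*"((?:""|[^"])*)"')
mystr = r'something = { call _ "string to ""capture"" " } #non consumed'
mymatch = regex1.search(mystr)

# end() is the index just past the last character of the whole match
# (group 0). With the '^' anchor the match starts at index 0, so this
# equals len(mymatch.group(0)).
rest = mystr[mymatch.end():]
print(repr(rest))  # ' } #non consumed'
```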
So working code could look like this:
import re
regex1 = re.compile(r'^([^"]*?)(_?)\s*"((?:""|[^"])*)"')
mystr = r'something = { call _ "string to ""capture"" " } #non consumed'
mymatch = regex1.search(mystr)
# code here that uses mymatch.group(n)
mystr = mystr[len(mymatch.group(0)):]  # remove from mystr the part consumed by the regexp
But I'd like to know whether there are other ways to do this. Can you suggest approaches different from the one I provided?
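One possible approach, sketched under the assumption that the rest of the line may be parsed with the same pattern repeatedly: instead of slicing mystr, keep an index and pass it to the compiled pattern's match(string, pos). The leading ^ has to be dropped for this, because ^ matches only at the real start of the string, not at pos, while match() itself already anchors the attempt at pos. The input line below is hypothetical, chosen to contain two quoted strings.

```python
import re

# Same pattern as in the question but WITHOUT the leading '^':
# Pattern.match(string, pos) already anchors the match at pos,
# whereas '^' would only match at the real start of the string.
regex1 = re.compile(r'([^"]*?)(_?)\s*"((?:""|[^"])*)"')

# Hypothetical input with two quoted strings, for illustration only.
mystr = r'a = _ "x ""y""" + "z" #rest'

pos = 0
found = []
while True:
    mymatch = regex1.match(mystr, pos)
    if mymatch is None:
        break
    # Use the groups here, as in the question.
    found.append((mymatch.group(1), mymatch.group(2), mymatch.group(3)))
    pos = mymatch.end()  # advance the index instead of slicing mystr

rest = mystr[pos:]  # build the unconsumed part only if it is really needed
print(found)  # [('a = ', '_', 'x ""y""'), (' +', '', 'z')]
print(repr(rest))  # ' #rest'
```

This avoids creating a new string after every match; mystr stays unchanged and pos records how far parsing has got.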
Related questions I already looked at:
Not useful: it asks only about replacing, not about individual match groups. Here I am asking how to do both together cleanly.
Not useful: for (almost) the same reason as the first link.
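Replacement and group access can still be combined in one call: Pattern.sub() and Pattern.subn() accept a function as the replacement, and that function receives each match object. In this minimal sketch the names consume and captured are hypothetical; the function records the groups and returns '', so the matched text is removed from mystr at the same time.

```python
import re

regex1 = re.compile(r'^([^"]*?)(_?)\s*"((?:""|[^"])*)"')
mystr = r'something = { call _ "string to ""capture"" " } #non consumed'

captured = []  # hypothetical holder for the groups of each match

def consume(m):
    # Called once per match: keep groups 1..3, then replace the match with ''.
    captured.append(m.groups())
    return ''

# count=1 allows at most one replacement; subn() also returns how many were made,
# so n == 0 means the pattern did not match and mystr is unchanged.
mystr, n = regex1.subn(consume, mystr, count=1)

print(n)  # 1
print(repr(mystr))  # ' } #non consumed'
print(captured)  # [('something = { call ', '_', 'string to ""capture"" ')]
```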