I am having trouble extracting a substring from a string using Python regex. From a larger string, I want to pull out any substring that matches the following format:
some_var:struct<some_variables>
While doing this I ran into three corner cases, which I explain in detail below.
Scenario 1:
>>> import re
>>> s = 'firstname:string,middlename:double,lastname:struct<last1:int,last2:array<string>>,addr:string'
>>> match = re.search(r'\w[a-zA-Z]*:struct<.*>,', s)
>>> print(match.group())
lastname:struct<last1:int,last2:array<string>>,
The above code works fine.
Scenario 2:
>>> subdtyp = 'firstname:string,middlename:double,lastname:struct<last1:int,last2:array<string>>,last3:array<string>,last4:struct<last41:int,last42:string>'
>>> match = re.search(r'\w[a-zA-Z]*:struct<.*>,',subdtyp)
>>> print(match.group())
lastname:struct<last1:int,last2:array<string>>,last3:array<string>,
In this case the same regex returns more than I expect because of greedy matching: the extra part is (last3:array<string>,), which comes after the struct I want. So I changed the pattern to non-greedy matching, as below:
>>> match = re.search(r'\w[a-zA-Z]*:struct<.*?>,',subdtyp)
>>> print(match.group())
lastname:struct<last1:int,last2:array<string>>,
This time the result is exactly what I want.
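To isolate the difference between the two quantifiers, here is a minimal sketch on a shortened input. The string `demo` is made up for illustration and is not one of my real inputs; the patterns are the same two used above.

```python
import re

# Hypothetical shortened input, for illustration only.
demo = 'a:struct<x:int>,b:array<string>,c:int'

# Greedy: '.*' runs to the end of the string, then backtracks to the LAST '>,'.
greedy = re.search(r'\w[a-zA-Z]*:struct<.*>,', demo)

# Non-greedy: '.*?' grows one character at a time and stops at the FIRST '>,'.
lazy = re.search(r'\w[a-zA-Z]*:struct<.*?>,', demo)

print(greedy.group())  # a:struct<x:int>,b:array<string>,
print(lazy.group())    # a:struct<x:int>,
```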
Scenario 3:
>>> subdtyp2 = 'firstname:string,middlename:double,lastname:struct<last4:struct<last41:int,last42:string>,last2:array<string>>,last3:array<string>'
>>> match = re.search(r'\w[a-zA-Z]*:struct<.*?>,',subdtyp2)
>>> print(match.group())
lastname:struct<last4:struct<last41:int,last42:string>,
Here the result is incomplete: the (last2:array<string>>,) portion is missing, because the non-greedy match stops at the first ">," it finds, which closes the inner struct rather than the outer one.
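For reference, the only approach I can think of that handles all three scenarios is not a pure regex. My understanding is that neither a greedy nor a non-greedy `.*` can count nested brackets, so the sketch below uses a regex only to find the start of the field and then tracks the `<`/`>` depth by hand. The function name `extract_struct` is my own, not from any library, and unlike the outputs above it returns the field without the trailing comma.

```python
import re

def extract_struct(text):
    """Return the first 'name:struct<...>' field with balanced brackets, or None."""
    # Find where the first struct field starts, using the same prefix as above.
    start = re.search(r'\w[a-zA-Z]*:struct<', text)
    if start is None:
        return None

    depth = 1  # we are just past the opening '<' of the struct
    for i in range(start.end(), len(text)):
        if text[i] == '<':
            depth += 1
        elif text[i] == '>':
            depth -= 1
            if depth == 0:
                # This '>' closes the outer struct.
                return text[start.start():i + 1]
    return None  # the brackets never balanced


subdtyp2 = ('firstname:string,middlename:double,'
            'lastname:struct<last4:struct<last41:int,last42:string>,last2:array<string>>,'
            'last3:array<string>')
print(extract_struct(subdtyp2))
```

This works for my inputs, but I would still prefer a single regex if one exists.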
Can somebody please help me with a regex that handles all three scenarios above?