How to perform substitution only on piece of text that matches pattern

Question

I have a piece of text like this:

[{1,2,3,4}, 3, 5,2,4, {1,2}, {1,2,3,4}, {1,33,3443}, 1..10]

Here the numbers within curly braces {} are to be treated as a single atom. I want to split the text into individual elements of an array, so that after some operation the text becomes a list like this:

expected = ['{1,2,3,4}', '3', '5', '2', '4', '{1,2}', '{1,2,3,4}', '{1,33,3443}', '1..10']

with each element as separate strings.

I am not able to figure out a good way to split it. I could treat the string as an array, iterate through it, replace each , that falls inside {} with some other delimiter, and then split on , to get what I want. But I was wondering whether this is possible with a regular expression, by applying a substitution only to the portion of text that matches some pattern. I tried something like this:

import re

line='[{1,2,3,4}, 3, 5,2,4, {1,2}, {1,2,3,4}, {1,33,3443}]'
# I hope the commas in {1,2,3,4} are replaced by : so that I get {1:2:3:4},
# on which I can do a re.split or just split to get the elements in the form I want
      # find text within {}       on the text found, replace ',' with ':'
re.sub(r'(?P<set_value>\{.*?\})', re.sub(r',',':', '\g<1>'), line)

When I run the above code, I get the original line back, without any change:

'[{1,2,3,4}, 3, 5,2,4, {1,2}, {1,2,3,4}, {1,33,3443}]'

Is there a way I can fix the expression to get the right answer?

Your substitution string is itself a `re.sub()` call on `'\g<1>'`? Why? — Martijn Pieters, Jun 19 '18 at 13:52
@MartijnPieters I was hoping `\g<1>` would match the text within the curly braces, so that I could perform the substitution only on the matched text and return it as the replacement — Gautam, Jun 19 '18 at 13:54
You nested a `re.sub()` call, which just returns `'\g<1>'`, so that call is not going to do anything useful. Next, `re.sub(pattern, '\g<1>', input)` replaces all matches with the contents of group 1. Since your whole pattern only consists of group 1, you are substituting group 1 matches by itself, resulting in no changes. — Martijn Pieters, Jun 19 '18 at 13:56
@MartijnPieters With `re.sub(r',',':', '\g<1>')`, I was thinking it would substitute `,` in `\g<1>` with `:` and return the replacement string? — Gautam, Jun 19 '18 at 13:58
See the duplicate, this is essentially the same problem: `re.split(r',\s*(?![^{}]*})', line)`, produces `['[{1,2,3,4}', '3', '5', '2', '4', '{1,2}', '{1,2,3,4}', '{1,33,3443}]']`. — Martijn Pieters, Jun 19 '18 at 13:58
There is no `,` in `'\g<1>'`; the *result* of that inner `re.sub()` is passed to the outer function call, not the group contents. You'd have to use a callable object there (so a function object). Not that that'll work here. — Martijn Pieters, Jun 19 '18 at 13:59
@MartijnPieters I think I understand. `'\g<1>'` is just a piece of text; it only has meaning when a match actually happens. Thanks for pointing me to the original question. — Gautam, Jun 19 '18 at 14:17
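
The two ideas from the comments can be sketched as follows. This is only an illustration, not an accepted answer: the helper name `colons_for_commas` is made up here, the `re.split` pattern is the one quoted in the comment above, and both variants assume the braces are never nested and that the data itself contains no `:`. The outer `[` and `]` are stripped first so they do not stick to the first and last elements.

```python
import re

line = '[{1,2,3,4}, 3, 5,2,4, {1,2}, {1,2,3,4}, {1,33,3443}, 1..10]'
inner = line.strip('[]')  # drop the outer square brackets

# Variant 1: pass a function as the replacement. re.sub() calls it once per
# match, so the commas are replaced only inside the text that matched.
def colons_for_commas(match):  # hypothetical helper name
    return match.group(0).replace(',', ':')

protected = re.sub(r'\{.*?\}', colons_for_commas, inner)
# Split on the remaining commas, then restore the commas inside the braces.
# Assumption: ':' does not otherwise occur in the data.
via_sub = [part.strip().replace(':', ',') for part in protected.split(',')]

# Variant 2: split directly on commas that are not followed by a closing
# brace before the next brace of either kind (pattern from the comment).
via_split = re.split(r',\s*(?![^{}]*\})', inner)

print(via_sub)
print(via_split)
```

The direct split is shorter and needs no placeholder delimiter. The callable form is closer to what the question was attempting, since `re.sub()` accepts a function that receives each match object and returns its replacement.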
