
I have a piece of text like this:

[{1,2,3,4}, 3, 5,2,4, {1,2}, {1,2,3,4}, {1,33,3443}, 1..10]

Here, the numbers within curly braces {} are to be treated as a single atom. I want to split the text into individual elements of a list, so that after some operation the text becomes a list like this:

expected = ['{1,2,3,4}', '3', '5', '2', '4', '{1,2}', '{1,2,3,4}', '{1,33,3443}', '1..10']

with each element as a separate string.

I am not able to figure out a good way to split it. I could treat the string as an array of characters, iterate through it, replace each , that occurs inside {} with some other delimiter, and then split on , to get what I want. But I was wondering whether this is possible with a regular expression, by applying a substitution only to the portion of text that matches some pattern. I tried something like this:

import re

line = '[{1,2,3,4}, 3, 5,2,4, {1,2}, {1,2,3,4}, {1,33,3443}]'
# I hope the commas in {1,2,3,4} are replaced by ':' so that I get {1:2:3:4},
# on which I can do a re.split or a plain split to get the elements in the form I want.
#      find the text within {}        in the text found, replace ',' with ':'
re.sub(r'(?P<set_value>\{.*?\})', re.sub(r',', ':', '\g<1>'), line)

When I run the above code, I get the original line back, without any change:

'[{1,2,3,4}, 3, 5,2,4, {1,2}, {1,2,3,4}, {1,33,3443}]'

Is there a way I can fix the expression to get the right answer?

  • Your substitution string is itself a `re.sub()` call on `'\g<1>'`? Why? – Martijn Pieters Jun 19 '18 at 13:52
  • @MartijnPieters I was hoping `\g<1>` would match the text within curly braces, so that I could perform the substitution only on the matched text and return it as the replacement – Gautam Jun 19 '18 at 13:54
  • You nested a `re.sub()` call, which just returns `'\g<1>'`, so that call is not going to do anything useful. Next, `re.sub(pattern, '\g<1>', input)` replaces all matches with the contents of group 1. Since your whole pattern only consists of group 1, you are substituting group 1 matches by itself, resulting in no changes. – Martijn Pieters Jun 19 '18 at 13:56
  • @MartijnPieters `re.sub(r',',':', '\g<1>')`: I was thinking it would substitute `,` in `\g<1>` with `:` and return the replacement string? – Gautam Jun 19 '18 at 13:58
  • See the duplicate, this is essentially the same problem: `re.split(r',\s*(?![^{}]*})', line)`, produces `['[{1,2,3,4}', '3', '5', '2', '4', '{1,2}', '{1,2,3,4}', '{1,33,3443}]']`. – Martijn Pieters Jun 19 '18 at 13:58
  • There is no `,` in `'\g<1>'`; the *result* of that inner `re.sub()` is passed to the outer function call, not the group contents. You'd have to use a callable object there (so a function object). Not that that'll work here. – Martijn Pieters Jun 19 '18 at 13:59
  • @MartijnPieters. I think I understand. `'\g<1>'` is just a piece of text; it only has meaning when a match actually happens. Thanks for pointing me to the original question. – Gautam Jun 19 '18 at 14:17
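
The two ideas raised in the comments can be sketched side by side. This is a minimal illustration, not a posted answer: the helper name `colons_for_commas` is made up for the example, the surrounding `[` and `]` are stripped with `str.strip`, and the first approach assumes the input never contains a `:` of its own. The lookahead pattern is the one from the comment above, with the closing brace escaped.

```python
import re

line = '[{1,2,3,4}, 3, 5,2,4, {1,2}, {1,2,3,4}, {1,33,3443}, 1..10]'
body = line.strip('[]')  # drop the outer brackets before splitting


def colons_for_commas(match):
    # Hypothetical helper: re.sub calls it once per match with the match
    # object, and uses the returned string as the replacement.
    return match.group(0).replace(',', ':')


# Approach 1: mask commas inside {...}, split on ',', then restore them.
# Assumes ':' does not otherwise occur in the input.
masked = re.sub(r'\{.*?\}', colons_for_commas, body)
via_sub = [item.strip().replace(':', ',') for item in masked.split(',')]

# Approach 2: split only on commas that are not followed by a '}' before
# any other brace, i.e. commas that sit outside a {...} group.
via_split = re.split(r',\s*(?![^{}]*\})', body)

print(via_sub)
print(via_split)
```

Passing a function as the replacement is what makes the first approach work: unlike the string `'\g<1>'`, the function sees the actual matched text. The second approach avoids the mask-and-restore round trip, but relies on the braces being balanced and not nested.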
