Suppose I have the string below:

~m~90~m~{Somestringof length 90}~m~20~m~{Some string of len 20}~m~10~m~{Somestringof length 10}

I need to extract the strings between occurrences of the pattern ~m~\d+~m~, one by one.

One solution I found is something like this:

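import re

my_pattern = re.compile(r"~m~\d+~m~")  # raw string so \d reaches the regex engine intact
matches = list(my_pattern.finditer(result))  # result holds the input string
# each payload runs from the end of one delimiter to the start of the next
for cur, nxt in zip(matches, matches[1:] + [None]):
    print(result[cur.end():(nxt.start() if nxt else len(result))])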

This solution is not optimal; I am sure there must be a better way to do it.

Wiktor Stribiżew
unbesiegbar

1 Answer

You could split on ~m~, which gets you both the numbers and the strings:

>>> "~m~90~m~{Somestringof length 90}~m~20~m~{Some string of len 20}~m~10~m~{Somestringof length 10}".split("~m~")
['', '90', '{Somestringof length 90}', '20', '{Some string of len 20}', '10', '{Somestringof length 10}']
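If only the payload strings are needed and the numbers can be discarded, one possible variant is to split on the whole delimiter with re.split instead. This is a minimal sketch, assuming the delimiter always has the form ~m~<digits>~m~ and that no payload contains that sequence itself; the name payloads is just illustrative.

```python
import re

src_text = "~m~90~m~{Somestringof length 90}~m~20~m~{Some string of len 20}~m~10~m~{Somestringof length 10}"

# Split on the full "~m~<digits>~m~" delimiter. Because the text starts with
# a delimiter, the first element is an empty string, so empty parts are dropped.
payloads = [part for part in re.split(r"~m~\d+~m~", src_text) if part]

for payload in payloads:
    print(payload)
```

This trades the length information for a shorter result list, so it only fits when the numbers are not needed afterwards.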

If you're certain the input always has that form, you can drop the empty leading element and take pairs from the single list (note that Python 2's itertools.izip is simply zip in Python 3):

>>> src_text = "~m~90~m~{Somestringof length 90}~m~20~m~{Some string of len 20}~m~10~m~{Somestringof length 10}"
>>> def pairwise(t):
...     it = iter(t)
...     return zip(it,it)
...
>>> for textlen, text in pairwise(filter(None, src_text.split("~m~"))):
...   print(f"{textlen}: {text}")
...
90: {Somestringof length 90}
20: {Some string of len 20}
10: {Somestringof length 10}

Note that .split() returns strings, so convert the numeric length explicitly with int() if you need it as a number.
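The conversion can be folded into the pairing step. The following is a rough sketch rather than a definitive implementation: parse_pairs is a hypothetical helper name, and it assumes the input strictly alternates length and payload as in the example. The placeholder payloads in the sample are not really as long as their declared lengths, so the sketch prints both values instead of enforcing a match.

```python
src_text = "~m~90~m~{Somestringof length 90}~m~20~m~{Some string of len 20}~m~10~m~{Somestringof length 10}"

def parse_pairs(text, sep="~m~"):
    """Return a list of (length, payload) tuples with the length as an int."""
    # Drop empty parts (the text starts with the separator).
    parts = [part for part in text.split(sep) if part]
    it = iter(parts)
    # zip(it, it) consumes the same iterator twice per step, yielding pairs.
    return [(int(textlen), payload) for textlen, payload in zip(it, it)]

pairs = parse_pairs(src_text)
for declared_len, payload in pairs:
    # declared_len is now an int and can be compared with len(payload).
    print(declared_len, len(payload), payload)
```

int() raises ValueError if a part in a length position is not numeric, which is a reasonable early signal that the input does not have the expected form.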

ti7