
I have a Python regex like this:

re.compile(r'(\[chartsjs\].*\[/chartsjs\])', re.DOTALL)

I am trying to do a re.findall on patterns like this:

[charts]
name: mychart
type: line
labels: fish, cat, dog
data: 4, 5, 6
data2:5, 7, 9
[/charts]

this is some text

[charts]
name: second
type: line
labels: 100, 500, 1000
data: 50, 100, 10000
data2: 100, 100, 100
[/charts]

But it seems to match from the first [charts] to the very last [/charts], grabbing everything in between. When I print the result to the console, I see this:

[u'[chartsjs]\r\nname: mychart\r\ntype: line\r\nlabels: fish, cat, dog\r\ndata: 4, 5, 6\r\ndata2:5, 7, 9\r\n[/chartsjs]\r\n\r\nthis is some text now fool\r\n\r\n[chartsjs]\r\nname: second\r\ntype: line\r\nlabels: 100, 500, 1000\r\ndata: 50, 100, 10000\r\ndata2: 100, 100, 100\r\n[/chartsjs]']

I would like the regex to return the first match, skip the arbitrary text after it, and then find any number of further matches. Is there a way to do this?

dda
Joff

2 Answers


There is just one problem in your regex.

.* is greedy: with re.DOTALL it first runs all the way to the end of the string and then backtracks only as far as needed for \[/charts\] to match. That means the match ends at the last [/charts] in the input, not the first one, so everything in between is included.
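
A minimal sketch of that greedy behaviour, using a shortened, hypothetical version of the question's input (two chart blocks separated by ordinary text):

```python
import re

# Hypothetical, shortened sample: two chart blocks with text in between
text = """[charts]
name: mychart
[/charts]

this is some text

[charts]
name: second
[/charts]"""

# Greedy .* runs to the end of the string, then backtracks to the LAST [/charts]
greedy = re.findall(r'(\[charts\].*\[/charts\])', text, re.DOTALL)

print(len(greedy))                       # 1: a single match spanning both blocks
print('this is some text' in greedy[0])  # True: the text in between is swallowed
```

Because this sample starts with [charts] and ends with [/charts], the single greedy match is the whole string.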

To make it stop at the first [/charts], make the quantifier lazy by adding a question mark: .*?. A lazy quantifier matches as few characters as possible, so each match ends at the first [/charts] it reaches.

Take a look; I tested it:

import re

a="""
[charts]
name: mychart
type: line
labels: fish, cat, dog
data: 4, 5, 6
data2:5, 7, 9
[/charts]

this is some text

[charts]
name: second
type: line
labels: 100, 500, 1000
data: 50, 100, 10000
data2: 100, 100, 100
[/charts]
"""

# .*? is lazy, so each match stops at the first [/charts] it reaches
for c in re.findall(r'(\[charts\].*?\[/charts\])', a, re.DOTALL):
    print(c)

Output:

[charts]
name: mychart
type: line
labels: fish, cat, dog
data: 4, 5, 6
data2:5, 7, 9
[/charts]
[charts]
name: second
type: line
labels: 100, 500, 1000
data: 50, 100, 10000
data2: 100, 100, 100
[/charts]
Mohammad Yusuf

The main thing here is that you want the .* to be .*?. There are other ways to optimize the regex, as others have answered, but I think the root of your question is that you want to match everything UNTIL you see the [/charts] pattern, which the ? gives you.
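
"Everything until [/charts]" can also be spelled out directly with a tempered greedy token, (?:(?!\[/charts\]).)*, which consumes a character only if a closing tag does not start at that position. This is a sketch of that alternative technique (not something from the question), assuming the same [charts]/[/charts] tags and a shortened, hypothetical sample input:

```python
import re

# Hypothetical, shortened sample input
text = """[charts]
name: mychart
type: line
[/charts]

this is some text

[charts]
name: second
type: line
[/charts]"""

# Tempered greedy token: "." is only allowed where "[/charts]" does not begin,
# so a match can never run past the first closing tag.
# re.DOTALL lets "." cross newlines inside a block.
pattern = re.compile(r'\[charts\](?:(?!\[/charts\]).)*\[/charts\]', re.DOTALL)

# The group is non-capturing, so findall returns the whole matched blocks
for block in pattern.findall(text):
    print(block)
    print('---')
```

On this input it finds the same two blocks as .*?. It is not necessarily faster, since the lookahead runs at every character, so treat it as a more explicit spelling rather than a guaranteed optimization.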

Brian Mego
  • Is the other way to optimize using \s\S flags? I'm curious about what else can make it better – Joff Dec 23 '16 at 16:09
  • If possible, being specific is better than using . in a regex. You can get into very time-consuming backtracking on large strings if you use . alone. \s\S isn't really any different from . with re.DOTALL, so I retract the "as others have answered" bit, but depending on your particular use case in the future, it's good knowledge to have – Brian Mego Dec 23 '16 at 16:32
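
To make the points in these comments concrete, here is a small sketch on a shortened, hypothetical input. It checks that [\s\S] without flags behaves like . with re.DOTALL, and shows one "more specific" body, [^\[]*, which ASSUMES chart bodies never contain a "[" character:

```python
import re

# Hypothetical, shortened sample input
text = """[charts]
name: mychart
data: 4, 5, 6
[/charts]

this is some text

[charts]
name: second
data: 50, 100, 10000
[/charts]"""

# "." with re.DOTALL and the class [\s\S] both match any character,
# including newlines, so these two patterns behave the same.
dot_all = re.findall(r'\[charts\].*?\[/charts\]', text, re.DOTALL)
class_any = re.findall(r'\[charts\][\s\S]*?\[/charts\]', text)

# Without re.DOTALL, "." stops at newlines, so nothing matches here.
no_flag = re.findall(r'\[charts\].*?\[/charts\]', text)

# More specific body, ASSUMING chart bodies never contain "[":
# a negated class matches newlines without any flag and cannot run
# into the next tag, so no lazy quantifier is needed.
specific = re.findall(r'\[charts\][^\[]*\[/charts\]', text)

print(dot_all == class_any == specific)  # True
print(no_flag)                           # []
```

If a chart body could ever contain "[", the negated-class version would miss that block, so the lazy .*? form is the safer default.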