In this question a regex for capturing a string between delimiters is provided:

Test: This is a test string [more or less]

Regexp: (?<=\[)(.*?)(?=\])

Returns: more or less

What if the string to be captured also contains delimiters?

Test 1: This is a test string [more [or] less]

Return 1: more [or] less

Test 2: This is a test string [more [or [and] or] less]

Return 2: more [or [and] or] less

And multiple brackets?

Test 3: This is a test string [more [or [and] or] less] and [less [or [and] or] more]

Return 3: more [or [and] or] less, less [or [and] or] more

Which regex would do this? Alternatively, which short Ruby or Python script could do it?

gnzlbg
  • You are trying to tell regex to track when the bracket is "closed". I don't think it can do that. – Tengiz Feb 22 '13 at 18:58
  • Indeed. I asked it as a comment there but I guess it deserves a separate question. – gnzlbg Feb 22 '13 at 18:58
  • I would rather write a custom function that moves an index forward and counts open brackets until the outermost one is closed. But I would like to see a regexp for this. – Tengiz Feb 22 '13 at 19:00

1 Answer

In JavaScript:

var str = 'This is a test string [more [or [and] or] less]';    
str = str.match( /\[(.+)\]/ )[1];
// "more [or [and] or] less"

If you omit the ? after the quantifier, .+ is greedy and matches up to the last ] in the string instead of stopping at the first one.

In Python:

import re

text = "This is a test string [more [or [and] or] less]"
re.search(r"(?<=\[).+(?=\])", text).group()
# 'more [or [and] or] less'
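To make the effect of the ? concrete, here is a small Python sketch comparing the lazy and greedy forms on the test string. The second input, two_groups, is a made-up example (not from the question) that shows why the greedy form alone is not enough once the string contains more than one outer bracket pair.

```python
import re

text = "This is a test string [more [or [and] or] less]"

# Lazy: stops at the first closing bracket it can reach.
lazy = re.search(r"\[(.*?)\]", text).group(1)
# Greedy: runs on to the last closing bracket in the string.
greedy = re.search(r"\[(.+)\]", text).group(1)

print(lazy)    # more [or [and
print(greedy)  # more [or [and] or] less

# Hypothetical input with two outer pairs: greedy swallows both.
two_groups = "[more [or] less] and [less [or] more]"
print(re.search(r"\[(.+)\]", two_groups).group(1))
# more [or] less] and [less [or] more
```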

Update for multiple nested brackets

In JavaScript:

var matches = [],
    str = 'This is a test string [more [or [and] or] less] and [less [or [and] or] more] and [more]';

str.replace( /\[([^\]]*\[?[^\]]*\]?[^[]*)\]/g, function ( $0, $1 ) {
    $1 && matches.push( $1 );
});

console.log( matches );
// [ "more [or [and] or] less", "less [or [and] or] more", "more" ]

In Python:

import re

text = 'This is a test string [more [or [and] or] less] and [less [or [and] or] more] and [more]'

# Same pattern as the JavaScript version; the inner [ is escaped
# to keep the character class unambiguous.
matches = re.findall(r'\[([^\]]*\[?[^\]]*\]?[^\[]*)\]', text)

print(matches)
# ['more [or [and] or] less', 'less [or [and] or] more', 'more']
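The pattern above depends on the shape of the input and may not hold for other layouts, such as several sibling groups inside one outer pair. As an alternative, the counting approach suggested in the comments can be sketched in a few lines of Python. The function name top_level_groups and its parameters are hypothetical, chosen for illustration; the sketch assumes single-character delimiters, ignores stray closing brackets, and drops an outer group that is never closed.

```python
def top_level_groups(text, open_ch="[", close_ch="]"):
    """Return the contents of every outermost bracket pair in text."""
    groups = []
    depth = 0      # how many brackets are currently open
    start = None   # index just after the current outermost opener
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
            if depth == 0:
                # The outermost pair just closed: keep what was inside it.
                groups.append(text[start:i])
    return groups


text = ('This is a test string [more [or [and] or] less] '
        'and [less [or [and] or] more]')
print(top_level_groups(text))
# ['more [or [and] or] less', 'less [or [and] or] more']
```

Because it tracks nesting depth explicitly, this handles any depth and any number of sibling groups, at the cost of a single pass over the string.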
MikeM