457

I need to extract from a string a set of characters which are included between two delimiters, without returning the delimiters themselves.

A simple example should be helpful:

Target: extract the substring between square brackets, without returning the brackets themselves.

Base string: This is a test string [more or less]

If I use the following regular expression:

\[.*?\]

The match is [more or less]. I need to get only more or less (without the brackets).

Is it possible to do it?

Tim
Diego

13 Answers

709

Easy done:

(?<=\[)(.*?)(?=\])

Technically that's using lookaheads and lookbehinds. See Lookahead and Lookbehind Zero-Width Assertions. The pattern matches text that:

  • is preceded by a [ that is not part of the match (lookbehind);
  • is captured by a non-greedy group, so it stops at the first ]; and
  • is followed by a ] that is not part of the match (lookahead).

Alternatively you can just capture what's between the square brackets:

\[(.*?)\]

and return the first captured group instead of the entire match.
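
A minimal JavaScript sketch of both variants side by side. The variable names are illustrative only, and the lookbehind form assumes an engine with ES2018 lookbehind support:

```javascript
const text = "This is a test string [more or less]";

// Variant 1: lookbehind + lookahead. The brackets are asserted but not
// consumed, so the whole match (index 0) is already the inner text.
const viaLookaround = text.match(/(?<=\[)(.*?)(?=\])/);
console.log(viaLookaround[0]); // more or less

// Variant 2: plain capture group. The whole match (index 0) still
// contains the brackets; the inner text is in group 1.
const viaGroup = text.match(/\[(.*?)\]/);
console.log(viaGroup[0]); // [more or less]
console.log(viaGroup[1]); // more or less
```

Reading index 0 of the second variant is what makes it look as if the brackets are "still included".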

jottr
cletus
  • 248
    "Easy done", LOL! :) Regular expressions always give me headache, I tend to forget them as soon as I find the ones that solve my problems. About your solutions: the first works as expected, the second doesn't, it keeps including the brackets. I'm using C#, maybe the RegEx object has its own "flavour" of regex engine... – Diego Sep 21 '09 at 15:15
  • 6
    It's doing that because you're looking at the whole match rather than the first matched group. – cletus Sep 21 '09 at 15:35
  • 2
    Does this work if the substring also contains the delimiters? For example in `This is a test string [more [or] less]` would this return `more [or] less` ? – gnzlbg Feb 22 '13 at 18:49
  • 1
    @gnzlbg no, it would return "more [or" – MerickOWA Jul 10 '13 at 21:32
  • This is returning the string along with the begin and end string – rajibdotnet Jan 30 '14 at 22:06
  • I needed something like this to find where class="border" was used in my code.. but even if preceded or followed by other classes. This helped get me where I needed, thanks! – SgtPooki Jul 23 '14 at 21:39
  • `Invalid regular expression: /(?<=[)(.*?)(?=])/: Invalid group` – Yeats Dec 02 '16 at 05:08
  • @cletus if you have multiple placeholders, how would you get all of them in one shot, or does one have to iterate over the string using this regex? `while(found = regex.exec("This is a [test] string [more or less]"))` – Legends Feb 17 '17 at 16:15
  • 1
    Be careful, lookbehinds are not supported by safari, so this can crash your app on safari. https://caniuse.com/?search=lookbehind – Maxstgt Jul 22 '21 at 15:21
78

If you are using JavaScript, the solution provided by cletus, (?<=\[)(.*?)(?=\]), won't work in engines that predate ES2018, because they don't support lookbehind.

Edit: as of ES2018, lookbehind assertions are available in JavaScript. Wrap the pattern in slashes to write it as a regex literal, like this:

var regex = /(?<=\[)(.*?)(?=\])/;

Old answer:

Solution:

var regex = /\[(.*?)\]/;
var strToMatch = "This is a test string [more or less]";
var matched = regex.exec(strToMatch);

It will return:

["[more or less]", "more or less"]

So, what you need is the second value. Use:

var matched = regex.exec(strToMatch)[1];

To return:

"more or less"
Zanon
  • 6
    what if there are multiple matches of [more or less] in the string? –  Feb 18 '19 at 06:09
  • Lookbehind assertions have been [**added to RegExp in ES2018**](http://2ality.com/2017/05/regexp-lookbehind-assertions.html) – Chunky Chunk May 23 '19 at 17:12
26

You just need to 'capture' the bit between the brackets.

\[(.*?)\]

To capture you put it inside parentheses. You do not say which language this is using. In Perl for example, you would access this using the $1 variable.

my $string ='This is the match [more or less]';
$string =~ /\[(.*?)\]/;
print "match:$1\n";

Other languages will have different mechanisms. C#, for example, exposes captured groups through the Groups property of the Match object, so the bracket contents are in match.Groups[1].Value.

cletus
Xetius
  • Thanks, but this solution didn't work, it keeps including the square brackets. As I wrote in my comment to Cletus' solution, it could be that C# RegEx object interprets it differently. I'm not expert on C# though, so it's just a conjecture, maybe it's just my lack of knowledge. :) – Diego Sep 21 '09 at 15:17
26

Here's a general example with obvious delimiters (X and Y):

(?<=X)(.*?)(?=Y)

Here it's used to find the string between X and Y.
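
For arbitrary delimiters, the same idea can be wrapped in a small function. This is only a sketch: `escapeRegExp` and `between` are hypothetical helper names, and it uses a capture group instead of lookbehind so that it does not depend on lookbehind support.

```javascript
// Escape characters that have a special meaning in a regular expression,
// so delimiters such as "[" or "(" are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: first substring found between `open` and `close`,
// or null if there is none.
function between(str, open, close) {
  const re = new RegExp(escapeRegExp(open) + "(.*?)" + escapeRegExp(close));
  const m = re.exec(str);
  return m === null ? null : m[1];
}

console.log(between("aaXhelloYbb", "X", "Y")); // hello
console.log(between("This is a test string [more or less]", "[", "]")); // more or less
```

Without the escaping step, a delimiter like [ would be read as the start of a character class and the pattern would fail to compile.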

stevec
19

[^\[] matches any single character that is not [.

+ repeats that one or more times, so [^\[]+ matches a run of characters that contains no [.

(?=\]) is a positive lookahead for ]. It requires the run to be followed by ] without including that ] in the result.

Done.

[^\[]+(?=\])

Proof.

http://regexr.com/3gobr

This is similar to the solution proposed by null, but without the additional \] inside the character class. As an additional note, the \ before the [ after the ^ appears to be optional. For readability, I would leave it in.

It does not work when the two delimiters are identical, for example "more or less" delimited by double quotes.
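
A short JavaScript check of how the pattern behaves when a stray ] follows the closing bracket, compared with a variant that also excludes ] from the run. The sample input is made up for illustration. Note that neither pattern verifies that a [ actually precedes the match.

```javascript
const input = "This is a test string [more or less]]"; // note the stray "]"

// Pattern from this answer: "]" is allowed inside the run, so the engine
// backs off only as far as the last position still followed by "]".
const original = input.match(/[^\[]+(?=\])/)[0];
console.log(original); // more or less]

// Variant that excludes "]" from the run as well.
const tweaked = input.match(/[^\[\]]+(?=\])/)[0];
console.log(tweaked); // more or less
```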

Stieneee
  • 1
    This is a good solution, however I have made a tweak so that it ignores an extra ']' at the end as well: `[^\[\]]+(?=\])` – SteveEng Feb 18 '21 at 19:01
9

Most updated solution

If you are using JavaScript, the best solution I came up with uses the match method instead of exec. Then iterate over the matches and strip the delimiters by replacing each match with its first captured group, $1.

const text = "This is a test string [more or less], [more] and [less]";
const regex = /\[(.*?)\]/g;
const resultMatchGroup = text.match(regex); // [ '[more or less]', '[more]', '[less]' ]
const desiredRes = resultMatchGroup.map(match => match.replace(regex, "$1"));
console.log("desiredRes", desiredRes); // [ 'more or less', 'more', 'less' ]

As you can see, this also works when the text contains several delimited substrings.
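
An alternative sketch that skips the second replace pass by reading group 1 directly. It assumes an environment that provides String.prototype.matchAll (ES2020); the variable name `inner` is illustrative.

```javascript
const text = "This is a test string [more or less], [more] and [less]";

// matchAll requires the g flag and yields one match object per hit;
// index 1 of each match object is the first capture group.
const inner = Array.from(text.matchAll(/\[(.*?)\]/g), (m) => m[1]);
console.log(inner); // [ 'more or less', 'more', 'less' ]
```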

Luis Febro
8

PHP:

$string ='This is the match [more or less]';
preg_match('#\[(.*?)\]#', $string, $match); // non-greedy, so it stops at the first ]
var_dump($match[1]);
realloc
powtac
6

This one works specifically with JavaScript's regular expression parser: /[^[\]]+(?=])/g

Just run this in the console:

var regex = /[^[\]]+(?=])/g;
var str = "This is a test string [more or less]";
var match = regex.exec(str);
match;
null
5

To match the brackets as well (for example, to remove the whole bracketed part, brackets included), use:

\[.+\]
Cătălin Rădoi
4

I had the same problem using regex in a bash script. I used a two-step solution, piping one grep -o into another (the lazy .*? needs a PCRE-capable grep, such as GNU grep with -P), applying

 '\[(.*?)\]'  

first, then

'\b.*\b'

Obviously not as efficient as the other answers, but an alternative.

A. Jesús
4

I wanted to find a string between / and #, but # is sometimes optional. Here is the regex I use:

  (?<=\/)([^#]+)(?=#*)
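
Because #* can match zero characters, the trailing lookahead always succeeds; it is the negated class [^#]+ that actually stops the match at a #. A quick JavaScript check, with made-up sample inputs and assuming ES2018 lookbehind support:

```javascript
// Same pattern as above, written as a regex literal.
const re = /(?<=\/)([^#]+)(?=#*)/;

console.log("docs/intro#setup".match(re)[0]); // intro
console.log("docs/intro".match(re)[0]);       // intro

// With several slashes, the match starts right after the first one.
console.log("a/b/c#x".match(re)[0]);          // b/c
```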
techguy2000
2

Here is how I got the text without '[' and ']' in C#:

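using System;
using System.Text.RegularExpressions;

var text = "This is a test string [more or less]";

// Capture only the text between '[' and ']' (group 1 excludes the brackets)
Regex regex = new Regex(@"\[(.+?)\]");
var matches = regex.Matches(text);

for (int i = 0; i < matches.Count; i++)
{
    // Groups[0] is the whole match; Groups[1] is the captured inner text
    Console.WriteLine(matches[i].Groups[1].Value);
}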

The output is:

more or less
Audwin Oyong
Jamaxack
-1

If you need to extract the text without the brackets, you can use awk in bash:

echo " [hola mundo] " | awk -F'[][]' '{print $2}'

result:

hola mundo

Nico