609

Simple regex question. I have a string in the following format:

this is a [sample] string with [some] special words. [another one]

What is the regular expression to extract the words within the square brackets, i.e.

sample
some
another one

Note: In my use case, brackets cannot be nested.

ObiWanKenobi

15 Answers

1091

You can use the following regex globally:

\[(.*?)\]

Explanation:

  • \[ : [ is a meta char and needs to be escaped if you want to match it literally.
  • (.*?) : match everything in a non-greedy way and capture it.
  • \] : ] is a meta char and needs to be escaped if you want to match it literally.
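
As a minimal sketch of applying this pattern to every occurrence, here is a Python version using the sample string from the question. It assumes Python's built-in re module; re.findall returns all matches rather than just the first, and because the pattern contains a single capture group, it returns the group contents, so the brackets are left out.

```python
import re

text = "this is a [sample] string with [some] special words. [another one]"

# findall returns the contents of the single capture group for every match
words = re.findall(r"\[(.*?)\]", text)
print(words)  # ['sample', 'some', 'another one']
```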
codaddict
  • 13
    The other answer's method, using `[^]]` is faster than non-greedy (`?`), and also works with regex flavours that don't support non-greedy. However, non-greedy looks nicer. – Ipsquiggle Mar 08 '10 at 17:24
  • 252
    How to exclude `[` `]` from output(result)? – Mickey Tin Apr 28 '13 at 22:46
  • 14
    @MickeyTin, if you are using Java, you can group it using group(1) over just group(), so the '[]' will not go together – andolffer.joseph Sep 19 '13 at 16:47
  • Can you please provide 'sed' and 'awk' examples to use this regex and extract text. Thanks. – valentt Jul 17 '15 at 14:37
  • 27
    This matches only the first occurrence – hfatahi Aug 06 '15 at 14:43
  • 1
    This works, but will not include additional closing brackets as part of the first capture group. To do this, use `\[(.*?]*)\]`. This will match `[[[sample]]]` such that `[[[sample]]]` is the matched string and `[[sample]]` is the first capture group. `[sample]]]` will also match `[sample]]]` with group one being `sample]]`, and `[[[sample]` will match `[[[sample]` with group one being `[[sample` – driima Oct 17 '15 at 14:44
  • Example to match all in php. $pattern = '#\[(.*?)\]#'; $string = 'This is a [brack] and [snares]'; preg_match_all($pattern, $string, $matches); – user1502826 Nov 26 '15 at 16:52
  • What if nested bracket contents like this.. [sample of [abc] in cd] – sarath Apr 05 '16 at 11:40
  • Note: to match all the occurrences use the method 'scan' http://stackoverflow.com/questions/80357/match-all-occurrences-of-a-regex – Alessandro De Simone Jun 11 '16 at 21:48
  • What will be the regex for var string = "@input [ Square_bracket ]". Please Suggest –  Oct 17 '17 at 05:48
  • 20
    How do you exclude the brackets from the return? – jzadra Apr 04 '18 at 22:44
  • 1
    @MickeyTin in JS you can use `var stringWithoutBrackets = stringWithBrackets.replace(/[\[\]']+/g,'')` See [this answer from Mark](https://stackoverflow.com/a/3812077/9360197) – user2018 Oct 24 '18 at 11:22
  • https://stackoverflow.com/a/2403148/2260920 explains a solution for when there is whitespace, such as a newline, inside the brackets. – XPD May 20 '20 at 07:44
  • If I want only the text in the brackets and want to erase everything outside them, how can I do that? – Bonn Jun 24 '20 at 08:22
  • @codaddict Man this is perfect, it saved me so much time! Thank you so much. – Ivan Yurchenko Oct 08 '20 at 12:51
  • This does not exclude the brackets from the matches. This is the correct expression to remove the brackets: (?<=\[)[^\[\]]*(?=\]) – Nikolai Feb 17 '22 at 19:09
  • @Nikolai That regex is invalid - it has an unterminated group – David Oct 06 '22 at 05:59
  • This will exclude brackets spaced across newlines. – Stevoisiak Nov 18 '22 at 19:13
216
(?<=\[).+?(?=\])

This will match the content without the brackets:

  • (?<=\[) - positive lookbehind for [

  • .+? - non-greedy match for the content (one or more characters)

  • (?=\]) - positive lookahead for ]

EDIT: for nested brackets the below regex should work:

(\[(?:\[??[^\[]*?\]))
wnull
Adam Moszczyński
104

This should work out ok:

\[([^]]+)\]
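
A short Python sketch of how this differs from the non-greedy dot when the bracketed text spans lines. The sample text is made up for illustration, and the class is written with an escaped bracket ([^\]]), which is equivalent here. A negated character class matches newlines, while . does not unless the DOTALL flag is set.

```python
import re

text = "first [line one\nline two] second [single]"

# The negated class [^\]] also matches newline characters
negated = re.findall(r"\[([^\]]+)\]", text)

# '.' does not match a newline unless re.DOTALL is passed
lazy = re.findall(r"\[(.*?)\]", text)
lazy_dotall = re.findall(r"\[(.*?)\]", text, flags=re.DOTALL)

print(negated)      # ['line one\nline two', 'single']
print(lazy)         # ['single']
print(lazy_dotall)  # ['line one\nline two', 'single']
```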
jasonbar
  • 7
    In my use case, the bracketed text may include new lines, and this regex works, while the accepted answer does not. – Dave Jun 08 '13 at 04:59
  • 1
    what does the character class [^]] mean? What does it match? – Richard Sep 15 '13 at 13:25
  • 3
    @Richard, The ^ negates the character class. It means "any character that is not a ]". – jasonbar Sep 16 '13 at 12:46
  • 9
    I think it doesn't work as expected; you should use `\[([^\[\]]*)\]` to get the content of the innermost brackets. If you look at `lfjlksd [ded[ee]22]`, then `\[([^]]+)\]` will get you `[ded[ee]`, while the proposed expression would return `[ee]`. Tested in [link](http://regexpal.com/) – TMC Apr 02 '14 at 14:45
  • 2
    Can you please provide 'sed' and 'awk' examples to use this regex and extract text. Thanks. – valentt Jul 17 '15 at 14:37
  • This is the most robust way for complex situations. – Hong Nov 25 '18 at 06:22
  • This works if the content inside the brackets has whitespace. – XPD May 20 '20 at 07:44
  • Works perfectly in Google Sheets. Thanks! – Paul Murray Nov 17 '21 at 14:04
40

Can brackets be nested?

If not: \[([^]]+)\] matches one item, including the square brackets. Backreference \1 will contain the item to be matched. If your regex flavor supports lookaround, use

(?<=\[)[^]]+(?=\])

This will only match the item inside brackets.
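
A minimal Python sketch of the lookaround variant, assuming the sample string from the question and an escaped bracket inside the class. There is no capture group here: the lookbehind and lookahead only assert that the brackets are present, so the whole match is already the inner text.

```python
import re

text = "this is a [sample] string with [some] special words. [another one]"

# The lookarounds check for the brackets without consuming them,
# so group(0) is the inner text only
matches = [m.group(0) for m in re.finditer(r"(?<=\[)[^\]]+(?=\])", text)]
print(matches)  # ['sample', 'some', 'another one']
```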

Tim Pietzcker
  • 1
    @KunalMukherjee: No, the regex can match any number of times. But some regex flavors need to be told explicitly to apply the regex repeatedly (for example, by using the `/g` flag in JavaScript). – Tim Pietzcker Dec 08 '17 at 15:30
31

To match a substring between the first [ and last ], you may use

\[.*\]            # Including open/close brackets
\[(.*)\]          # Excluding open/close brackets (using a capturing group)
(?<=\[).*(?=\])   # Excluding open/close brackets (using lookarounds)

See a regex demo and a regex demo #2.
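
To illustrate the difference between a greedy first-to-last match and a closest-brackets match, here is a small Python sketch with a made-up input string. The greedy .* runs to the last closing bracket, whereas a negated character class stops at the nearest one.

```python
import re

text = "a [one] b [two] c"

# Greedy: from the first '[' to the last ']'
greedy = re.search(r"\[(.*)\]", text).group(1)

# Negated class: each match stops at the closest ']'
closest = re.findall(r"\[([^\]\[]*)\]", text)

print(greedy)   # one] b [two
print(closest)  # ['one', 'two']
```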

Use the following expressions to match strings between the closest square brackets:

  • Including the brackets:

  • \[[^][]*] - PCRE, Python re/regex, .NET, Golang, POSIX (grep, sed, bash)

  • \[[^\][]*] - ECMAScript (JavaScript, C++ std::regex, VBA RegExp)

  • \[[^\]\[]*] - Java, ICU regex

  • \[[^\]\[]*\] - Onigmo (Ruby, requires escaping of brackets everywhere)

  • Excluding the brackets:

  • (?<=\[)[^][]*(?=]) - PCRE, Python re/regex, .NET (C#, etc.), JGSoft Software

  • \[([^][]*)] - Bash, Golang - capture the contents between the square brackets with a pair of unescaped parentheses, also see below

  • \[([^\][]*)] - JavaScript, C++ std::regex, VBA RegExp

  • (?<=\[)[^\]\[]*(?=]) - Java regex, ICU (R stringr)

  • (?<=\[)[^\]\[]*(?=\]) - Onigmo (Ruby, requires escaping of brackets everywhere)

NOTE: * matches 0 or more characters, use + to match 1 or more to avoid empty string matches in the resulting list/array.
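
The effect of * versus + can be sketched in Python as follows. The input string is an assumption chosen to include empty bracket pairs.

```python
import re

text = "[] [a] [] [bc]"

# '*' allows zero characters, so empty brackets yield empty strings
with_star = re.findall(r"\[([^\]\[]*)\]", text)

# '+' requires at least one character, so empty brackets are skipped
with_plus = re.findall(r"\[([^\]\[]+)\]", text)

print(with_star)  # ['', 'a', '', 'bc']
print(with_plus)  # ['a', 'bc']
```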

Whenever lookaround support is available, the above solutions rely on lookarounds to exclude the leading/trailing open/close brackets. Otherwise, rely on capturing groups (links to the most common solutions in some languages have been provided).

If you need to match nested brackets, you may see the solutions in the Regular expression to match balanced parentheses thread and replace the round brackets with the square ones to get the necessary functionality. You should use capturing groups to access the contents with the open/close brackets excluded.
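
Python's built-in re module supports neither recursion nor balancing groups, so the sketch below swaps the regex for a plain depth-counting scan. It is not one of the solutions from the linked thread; the function name outermost_bracketed is hypothetical, and it returns the contents of each outermost pair with the brackets excluded.

```python
def outermost_bracketed(text):
    """Return the contents of each outermost [...] pair, brackets excluded."""
    results = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "[":
            if depth == 0:
                start = i + 1  # content begins after the opening bracket
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
            if depth == 0:
                results.append(text[start:i])
    return results


print(outermost_bracketed("a [text [2]] b [x]"))  # ['text [2]', 'x']
```

Stray closing brackets are ignored, and an opening bracket that is never closed produces no result.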

Wiktor Stribiżew
  • This `\[((?>[^][]+|(?<o>)\[|(?<-o>]))*)]` was 99.9% what I needed. By that, I mean I need everything inside the outermost brackets, but not the brackets themselves. I.e., in your .Net demo link, it matches all of [text [2]], and I'd like the match to return "text [2]". However, I can get around that by just taking the match and doing a simple substring that skips the first and last characters. I am curious if it is possible to modify that regex ever so slightly to automatically omit the outermost brackets. – B.O.B. Jan 16 '22 at 21:16
  • 1
    @B.O.B. You need to get the Group 1 value, see [the C# demo online](https://ideone.com/fKwvMo). – Wiktor Stribiżew Jan 16 '22 at 21:21
  • Thanks! I'll give that a try in my demo code that I am using (before I move it into the real project). Edit: that was exactly it! Thanks for the expert and exceptionally fast response. – B.O.B. Jan 16 '22 at 21:22
25

If you do not want to include the brackets in the match, here's the regex: (?<=\[).*?(?=\])

Let's break it down

The . matches any character except for line terminators. The ?= is a positive lookahead. A positive lookahead finds a string when a certain string comes after it. The ?<= is a positive lookbehind. A positive lookbehind finds a string when a certain string precedes it. To quote this,

Look ahead positive (?=)

Find expression A where expression B follows:

A(?=B)

Look behind positive (?<=)

Find expression A where expression B precedes:

(?<=B)A
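
A tiny Python sketch of the two constructs quoted above, using made-up words in place of A and B. In both cases the asserted part is checked but is not included in the match.

```python
import re

# A(?=B): match "foo" only where "bar" follows
ahead = re.findall(r"foo(?=bar)", "foobar foobaz")

# (?<=B)A: match "bar" only where "foo" precedes
behind = re.findall(r"(?<=foo)bar", "foobar bazbar")

print(ahead)   # ['foo']
print(behind)  # ['bar']
```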

The Alternative

If your regex engine does not support lookaheads and lookbehinds, then you can use the regex \[(.*?)\] to capture the innards of the brackets in a group and then you can manipulate the group as necessary.

How does this regex work?

The parentheses capture the characters in a group. The .*? gets all of the characters between the brackets (except for line terminators, unless you have the s flag enabled) in a way that is not greedy.

LJ Germain
  • Fyi, this breaks Safari if the version is < 16.4 - https://caniuse.com/js-regexp-lookbehind – Tonni Apr 14 '23 at 03:19
20

In case your input might contain nested or unbalanced brackets, you can design an expression with recursion, similar to:

\[(([^\]\[]+)|(?R))*+\]

Whether this works depends, of course, on the language or regex engine you are using; (?R) recursion is supported by PCRE, for example.

RegEx Demo 1


Other than that,

\[([^\]\[\r\n]*)\]

RegEx Demo 2

or,

(?<=\[)[^\]\[\r\n]*(?=\])

RegEx Demo 3

are good options to explore.


If you wish to simplify, modify, or explore the expression, it is explained in the top right panel of regex101.com. If you'd like, you can also see at this link how it matches against some sample inputs.



Test

const regex = /\[([^\]\[\r\n]*)\]/gm;
const str = `This is a [sample] string with [some] special words. [another one]
This is a [sample string with [some special words. [another one
This is a [sample[sample]] string with [[some][some]] special words. [[another one]]`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

Source

Regular expression to match balanced parentheses

Emma
  • This answer is criminally underrated in that it provides an education and a taste of the power of finite state machines. – Gr3go Nov 17 '22 at 23:45
13

(?<=\[).*?(?=\]) works well, as per the explanation given above. Here's a Python example:

import re

# Square brackets are used in the sample so the pattern has something to match.
text = "Pagination.go['formPagination_bottom',2,'Page',true,'1',null,'2013']"
print(re.search(r'(?<=\[).*?(?=\])', text).group())
# 'formPagination_bottom',2,'Page',true,'1',null,'2013'
LJ Germain
devd
  • 1
    You should always use code formatting for regexes, wherever they appear. If the regex is in the text rather than a code block, you can use backticks to format them. ([ref](http://stackoverflow.com/editing-help#comment-formatting)) – Alan Moore Apr 24 '15 at 01:28
  • 1
    Also, the question was about square brackets (`[]`), not parentheses. – Alan Moore Apr 24 '15 at 01:32
11

@Tim Pietzcker's answer here

(?<=\[)[^]]+(?=\])

is almost the one I've been looking for. But there is one issue: some legacy browsers fail on positive lookbehind. So I had to work it out myself :). I managed to write this:

/([^[]+(?=]))/g

Maybe it will help someone.

console.log("this is a [sample] string with [some] special words. [another one]".match(/([^[]+(?=]))/g));
6

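If you want to match only lowercase letters (a-z) between square brackets:

(\[[a-z]*\])

If you want lowercase and uppercase letters (a-zA-Z):

(\[[a-zA-Z]*\])

If you want lowercase letters, uppercase letters, and digits (a-zA-Z0-9):

(\[[a-zA-Z0-9]*\])

If you want everything between square brackets (text, numbers, and symbols):

(\[.*\])

Note that .* is greedy, so with several bracketed items on one line this last pattern matches from the first [ to the last ].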
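
A short Python sketch of how the restricted character classes above filter bracketed items. The sample text is made up; since each pattern wraps the whole expression in one group, re.findall returns the matches with their brackets.

```python
import re

text = "[abc] [Abc] [abc1] [a-b]"

lower = re.findall(r"(\[[a-z]*\])", text)
letters = re.findall(r"(\[[a-zA-Z]*\])", text)
alnum = re.findall(r"(\[[a-zA-Z0-9]*\])", text)

print(lower)    # ['[abc]']
print(letters)  # ['[abc]', '[Abc]']
print(alnum)    # ['[abc]', '[Abc]', '[abc1]']
```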
Balaji
4

This code will extract the content between square brackets and parentheses

(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))

(?: non capturing group
(?<=\().+?(?=\)) positive lookbehind and lookahead to extract the text between parentheses
| or
(?<=\[).+?(?=\]) positive lookbehind and lookahead to extract the text between square brackets
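
As a rough illustration in Python, with a made-up input string: because the outer group is non-capturing, re.findall returns the whole match of whichever alternative succeeded.

```python
import re

pattern = r"(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))"
text = "call(arg1, arg2) and index[42]"

# The first alternative handles (...), the second handles [...]
found = re.findall(pattern, text)
print(found)  # ['arg1, arg2', '42']
```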
Nezar Fadle
4

In R, try:

library(stringr)  # provides str_replace
x <- 'foo[bar]baz'
str_replace(x, ".*?\\[(.*?)\\].*", "\\1")
[1] "bar"
Tony Ladson
  • ..or `gsub(pat, "\\1", x, perl=TRUE)`, where `pat` is the regular expression you provided.. – Karsten W. Jul 16 '19 at 16:23
  • This solution is excellent in the way that it "extracts" the content inside the brackets *if there is one*, otherwise you get the input. – Jan Netík Oct 09 '21 at 13:51
3
([[][a-z \s]+[]])

The above should work, given the following explanation:

  • Characters within square brackets [] define a character class, which matches any one of the characters listed within the square brackets.

  • \s specifies a whitespace character.

  • + means at least one occurrence of the preceding character class.

Peon
spooks
  • For case-sensitive matching, `A-Z` should be added to the pattern: `([[][a-zA-Z \s]+[]])`. I think this is a good approach, because a `\` in regex patterns defined inside string quotes (" and ') confuses newbies with backslash handling. – MohaMad Mar 01 '17 at 10:46
  • the only answer that worked for me for C++ regex (except im doing it with quotations instead of brackets). `std::regex pattern{R"(["][a-zA-Z \s]+["])"};` – StackAttack Oct 01 '18 at 05:42
3

I needed to match across newlines and include the brackets:

\[[\s\S]+\]

citynorman
1

If someone wants to match and select a string containing one or more dots inside square brackets like "[fu.bar]" use the following:

(?<=\[)(\w+\.\w+.*?)(?=\])
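
A brief Python sketch of this pattern on a made-up input, showing that bracketed items without a dot are skipped:

```python
import re

text = "[fu.bar] [nodot] [a.b.c]"

# Only bracketed items of the form word.word... are returned
dotted = re.findall(r"(?<=\[)(\w+\.\w+.*?)(?=\])", text)
print(dotted)  # ['fu.bar', 'a.b.c']
```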

Regex Tester

Andreas