283

I'm using jQuery. I have a string containing blocks of text delimited by special characters (a begin marker and an end marker), and I want to extract the text from each block. I used a regular expression to search the string, but how can I get multiple results when the string contains two or more such blocks?

My HTML:

<div id="container">
    <div id="textcontainer">
     Cuộc chiến pháp lý giữa [|cơ thử|nghiệm|] thị trường [|test2|đây là test lần 2|] chứng khoán [|Mỹ|day la nuoc my|] và ngân hàng đầu tư quyền lực nhất Phố Wall mới chỉ bắt đầu.
    </div>
</div>

and my JavaScript code:

$(document).ready(function() {
  var takedata = $("#textcontainer").text();
  var test = 'abcd adddb';
  var filterdata = takedata.match(/(\[.+\])/);

  alert(filterdata); 

  //end write js 
});

My result is: [|cơ thử|nghiệm|] thị trường [|test2|đây là test lần 2|] chứng khoán [|Mỹ|day la nuoc my|]. But this isn't the result I want :(. How do I get [text] as the first match and [demo] as the second?


I got it working after searching the internet ^^. My code now looks like this:

var filterdata = takedata.match(/(\[.*?\])/g);
My result is: [|cơ thử|nghiệm|],[|test2|đây là test lần 2|], which is right! But I don't really understand why it works. Can you explain it?
ROMANIA_engineer
Rueta

3 Answers

617

The non-greedy regex quantifiers are like their greedy counterparts, but with a ? immediately following them:

*  - zero or more
*? - zero or more (non-greedy)
+  - one or more
+? - one or more (non-greedy)
?  - zero or one
?? - zero or one (non-greedy)
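
As a rough sketch of how each pair in the list behaves in JavaScript (the input string "aaa" and the variable names are made up for illustration), a greedy quantifier takes as much as it can, while its non-greedy form takes as little as it can:

```javascript
// Hypothetical input, chosen only to make the difference visible.
const input = "aaa";

// * versus *? : zero or more
const greedyStar = input.match(/a*/)[0];   // "aaa"
const lazyStar = input.match(/a*?/)[0];    // ""  (zero is enough)

// + versus +? : one or more
const greedyPlus = input.match(/a+/)[0];   // "aaa"
const lazyPlus = input.match(/a+?/)[0];    // "a" (one is the minimum)

// ? versus ?? : zero or one
const greedyOpt = input.match(/a?/)[0];    // "a"
const lazyOpt = input.match(/a??/)[0];     // ""  (prefers zero)

console.log([greedyStar, lazyStar, greedyPlus, lazyPlus, greedyOpt, lazyOpt]);
```

A non-greedy quantifier only takes more when the rest of the pattern cannot match otherwise, which is why `\[.*?\]` stops at the first closing bracket.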
Asaph
  • 37
    might be useful to note that `?` on its own means 'one or zero' (but is greedy!). E.g. `'bb'.replace(/b?/, 'a') //'ab'` and `'bb'.replace(/c?/, 'a') //'abb'` – Hashbrown Oct 04 '13 at 04:46
  • 3
    how did c match nothing there – Muhammad Umer May 26 '19 at 06:03
  • 4
    @MuhammadUmer I think he was suggesting that because the `c` won't match, but you have the `?`, which is `0 or 1`, then it's going to match `0 number of c characters`, hence replacing it. I have no idea how it works though, because that doesn't compile in any regex engine i've tried – Noctis Feb 19 '20 at 05:40
  • If you still need to support MSIE 11, it's good to know that it doesn't support the `s` flag for regexps. I first thought that MSIE doesn't support non-greedy quantifiers, but the real cause was the `s` flag in my regexp. – Mikko Rantalainen Nov 12 '21 at 09:17
  • 1
    What exactly would be the difference between `?` and `??`? I somehow can't see the distinction between greedy and non-greedy zero-or-one matching. Help me understand. – Konrad Viltersten Aug 21 '22 at 13:25
  • 1
    @KonradViltersten it's to do with capturing. Eg. `'abcd'.match(/(a)?b??(.*)/)` will produce capture groups `a` and `bcd` because it matches the `b` lazily so `b` gets included into the greedy `.*` match. Without the double `?` it would only produce capture groups `a` and `cd` because the `b` gets consumed before the capture starts. – Waddles Jan 16 '23 at 07:05
46

You are right that greediness is an issue:

--A--Z--A--Z--
  ^^^^^^^^^^
     A.*Z

If you want to match both A--Z, you'd have to use A.*?Z (the ? makes the * "reluctant", or lazy).

There are sometimes better ways to do this, though, e.g.

A[^Z]*+Z

This uses a negated character class and a possessive quantifier to reduce backtracking, and is likely to be more efficient.

In your case, the regex would be:

/(\[[^\]]++\])/

Unfortunately, JavaScript regex doesn't support possessive quantifiers, so you'll have to make do with:

/(\[[^\]]+\])/
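
To get every block rather than only the first, the pattern above can be combined with the `g` flag. A minimal sketch, assuming a simplified ASCII stand-in for the question's text (the string and variable names are hypothetical):

```javascript
// Simplified stand-in for the text in #textcontainer (an assumption, not the real data).
const takedata = "x [|a|b|] y [|c|d|] z";

// [^\]]+ can never run past a "]", so each match ends at its own closing bracket.
// The g flag makes match() return all matches instead of just the first.
const blocks = takedata.match(/\[[^\]]+\]/g);

console.log(blocks); // [ '[|a|b|]', '[|c|d|]' ]
```

Because the negated class cannot cross a closing bracket, this gives the same result as the lazy `\[.+?\]` without relying on backtracking.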


Quick summary

*   Zero or more, greedy
*?  Zero or more, reluctant
*+  Zero or more, possessive

+   One or more, greedy
+?  One or more, reluctant
++  One or more, possessive

?   Zero or one, greedy
??  Zero or one, reluctant
?+  Zero or one, possessive

Note that the reluctant and possessive quantifiers are also applicable to the finite repetition {n,m} constructs.

Examples in Java:

System.out.println("aAoZbAoZc".replaceAll("A.*Z", "!"));  // prints "a!c"
System.out.println("aAoZbAoZc".replaceAll("A.*?Z", "!")); // prints "a!b!c"

System.out.println("xxxxxx".replaceAll("x{3,5}", "Y"));  // prints "Yx"
System.out.println("xxxxxx".replaceAll("x{3,5}?", "Y")); // prints "YY"
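
Since the question is about JavaScript, here is a sketch of roughly equivalent calls there. Java's `replaceAll` replaces every match, so the JavaScript version needs the `g` flag on the regex; the variable names are only for illustration:

```javascript
// Greedy versus reluctant * between A and Z.
const greedyAZ = "aAoZbAoZc".replace(/A.*Z/g, "!");   // "a!c"
const lazyAZ = "aAoZbAoZc".replace(/A.*?Z/g, "!");    // "a!b!c"

// Greedy versus reluctant finite repetition {n,m}.
const greedyRange = "xxxxxx".replace(/x{3,5}/g, "Y"); // "Yx" (takes 5, leaves 1)
const lazyRange = "xxxxxx".replace(/x{3,5}?/g, "Y");  // "YY" (takes 3, then 3)

console.log(greedyAZ, lazyAZ, greedyRange, lazyRange);
```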
polygenelubricants
  • I copied your regex into my code and the result is: invalid quantifier +\]) [Break on this error] var filterdata = takedata.match(/(\[[^\]]++\])/); (Firebug + Firefox). Is something wrong? – Rueta May 13 '10 at 04:08
  • @Rueta: apparently Javascript flavor doesn't support possessive. I've edited my answer to reflect this fact. You can just use one `+` instead of two. – polygenelubricants May 13 '10 at 04:19
  • 1
    Though atomic groups can be used in place of possessive quantifiers, JavaScript does not support the atomic groups either. But there is a third alternative, see this: http://instanceof.me/post/52245507631/regex-emulate-atomic-grouping-with-lookahead - `you can emulate atomic grouping with LookAhead. (?>a) becomes (?=(a))\1` – Roland Pihlakas Feb 27 '15 at 01:01
  • 6
    This is a Java answer for a JavaScript question and Java != JavaScript. Readers, take note. – Roshambo Jul 10 '17 at 02:10
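
The lookahead trick mentioned in the comments above can be sketched as follows. This is only an illustration under assumed inputs (the string "aaab" and the variable names are made up): a lookahead is not re-entered once it has succeeded, so `(?=(a+))\1` behaves like the atomic group `(?>a+)`.

```javascript
// Hypothetical input: three "a" characters followed by "b".
const input = "aaab";

// Ordinary greedy a+ can give back one "a" so that the trailing "ab" matches.
const withBacktracking = /a+ab/.test(input);       // true

// Emulated atomic group: the lookahead captures the longest run of "a",
// \1 consumes exactly that run, and the engine cannot shorten it afterwards,
// so the trailing "ab" never finds an "a" to match.
const atomicLike = /(?=(a+))\1ab/.test(input);     // false

console.log(withBacktracking, atomicLike);
```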
3

I believe it would be like this:

takedata.match(/(\[.+\])/g);

The g at the end means global, so it doesn't stop at the first match.
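
A small sketch of what the `g` flag does and does not change, assuming a made-up sample string (names are hypothetical). The flag controls how many matches are returned; it does not make `.+` any less greedy:

```javascript
// Hypothetical sample with two bracketed blocks.
const sample = "x [one] y [two] z";

// g with a greedy .+ : still one long match from the first "[" to the last "]".
const greedyGlobal = sample.match(/(\[.+\])/g);

// g with a lazy .+? : one match per block.
const lazyGlobal = sample.match(/(\[.+?\])/g);

// Lazy without g : only the first block (plus its capture group).
const lazyFirstOnly = sample.match(/(\[.+?\])/);

console.log(greedyGlobal);     // [ '[one] y [two]' ]
console.log(lazyGlobal);       // [ '[one]', '[two]' ]
console.log(lazyFirstOnly[0]); // [one]
```

So getting each block separately takes both pieces: the `g` flag and a lazy quantifier (or a negated character class).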

bluish
iangraham
  • Yes, you are right about /g. I've just got it working with your /g suggestion ^^. But when I use the regex /(\[.+\])/g, my result is: [|cơ thử|nghiệm|] thị trường [|test2|đây là test lần 2|] chứng khoán [|Mỹ|day la nuoc my|] :( – Rueta May 13 '10 at 04:00