0

Data:

#r; 
 text
#r;

#r; 
  text2
#r;

Regex:

/#r;[\w\W]*#r;/

I just want to extract the first occurrence only (i.e. #r;text#r;). However, the pattern above extracts both occurrences.

What should I do in order to get only the first occurrence?

Mateen Ulhaq
ahhmarr
  • This looks like a duplicate of http://stackoverflow.com/questions/2074452/regex-to-first-occurence-only – MetaEd Nov 08 '11 at 22:46

3 Answers

3

See Option 4 below as the best recommended option.

Option 1: Without using lookaheads and using a non-greedy wildcard match, you can use this regex:

/#r;.*?#r;/

This matches:

a pattern that starts with "#r;"
followed by any number of characters (other than line breaks, which . does not match by default), but the fewest possible
followed by "#r;"
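As a minimal sketch of the difference, the snippet below runs the greedy and the lazy form of this pattern against a one-line string. The sample string and the variable names are assumptions made for illustration; they are not taken from the question's data, which spans several lines.

```javascript
// Hypothetical one-line input: two delimited blocks back to back.
const input = "#r;text1#r;#r;text2#r;";

// Greedy: .* grabs as much as it can, so the match runs to the LAST "#r;".
const greedy = input.match(/#r;.*#r;/)[0];

// Lazy: .*? grabs as little as it can, so the match stops at the FIRST closing "#r;".
const lazy = input.match(/#r;.*?#r;/)[0];

console.log(greedy); // "#r;text1#r;#r;text2#r;"
console.log(lazy);   // "#r;text1#r;"
```

Without the g flag, match() returns only the first match, which is why a single call is enough here.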

Option 2: Or if you want to get just the text between the delimiters, you can use this and then reference the [1] item returned from the search:

/#r;(.*?)#r;/

"#r;text1#r;#r;text2#r;".match(/#r;(.*?)#r;/)[1] == "text1"

You can see it in action here: http://jsfiddle.net/jfriend00/ZYdP8/

Option 3: Or, if there are actually newlines before and after each #r; in the thing you're trying to match, then you would use this regex:

/#r;\n(.*?)\n#r;/

which you can see working here: http://jsfiddle.net/jfriend00/ZYdP8/10/.

Option 4: Or, (taking Tom's suggestion) if you don't want any whitespace of any kind to be part of the match on the boundaries, you can use this:

/#r;\s*(.*?)\s*#r;/

which you can see working here: http://jsfiddle.net/jfriend00/ZYdP8/12/.
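For reference, here is a small sketch of this last pattern applied to a string that approximates the data in the question. The exact whitespace in the string is an assumption reconstructed from the question's layout, and the variable names are only illustrative.

```javascript
// Approximation of the question's data; the exact whitespace is assumed.
const data = "#r; \n text\n#r;\n\n#r; \n  text2\n#r;";

// \s* absorbs spaces and newlines next to each delimiter,
// and the lazy (.*?) captures only the text in between.
const m = data.match(/#r;\s*(.*?)\s*#r;/);
const firstText = m ? m[1] : null;

console.log(firstText); // "text"
```

Note that . does not match line breaks by default, so this only works as long as the text between the delimiters stays on a single line.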

jfriend00
  • I would suggest something like `/#r;\s*(.*?)\s*#r;/`. This is a uniform regex which can be used both when there is whitespace included between the `#r;` tags, but also when there is no whitespace included. – Tom Knapen Nov 08 '11 at 23:17
  • @TomKnapen - good suggestion Tom. I was unsure what the OP's whitespace needs were. I didn't know if there actually were newlines in the text and whether they wanted other whitespace to be part of the match or not. I added this suggestion to the end of my answer. – jfriend00 Nov 08 '11 at 23:32
  • Doesn't the `.*?` fail to match multiline sequences? – Mike Samuel Nov 20 '11 at 18:26
  • @MikeSamuel - I'm unsure what problem you're referring to. This problem does not require returning a match that includes a line boundary and all the sample jsFiddle's have line boundaries in them with no problems. – jfriend00 Nov 20 '11 at 18:39
  • @jfriend00, take a look at http://jsfiddle.net/zbRVZ/1/ which I forked from your example. – Mike Samuel Nov 20 '11 at 18:46
  • You've created an example of \n inside the desired matched string. If that was a requirement for the solution, then one would need to do something different, but that was not a requirement in the example provided in the question. – jfriend00 Nov 20 '11 at 19:02
0

Your problem is that the * is greedy: it matches as much as it can and does not stop at the first closing delimiter, so it ends up consuming " text\n#r;\n\n#r;\n text2\n" instead of just " text\n". The solution is to make the * lazy:

/#r;[\w\W]*?#r;/

The non-greedy modifier (the ? after the *) causes the * to match only as much as is needed for the regular expression as a whole to match.

http://www.regular-expressions.info/possessive.html has more info:

A greedy quantifier will first try to repeat the token as many times as possible, and gradually give up matches as the engine backtracks to find an overall match. A lazy quantifier will first repeat the token as few times as required, and gradually expand the match as the engine backtracks through the regex to find an overall match.
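A quick way to see the two behaviours side by side is the sketch below. The sample string approximates the question's data (its exact whitespace is an assumption), and the variable names are illustrative only.

```javascript
// Approximation of the question's data; the exact whitespace is assumed.
const sample = "#r; \n text\n#r;\n\n#r; \n  text2\n#r;";

// Greedy: [\w\W]* runs to the end of the input and backtracks
// only as far as the LAST "#r;", so both blocks are swallowed.
const greedyMatch = sample.match(/#r;[\w\W]*#r;/)[0];

// Lazy: [\w\W]*? expands one character at a time and stops
// at the FIRST "#r;" that lets the whole pattern succeed.
const lazyMatch = sample.match(/#r;[\w\W]*?#r;/)[0];

console.log(greedyMatch === sample);    // true
console.log(JSON.stringify(lazyMatch)); // "#r; \n text\n#r;"
```

Because [\w\W] matches any character including line breaks, this form also works when the text between the delimiters spans several lines.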

Mike Samuel
0

Try this out. The lazy quantifier plus the lookahead stops the match just before the first closing #r;:

/#r;[\w\W]*?(?=#r;)/
Eric