
RegExes give me headaches. I have a very simple regex but I don't understand how it works.

The code:

var str= "startBlablablablablaend";
var regex = /start(.*?)end/;
var match = str.match(regex);
console.log( match[0] ); //startBlablablablablaend
console.log( match[1] ); //Blablablablabla

What I ultimately want would be the second one, in other words the text between the two delimiters (start,end).

My questions:

  • How does it work? (each character explained please)
  • Why does it match two different things?
  • Is there a better way to get match[1]?
  • If I want to get the text between every start-end pair in the string, how would I go about it?

For the last question, what I mean:

var str = "startBla1end startBla2end startBla3end";
var regex = /start(.*?)end/gmi;
var match = str.match(regex);
console.log( match ); // [ "startBla1end" , "startBla2end" , "startBla3end" ]

What I need is:

console.log( match ); // [ "Bla1" , "Bla2" , "Bla3" ];

Thanks :)

undefined
  • Mozilla has a great [reference and tutorial for regular expressions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions) in JavaScript. – Spencer Wieczorek Dec 30 '14 at 19:19
  • I'll check it out then, thank you! In the meantime, can anyone save me some time by explaining this case? – undefined Dec 30 '14 at 19:21

3 Answers


How does it work?

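  • start matches the literal text "start" in the string

  • (.*?) is a capture group: . matches any character except a line terminator, * repeats it zero or more times, and ? makes the repetition lazy (non-greedy), so it takes as few characters as possible

  • end matches the literal text "end" in the string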

Matching

startBlablablablablaend
  |
start

startBlablablablablaend
     |
     .

startBlablablablablaend
      |
      .

# and so on, since the quantifier * matches any number of characters. The ? makes the match non-greedy, so it stops at the first place where "end" can follow

startBlablablablablaend
                     |
                    end
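
To see what the ? after * actually changes, here is a small sketch that runs the lazy and the greedy version of the pattern against a string with two "end" delimiters. The variable names (sample, lazy, greedy) are only illustrative.

```javascript
// Compare the lazy quantifier (.*?) with the greedy one (.*)
// on a string that contains more than one "end".
const sample = "startBla1end startBla2end";

const lazy = sample.match(/start(.*?)end/);   // stops at the first "end"
const greedy = sample.match(/start(.*)end/);  // runs on to the last "end"

console.log(lazy[1]);   // Bla1
console.log(greedy[1]); // Bla1end startBla2
```

With only one start/end pair in the string, both versions give the same result; the difference only shows up once there are several.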

Why does it match two different things?

It doesn't match two different things. There is one match, and the result array describes it:

  • match[0] contains the entire match

  • match[1] contains the first capture group (the part matched inside the first pair of parentheses)
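
As a rough illustration of that layout, the result of a non-global match can be unpacked with array destructuring. The names whole and inner are made up for this sketch; the null check is there because match returns null when nothing matches.

```javascript
const str = "startBlablablablablaend";
const result = str.match(/start(.*?)end/);

// Index 0 is the whole match, index 1 is capture group 1.
const [whole, inner] = result;

console.log(whole);        // startBlablablablablaend
console.log(inner);        // Blablablablabla
console.log(result.index); // 0 (position where the match starts)

// When the pattern does not match, match() returns null instead of an array.
const missing = "no delimiters here".match(/start(.*?)end/);
console.log(missing);      // null
```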

Is there a better way to get match[1]?

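Short answer: no. Reading the capture group is the usual way to do it.

If your regex engine supports lookarounds, you can write a pattern whose whole match is just the inner text:

(?<=start)(.*?)(?=end)
#Blablablablabla

Note: (?<=start) is a positive lookbehind. JavaScript did not support lookbehinds when this answer was written. They were added to the language in ES2018, so this pattern only works in engines that implement that version or later.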
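
For readers on a newer runtime: lookbehind assertions were added to JavaScript in ES2018, after this answer was written. Assuming your engine implements them, a minimal sketch looks like this (text and m are illustrative names):

```javascript
// Assumes an engine with ES2018 lookbehind support.
const text = "startBlablablablablaend";

// (?<=start) requires "start" right before the match,
// (?=end) requires "end" right after it; neither is part of the match.
const m = text.match(/(?<=start).*?(?=end)/);

console.log(m[0]); // Blablablablabla
```

Here the inner text is the whole match, so no capture group is needed.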

Last Question

The best that you can get from a single match statement would be

var str = "startBla1end startBla2end startBla3end";
var regex = /start(.*?)(?=end)/gmi;
var match = str.match(regex);
console.log( match ); // [ "startBla1" , "startBla2" , "startBla3" ]
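
If the goal is the bare [ "Bla1", "Bla2", "Bla3" ] array, one option is to collect capture group 1 from every match instead of relying on a single str.match call. The sketch below shows two ways. String.prototype.matchAll is an ES2020 addition (so it did not exist when this was posted), while the exec loop works in older engines too. The names inner, viaExec and re are hypothetical.

```javascript
const str = "startBla1end startBla2end startBla3end";

// Option 1: matchAll (ES2020). It needs the g flag and yields one
// match array per occurrence; index 1 is the capture group.
const inner = Array.from(str.matchAll(/start(.*?)end/gi), (m) => m[1]);
console.log(inner); // [ 'Bla1', 'Bla2', 'Bla3' ]

// Option 2: a loop over exec for older engines. With the g flag,
// exec resumes after the previous match on every call.
const re = /start(.*?)end/gi;
const viaExec = [];
let m;
while ((m = re.exec(str)) !== null) {
  viaExec.push(m[1]);
}
console.log(viaExec); // [ 'Bla1', 'Bla2', 'Bla3' ]
```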
nu11p01n73R
  • Thanks! This solves everything but the last question hahaha! Anyway you have the most complete answer, so green tick it is! – undefined Dec 30 '14 at 19:36
  • @Rou You are welcome :) For your last question, the best result you can hope for from a single match is with a lookahead, which keeps `end` out of the output. The leading `start` can then be stripped by iterating over the resulting array. – nu11p01n73R Dec 30 '14 at 19:46

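You don't need to put much effort into this.

Try this regex:

start(.*)end

Note that .* is greedy, so if the string contains several start/end pairs it matches from the first start to the last end.

You can also look at this Stack Overflow question, which has already been answered: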

Regular Expression to get a string between two strings in Javascript

Hope it helps.

Mohit Pandey
  • Thanks, I'll check that out too! Somehow i missed that one (I've checked a lot of "get string in between with regexp" questions) – undefined Dec 30 '14 at 19:35

To solve your last question, you can split up your string and iterate:

var str = "startBla1end startBla2end startBla3end";
var str_array = str.split(" "); 

Then iterate over each element of the str_array using your existing code to extract each Bla# substring.
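
A minimal sketch of that loop, assuming the instances are separated by single spaces and the text between the delimiters contains no spaces itself (parts and piece are illustrative names):

```javascript
const str = "startBla1end startBla2end startBla3end";
const regex = /start(.*?)end/; // no g flag: each piece is matched once

const parts = str
  .split(" ")                          // one element per start...end chunk
  .map((piece) => piece.match(regex))  // match array, or null if no match
  .filter((m) => m !== null)           // skip chunks without delimiters
  .map((m) => m[1]);                   // keep only the capture group

console.log(parts); // [ 'Bla1', 'Bla2', 'Bla3' ]
```

If the inner text can contain spaces, splitting on " " will cut it apart, so this only fits input shaped like the example.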

jrel
  • Thanks this would be my second choice, I guess I'll do that! :) – undefined Dec 30 '14 at 19:35
  • Yeah, there's probably a more elegant way, if by elegant I mean an even more complex regex. But since we all hate reading other people's complex regexes, I say keep it simple even if it's a few more lines of code. – jrel Dec 31 '14 at 20:49