How to regex the string between two tokens and return the string without the tokens?

Question

Fighting with regex....

I'm using this to find the pieces of an HTML string between certain elements:

 for (i = 0; i < 2; i += 1) {
   target = block[i];   // like BODY or HEAD
   regex = RegExp('<' + target + '>(.)+</' + target + '>');
   // in case string passed includes breaks/spaces
   data = data.replace(/(\r\n|\n|\r)/gm,"").replace(/\s+/g," ")
             .match(regex);
   entry = data[0].replace(/<!-- [\s\S]*? -->/g, '');
   console.log(entry);
 }

While this works fine, it returns something like this:

<head>....everything I want ....</head>

Question:
How do I need to modify the regex so that I can still specify the element whose content I need, but get back only the content and not the content plus the tokens (like <head></head>)?

Thanks!

Use Amber's solution and also move the parens to include the `+`, like this: `'<' + target + '>(.+)</' + target + '>'` — Dehalion, Feb 16 '13 at 22:49
Is there anything wrong with `$(target).each(function(){ console.log($(this).html()); })` apart from the comment nodes? — Fabrício Matté, Feb 16 '13 at 22:51
@FabrícioMatté: actually no. I had some templates, where comments , but this one does not, so also trying this. — frequent, Feb 16 '13 at 22:53
Of course, spaces still have to be collapsed with regex, and comment nodes can be removed with either `contents().filter()` or regex but yes, I'm still unsure of what you're trying to achieve. — Fabrício Matté, Feb 16 '13 at 22:55
@Fabricio: I'm working on a plugin that pulls in snippets of code, which I prefer to be snippets, but which come as (uncompressed) HTML pages (think of a page with a button). I have to extract the bits and pieces of the snippet page to use, because I cannot append the full snippet as is. So I created the regex to filter for script/css, which I'm appending to the page head, plus what's in the body (e.g. the solo button), which goes into the page. I solved it with Amber's answer, so I'm a happy camper. Thanks! — frequent, Feb 16 '13 at 22:59
No problem. `=]` Though you know, regex is not really [suitable for parsing HTML](http://stackoverflow.com/a/1732454/1331430). That means your regex will fail to match if the tag has any HTML attribute, e.g. ``, or if `` is inside a comment node and so forth. Hopefully you aren't using that regex in the wild. `:P` — Fabrício Matté, Feb 16 '13 at 23:01
@FabrícioMatté: well... technically it's being loaded as a string (requirejs text plugin). So not really sure, but this is a temp patch anyway until I find a better solution. — frequent, Feb 16 '13 at 23:10
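
The attribute caveat raised in the comments above can be reproduced with a short sketch. The sample strings and the `class="demo"` attribute are made up for illustration, and the widened pattern is only an assumption about one possible stopgap. It is still not an HTML parser and will break on cases such as a `>` inside an attribute value.

```javascript
// Pattern built the same way as in the question, for target = 'body'.
var strict = new RegExp('<body>(.+)</body>');

// Hypothetical inputs: one bare tag, one tag carrying an attribute.
var plain = '<body><button>Go</button></body>';
var withAttr = '<body class="demo"><button>Go</button></body>';

var plainMatch = strict.match ? null : plain.match(strict); // plain.match(strict) returns an array
var attrMatch = withAttr.match(strict);                     // null: '<body>' never occurs literally

// Assumed stopgap: allow anything up to the first '>' after the tag name.
var tolerant = new RegExp('<body\\b[^>]*>(.+)</body>');
var tolerantMatch = withAttr.match(tolerant);

console.log(plainMatch[1]);
console.log(attrMatch);
console.log(tolerantMatch[1]);
```

Because `match` returns `null` when nothing matches, any code that indexes the result directly needs a null check first.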

score 1 · Accepted Answer · answered Feb 16 '13 at 22:48

Use the first matching group instead of the whole match.

regex = RegExp('<' + target + '>(.+)</' + target + '>');

and then...

entry = data[1].replace(/<!-- [\s\S]*? -->/g, '');
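
Put together, the two lines above might look like the following sketch. The helper name `extractInner` and the sample `page` string are hypothetical, not part of the original code. The sketch assumes lowercase tags without attributes and keeps the flattened string in a local variable, so the input is not overwritten between calls.

```javascript
// Hypothetical helper: returns the text between <tag> and </tag>, or null if absent.
// Assumes the opening tag carries no attributes, as in the question.
function extractInner(html, tag) {
  // Remove line breaks and collapse whitespace first, because "." does not match newlines.
  var flat = html.replace(/(\r\n|\n|\r)/gm, '').replace(/\s+/g, ' ');
  var regex = new RegExp('<' + tag + '>(.+)</' + tag + '>');
  var match = flat.match(regex);
  if (match === null) {
    return null;
  }
  // match[0] is the whole match including the tags; match[1] is the first capture group.
  return match[1].replace(/<!-- [\s\S]*? -->/g, '');
}

// Made-up sample page for illustration.
var page = '<html><head>\n<title>Demo</title>\n</head>' +
           '<body><!-- note --><button>Go</button></body></html>';

var block = ['head', 'body'];
for (var i = 0; i < block.length; i += 1) {
  console.log(extractInner(page, block[i]));
}
```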

Amber


Note the slight edit - you need `(.+)` (one matching group of repeated characters) rather than `(.)+` (repeated matching groups of one character each). – Amber Feb 16 '13 at 22:50
nice! I was looking at my `[1]` returning `>` wondering what to make of it :-) Thanks a lot! – frequent Feb 16 '13 at 22:55
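
The difference between the two groupings discussed in these comments can be seen on a tiny made-up input. This is only an illustrative sketch: in JavaScript a capture group that repeats keeps just its last iteration, so `(.)+` leaves a single character in `[1]`.

```javascript
// Made-up sample string for illustration.
var sample = '<head>abc</head>';

// (.)+ : the group matches one character and is repeated; [1] holds only the last repetition.
var repeatedGroup = sample.match(/<head>(.)+<\/head>/);

// (.+) : one group that spans all the characters between the tags.
var singleGroup = sample.match(/<head>(.+)<\/head>/);

console.log(repeatedGroup[1]); // "c"
console.log(singleGroup[1]);   // "abc"
```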
