Today I'm breaking my head over a regex: I can't extract part of a text. My text looks like this:

<!--TEXT[title]-->
sometext 1
<!--END-->
<!--TEXT[title]-->
sometext 2
<!--END-->

I want to get this in an array:

["title]-->sometext1"
,"title]-->sometext2"]

I have this regex code: mytext.match(/<!--TEXT[([.|\w|\r|\n]+)<!--END-->/m);

iLevi
  • 3
    Is this text inside of some HTML? If so, don't parse HTML with a regex: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 Instead, parse the DOM. –  Oct 20 '11 at 14:37
  • 1
    @JackManey That's my favourite answer to anything ever. – punkrockbuddyholly Oct 20 '11 at 14:42

1 Answer

Assuming you need a regular expression, the following should work:

<\!--TEXT\[([^\]]*)\]-->\s*\n(.*)(?!<\!--END-->)

If this text is in a DOM, however, it would be much better to parse the DOM.

Explanation:

<\!--TEXT\[ // Match the start.
([^\]]*) // Match (in group 1), everything up until the next ']'
\]-->\s*\n // Match to the end of this line.
(.*) // Match anything (in group 2).
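(?!<\!--END-->) // Negative lookahead: assert that the end tag does not come immediately next. Because '.' does not match line breaks, group 2 already stops at the end of its line, so you get everything up to, but not including, the line break before the end tag.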
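
To collect every block into an array, as the question asks, the pattern needs the g flag and a loop over the matches. The following is only a minimal sketch, not the exact regex above: it assumes a lazy [\s\S]*? in place of (.*) so the body may span several lines, and it consumes the <!--END--> tag instead of using a lookahead. The names extractBlocks and blockPattern are hypothetical.

```javascript
// Sample input copied from the question.
const mytext = [
  "<!--TEXT[title]-->",
  "sometext 1",
  "<!--END-->",
  "<!--TEXT[title]-->",
  "sometext 2",
  "<!--END-->",
].join("\n");

// Hypothetical helper: returns one { title, body } object per TEXT/END block.
function extractBlocks(text) {
  // Assumed variation on the answer's pattern:
  //   group 1 = the text between '[' and ']'
  //   group 2 = the body, matched lazily so it stops at the first <!--END-->
  // The regex is created per call so the g flag's lastIndex starts at 0.
  const blockPattern = /<!--TEXT\[([^\]]*)\]-->\s*([\s\S]*?)\s*<!--END-->/g;
  const blocks = [];
  let match;
  while ((match = blockPattern.exec(text)) !== null) {
    blocks.push({ title: match[1], body: match[2] });
  }
  return blocks;
}

const blocks = extractBlocks(mytext);
console.log(JSON.stringify(blocks));
// [{"title":"title","body":"sometext 1"},{"title":"title","body":"sometext 2"}]
```

If the exact strings from the question are wanted, blocks.map(b => b.title + "]-->" + b.body) rebuilds them from the two groups. Like the original pattern, this sketch does not handle nested blocks.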
Vala
  • 1
    Of course this will fail with nested comments, but this is something the OP should know... – FailedDev Oct 20 '11 at 14:49
  • Yes, if you're dealing with nested comments you want a lexer or a DOM. On the other hand in this particular case it doesn't look like they would be nested (without there being some error). – Vala Oct 20 '11 at 14:56