0

I am trying to analyze a piece of text via JavaScript and, reading up, have learned that parsing HTML with Regex is quite evil. I'd like to remove a more sinister part of my text before I analyze it.

If I've got a chunk of text like the item below, how might I (1) slice everything from [caption] to [/caption] and (2) store that text in a new var?

Sed rutrum enim sit amet sem fringilla egestas placerat mauris pretium. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Mauris ultricies egestas malesuada. Etiam rhoncus eros a leo imperdiet vitae tincidunt purus laoreet. Mauris ut mauris quam, sed pharetra urna. Etiam eu enim mauris, vitae bibendum orci. Quisque ac sapien massa, at dignissim tellus.

[caption id="blah" align="alignleft" width="123" caption="Lorem ipsum dolor sit, consectetur adipiscing elit."]<a href="http://www.google.com/something"><img title="Lorem ipsum dolor sit, consectetur adipiscing elit." src="http://google.com/something/else.png" alt="Lorem ipsum dolor sit, consectetur adipiscing elit." width="345" /></a>[/caption]

Aenean faucibus mi sit amet leo suscipit nec egestas leo ultrices. Integer tincidunt, urna quis varius accumsan, urna quam congue nulla, ut ornare orci purus in ligula. Suspendisse varius, tellus aliquam tincidunt, ante semper elit, sit amet tincidunt elit augue eget odio. Vivamus sit amet tincidunt massa. Sed nunc ligula, feugiat quis volutpat congue, eleifend in tellus. Curabitur ut dictum felis. Nunc sodales euismod leo, in commodo elit ornare hendrerit. Cras luctus eros id nisl vestibulum elementum. Maecenas ut neque turpis. Donec ornare hendrerit rutrum. Non nibh leo, dictum ullamcorper dui.
buley
  • 1
    You have used `[caption]` (with square brackets)... do you mean `<caption>` (with angled brackets)? There's no problem with using angled brackets inside a code block on S.O. – Lee Dec 14 '10 at 21:44
  • Thanks for the comment. Believe it or not, I'm dealing with a text editor that likes to use its own pseudo-code. I have noticed that the same "regex will be painful" rules are equally relevant here, if not more so. – buley Dec 15 '10 at 07:02

3 Answers

2

You can also use a regex:

var split = text.split(/\[\/?caption[^\]]*\]/);

Then take split[1] as the result: it holds the text between the opening [caption ...] tag and the closing [/caption] tag.
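As a rough sketch of how that split could cover both parts of the question (cutting the caption out and keeping it in its own variable), something like the following might work. The helper name `extractCaption` and the returned `caption`/`rest` fields are made up for illustration, and the sketch assumes the text contains exactly one, non-nested [caption]...[/caption] pair.

```javascript
// Hypothetical helper, assuming exactly one non-nested caption block.
function extractCaption(text) {
  // Matches either "[caption ...]" or "[/caption]".
  var parts = text.split(/\[\/?caption[^\]]*\]/);

  // One opening and one closing tag produce exactly three pieces:
  // before, inside, after. Anything else is left untouched.
  if (parts.length !== 3) {
    return { caption: null, rest: text };
  }

  return {
    caption: parts[1],          // text between the two tags
    rest: parts[0] + parts[2]   // original text with the block cut out
  };
}

var result = extractCaption('Before [caption id="blah"]<img src="else.png" />[/caption] after');
console.log(result.caption);
console.log(result.rest);
```

Returning the untouched text when the piece count is not three is a deliberate choice here: it avoids silently mangling input with several or unbalanced tags.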

morja
1

You can use .split()

var temp = yourText.split("[caption")

This gives you:

temp[0] with everything before "[caption"

temp[1] with everything after "[caption"

You can then continue to split/join the array fragments to eliminate any portion of the string.
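To make the "continue to split" step concrete, here is a minimal sketch that uses only string splits, with no regex. The function name `removeCaption` and the `removed`/`remaining` fields are hypothetical, and it assumes a single caption block; with zero or several occurrences it simply returns the text unchanged.

```javascript
// Hypothetical helper, assuming a single [caption ...]...[/caption] block.
function removeCaption(yourText) {
  var temp = yourText.split("[caption");
  if (temp.length !== 2) {
    // No caption, or more than one: leave the text alone.
    return { removed: "", remaining: yourText };
  }

  // temp[1] starts right after "[caption"; split it again at the closing tag.
  var tail = temp[1].split("[/caption]");
  if (tail.length !== 2) {
    return { removed: "", remaining: yourText };
  }

  return {
    // Re-attach the delimiters that split() consumed.
    removed: "[caption" + tail[0] + "[/caption]",
    remaining: temp[0] + tail[1]
  };
}

var out = removeCaption('Intro [caption id="blah"]<img src="else.png" />[/caption] outro');
console.log(out.removed);
console.log(out.remaining);
```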

Diodeus - James MacFarlane
  • 1
    just be careful if you have multiple occurrences of `[caption]`... especially if they are *nested*: `[caption] whatever [caption] stuff [/caption] other stuff [/caption]`. – Lee Dec 14 '10 at 21:49
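For the multiple-occurrence case raised in that comment, one possible sketch uses a global, non-greedy regex with a replace callback. The name `extractAllCaptions` is invented for illustration. It assumes the blocks are not nested: with nested tags the match ends at the first [/caption] and leaves a stray closing tag behind, so nesting would need a real parser or a depth counter.

```javascript
// Hypothetical helper for several NON-nested caption blocks.
function extractAllCaptions(text) {
  // [\s\S]*? matches lazily across newlines up to the nearest [/caption].
  var pattern = /\[caption[^\]]*\]([\s\S]*?)\[\/caption\]/g;
  var captions = [];

  var rest = text.replace(pattern, function (match, inner) {
    captions.push(inner); // keep the inner text of each block
    return "";            // and cut the whole block out of the text
  });

  return { captions: captions, rest: rest };
}

var all = extractAllCaptions('a [caption]x[/caption] b [caption id="2"]y[/caption] c');
console.log(all.captions);
console.log(all.rest);
```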
1
  • If you're parsing plain text, there is no easier way to do it than with a regex; at least, there are no built-in functions in JS for it.
  • If you're parsing HTML in the browser, there is a much easier way: use the DOM tree and the DOM functions. AFAIK that's the recommended way, and JS frameworks like jQuery make this task as easy as it could be.
  • If you're parsing HTML on the server side, e.g. with node.js, there are also libraries for creating and working with a DOM, such as jsdom, and again you can do it with DOM functions.
maga