1

We have developed a Flash application with a WYSIWYG editor on the backend. We need to offer more functionality in the editor, so we decided to add custom tags <start more> ... </end more> to our WYSIWYG.

All the HTML is parsed and converted to XML. The only problem is that we need to find the <start more> and </end more> tags so we can convert them to custom fade effects that reveal more content on a post inside Flash.

Long story short, here is sample XML output.

Some text outside <start more> some text inside</end more>
some other text <start more>1 and some random stuff <start more>2 and 
thing </end more>2 and random stuff </end more>

Regular expression to get start more and end more

/(<start more>){1,1}(.+?)(<\/end more>)/

This expression captures the first <start more> and the first </end more> in the string. I tried a negative lookahead assertion to match only the innermost tags, but it is not working.

I hope that makes sense. Let me know if the problem is unclear.
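
The behaviour described above can be reproduced with a short sketch. It uses Python's `re` module purely for illustration (the original code is not Python), joins the sample onto one line, and assumes the closing tag is written `</end more>` as in the sample output. The variable names are made up for this example.

```python
import re

# Sample from the question, joined onto one line for simplicity.
sample = (
    "Some text outside <start more> some text inside</end more> "
    "some other text <start more>1 and some random stuff <start more>2 and "
    "thing </end more>2 and random stuff </end more>"
)

# Lazy pattern: stops at the first closing tag after each opening tag.
lazy = re.compile(r"<start more>(.+?)</end more>")

matches = [m.group(1) for m in lazy.finditer(sample)]
for text in matches:
    print(repr(text))
```

The second match starts at the outer opening tag and ends at the inner closing tag, so its captured text still contains a `<start more>`. That is the mismatch the lazy quantifier cannot avoid on its own.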

Aamir Mahmood

2 Answers

3

You should work that into your parser, which you said you already have.
If you change <start more></end more> to a valid pair, say <more> </more>, any HTML parser should already handle it correctly, even if it isn't a known tag.

If you insist, a weak regex might be:

/<start more>(((?!<(?:\/end|start) more>).)+)<\/end more>/
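
As a rough illustration of how this pattern behaves, here is a minimal Python sketch run against the sample from the question. Python and the variable names are choices made for this example only, and the `re.DOTALL` flag is an added assumption so that tag content may span line breaks.

```python
import re

# Sample from the question, joined onto one line for simplicity.
sample = (
    "Some text outside <start more> some text inside</end more> "
    "some other text <start more>1 and some random stuff <start more>2 and "
    "thing </end more>2 and random stuff </end more>"
)

# Each "." is only allowed where neither an opening nor a closing tag begins,
# so a match can never contain another tag: only innermost pairs qualify.
innermost = re.compile(
    r"<start more>((?:(?!<(?:/end|start) more>).)+)</end more>",
    re.DOTALL,  # assumption: content may contain newlines
)

inner = innermost.findall(sample)
print(inner)
```

The outer pair in the nested part of the sample is skipped because the lookahead refuses to step over the inner `<start more>`. Only the first pair and the innermost nested pair are captured.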
Kobi
  • Here's an example - it only captures the innermost pair: http://rubular.com/r/4p8jGiepW5 – Kobi Dec 23 '10 at 12:18
  • Thank you for the answer and the example. I have removed the " more" from the code, and '(((?!<(?:\/end|start)>).)+)<\/end>' is working perfectly. The reason for not using another HTML parser (I tried PHP DOMDocument) is that when we try to get the content, it also strips out all the nested HTML tags, which we need to display in Flash (e.g. bold, italic, a). Anyone interested can check http://pk.dynamiconlinesystems.com/mc_main/, though it is still a work in progress. – Aamir Mahmood Dec 23 '10 at 12:51
  • Instead of getting the content of the parsed HTML, you need to write out the parts you need with an HTML serialiser. – OrangeDog Mar 17 '12 at 15:02
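
The serialiser suggestion, combined with renaming the tags to a valid `<more>` pair, might look roughly like the sketch below. It uses Python's standard `xml.etree.ElementTree` for illustration rather than PHP's DOMDocument. The `inner_xml` helper, the wrapping `<root>` element, and the sample markup are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Assumes the custom tags were renamed to <more>...</more> and the fragment
# is wrapped in a single root element so that it is well-formed XML.
doc = (
    "<root>Outside <more>1 and "
    "<more>2 and <b>bold</b> thing</more> stuff</more></root>"
)

def inner_xml(elem):
    """Serialise the content of elem (text plus child markup), without its own tag."""
    parts = [elem.text or ""]
    for child in elem:
        # tostring() writes the child's markup followed by its tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)

root = ET.fromstring(doc)

# An innermost <more> has no <more> descendants (iter() also yields the element itself).
innermost = [el for el in root.iter("more") if all(d is el for d in el.iter("more"))]
contents = [inner_xml(el) for el in innermost]
print(contents)
```

Serialising the children, rather than reading only the text content, is what keeps nested tags such as `<b>` in the output.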
2

It is not possible to correctly parse xml/html with regular expressions. You will have to write a proper parser.
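
For these two custom tags specifically, a parser can be quite small. The sketch below is one possible approach in Python, not a general XML/HTML parser. It uses a regex only to find the tags (tokenising) and an explicit stack to track nesting. The function name `parse_more` and the nested-list output format are hypothetical choices for this example.

```python
import re

# The regex only locates tags; nesting is tracked by the stack below.
TAG = re.compile(r"<start more>|</end more>")

def parse_more(text):
    """Return a nested list: strings are plain text, lists are start/end pairs."""
    root = []
    stack = [root]
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            stack[-1].append(text[pos:m.start()])  # text before this tag
        if m.group() == "<start more>":
            node = []
            stack[-1].append(node)
            stack.append(node)  # descend into the new pair
        else:
            if len(stack) == 1:
                raise ValueError("unmatched </end more> at offset %d" % m.start())
            stack.pop()  # close the current pair
        pos = m.end()
    if pos < len(text):
        stack[-1].append(text[pos:])  # trailing text
    if len(stack) != 1:
        raise ValueError("unclosed <start more>")
    return root

tree = parse_more("a <start more>1 <start more>2</end more> 3</end more> b")
print(tree)
```

Because nesting depth is kept on the stack, arbitrarily deep pairs are handled, and unbalanced tags are reported instead of being silently mismatched.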

Community
OrangeDog
  • To support your point you could have linked to one of the more objective explanations below, not the fun answer. – mario Dec 23 '10 at 12:25
  • I could have done, but that would be less fun. Once you are informed that xml (or any tree-based language) is not regular, it is quite easy to find reasonable explanations why. – OrangeDog Dec 23 '10 at 12:28
  • The reason people keep posting html-with-regex questions isn't that said link isn't funny enough. It's because seldom does anyone offer a simpler alternative, and word doesn't get around if nobody explains. – mario Dec 23 '10 at 12:31
  • @mario - No, it's because people don't search for "html with regex" before asking questions about it. Providing a recursive descent parser implementation in response to a direct regex question would be off-topic at best. – OrangeDog Dec 23 '10 at 12:38
  • @OrangeDog: I accept your reason as the correct cause. Still linking to an off-topic solution might make them go away faster. – mario Dec 23 '10 at 12:43
  • @OrangeDog. Thank you for your feedback. I have written a parser to convert HTML to a Flash-friendly format; I just need to get the custom tags. I know an admin could nest the tags, so I will convert only the innermost tags and then remove the unnecessary start and end tags. – Aamir Mahmood Dec 23 '10 at 12:46
  • @OrangeDog - I disagree. Understanding the underlying problem and providing a real solution isn't considered off topic at all - it is important to understand why the question is asked. Here are discussions about this issue: [What is the XY problem?](http://meta.stackexchange.com/q/66377/7586), [Can we get people to directly ask about their problems instead of topics they think might be related?](http://meta.stackexchange.com/q/56366/7586) – Kobi Dec 23 '10 at 16:45
  • Having meta discussions in the comments of a solved question is possibly also off-topic. – OrangeDog Dec 23 '10 at 16:58
  • But should you be asking about how to discuss how to answer questions in answer comments? :p – OrangeDog Dec 23 '10 at 17:27