0

I need a template engine for a project that parses SGML content and converts user-defined tags like <ext:grid />. In nearly every case the input is valid. A common problem is the link generator: it produces & instead of &amp;, which is the main cause of my parsing problems. I can't change this behavior, because the generator's output is used in many other places where the links must contain & rather than &amp;.

I have tried DOMDocument, SimpleXML and xml_parser. They all abort on entity errors. Any ideas? All I want is for the parser to simply ignore this "problem".

Here is a test template:

<template xmlns:grid="templates/grid" xmlns:std="templates/std">
    <std:header text="Overview" type="h1" />
    <grid:base width="100%">
        <grid:columns>
        <grid:body>
            <?php foreach($products as $product): /* @var $product Dfm_Shop_Model_Product */ ?>
            <grid:row selectable="1">
                <grid:cell>
                    <div><?php echo $this->esc($product->getTitle()) ?></div>
                </grid:cell>
                <grid:cell>
                    <a href="<?php env()->http()->to(array('controller' => 'Dfm_Shop_Controller_Products', 'method' => 'showEdit')) ?>"><std:img src="icons/pencil.png" hint="Edit" /></a>
                </grid:cell>
            </grid:row>
            <?php endforeach ?>
        </grid:body>
    </grid:base>
</template>
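
The entity failure described above can be reproduced in isolation. This is only a minimal sketch: the URL and its parameter names are made up for illustration, and the exact error text depends on the installed libxml version, so it is printed rather than assumed.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical link-generator output with a bare "&" in the attribute value.
$broken = '<a href="index.php?controller=products&method=showEdit">Edit</a>';

$doc = new DOMDocument();
$ok  = $doc->loadXML($broken);

// loadXML() reports failure because "&method" is not a complete entity reference.
var_dump($ok);

foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();
```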
Ron

2 Answers

1

Can you just search-replace your & to &amp; before trying to parse the document?
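
A minimal sketch of that search-and-replace step, assuming the only entity that already appears correctly in the input is `&amp;`. The helper name `escapeBareAmpersands` and the sample link are hypothetical. Other entities such as `&lt;` or `&#160;` would need their own identity entries in the map, otherwise they get double-escaped.

```php
<?php
// Hypothetical helper: escape bare "&" but leave existing "&amp;" untouched.
function escapeBareAmpersands($content)
{
    // strtr() with an array tries the longest key first and never rescans
    // replaced text, so "&amp;" is kept as-is before the "&" rule can apply.
    return strtr($content, array('&amp;' => '&amp;', '&' => '&amp;'));
}

// Sample input mixing a bare "&" with an already escaped "&amp;".
$raw = '<a href="index.php?controller=products&method=showEdit&amp;id=1">Edit</a>';

$doc = new DOMDocument();
$doc->loadXML(escapeBareAmpersands($raw));

// The parser decodes the entities again, so the attribute holds plain "&".
echo $doc->documentElement->getAttribute('href'), "\n";
```

A plain `str_replace('&', '&amp;', ...)` would also rewrite the ampersand inside an existing `&amp;`, which is why the sketch uses the array form of `strtr` instead.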


Edit: Just to add, for completeness: QueryPath, for example, can handle invalid tags too.

According to the thread linked above, libxml functions should've also worked.

eis
  • preg_replace would cost too much performance. A simple str_replace would also turn `&amp;` into `&amp;amp;`. I should try strtr for this... – Ron Oct 24 '12 at 14:16
  • you can first str_replace `&amp;` to `&`, and then replace back :) that would of course fail with any other proper xml entities, like `&apos;` ... – eis Oct 24 '12 at 14:23
  • `$content = strtr($content, array('&amp;' => '&amp;', '&' => '&amp;'));` does the trick – Ron Oct 24 '12 at 14:24
  • `This problem is actually that your XML is not XML at all, but something close to it`. The content is virtually correct xml. Now DOMDocument can parse the document without any problems. – Ron Oct 24 '12 at 14:26
0

If you can, try wrapping the content in CDATA sections.

<![CDATA[your content]]>

http://www.w3schools.com/xml/xml_cdata.asp
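
A small sketch of how this behaves with DOMDocument, assuming the link ends up as element text. The `<link>` element and the URL are hypothetical. Note that CDATA sections are only allowed in element content, not inside attribute values, so this would not cover a link written directly into `href="..."`.

```php
<?php
// Hypothetical input: the generated link is wrapped in a CDATA section.
$xml = '<link><![CDATA[index.php?controller=products&method=showEdit]]></link>';

$doc = new DOMDocument();
$ok  = $doc->loadXML($xml);

// Inside CDATA the raw "&" is treated as literal text, not as the start of an entity.
echo $doc->documentElement->textContent, "\n";
```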

kwelsan
  • That would require me to add CDATA everywhere I use my link generator. That would be the same as just wrapping the link generator in htmlentities. Too easy to forget :/ – Ron Oct 24 '12 at 14:14
  • But it can solve your problem permanently. Building a good system takes some extra effort. – kwelsan Oct 24 '12 at 14:17