First, I'm using Laravel, which is why there is a return at the end of the code; it doesn't affect the result.

$strxml = '<?xml version="1.0" encoding="utf-8" ?>
            <xliff>
                <body>
                    <trans-unit id="NFDBB2FA9-tu4" xml:space="preserve">
                        <source xml:lang="en">He</source>
                        <target xml:lang="id">He</target>
                    </trans-unit>
                    <trans-unit id="NFDBB2FA9-tu5" xml:space="preserve">
                        <source xml:lang="en">She</source>
                        <target xml:lang="id">She</target>
                    </trans-unit>
                </body>
                <body>
                    <trans-unit id="NFDBB2FA9-tu6" xml:space="preserve">
                        <source xml:lang="en">They</source>
                        <target xml:lang="id">They</target>
                    </trans-unit>
                    <trans-unit id="NFDBB2FA9-tu7" xml:space="preserve">
                        <source xml:lang="en">We</source>
                        <target xml:lang="id">We</target>
                    </trans-unit>
                </body>
            </xliff>';

        $dom = new \DOMDocument;
        $dom->loadXML($strxml);

        $xp = new \DOMXPath($dom);
        $xp->registerNamespace('xml', 'http://www.example.com');

        $col = $xp->query('//xliff/body/trans-unit');
        if ($col && $col->length) {
            foreach ($col as $node) {
                $target = $xp->query('target', $node)->item(0);
                $target->nodeValue = '<mrk id="1">Banana';
            }
        }

        return $dom->saveXML();

It outputs:

<?xml version="1.0" encoding="utf-8" ?>
<xliff>
    <body>
        <trans-unit id="NFDBB2FA9-tu4" xml:space="preserve">
            <source xml:lang="en">He</source>
            <target xml:lang="id">&lt;mrk id="1"&gt;Banana</target>
        </trans-unit>
        <trans-unit id="NFDBB2FA9-tu5" xml:space="preserve">
            <source xml:lang="en">She</source>
            <target xml:lang="id">&lt;mrk id="1"&gt;Banana</target>
        </trans-unit>
    </body>
    <body>
        <trans-unit id="NFDBB2FA9-tu6" xml:space="preserve">
            <source xml:lang="en">They</source>
            <target xml:lang="id">&lt;mrk id="1"&gt;Banana</target>
        </trans-unit>
        <trans-unit id="NFDBB2FA9-tu7" xml:space="preserve">
            <source xml:lang="en">We</source>
            <target xml:lang="id">&lt;mrk id="1"&gt;Banana</target>
        </trans-unit>
    </body>
</xliff>

Notice that the special characters in the <target> text are escaped (&lt; and &gt;).

I tried $target->nodeValue = html_entity_decode('<mrk id="1">Banana'); but that didn't work either.

How do I encode it?

mending3
  • Adding in `'<mrk id="1">Banana'` would create invalid XML as the `mrk` tag isn't closed. – Nigel Ren Mar 03 '21 at 13:52
  • This doesn't work: `$target->nodeValue = htmlentities('<mrk id="1">Banana');`. It still doesn't encode. – mending3 Mar 03 '21 at 13:54
  • Well that’s because you are using `nodeValue`, which for normal DOM nodes is equivalent to the _text content_ of the node. But you don’t want to set _text_ here, you want to create an actual child _element_ for `target`. – CBroe Mar 03 '21 at 14:10
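
The point in the last comment can be sketched with plain DOM methods: build the `mrk` element as a node and append it, instead of assigning a string. This is only a minimal illustration; the one-line document and the id `tu1` are made up for the example and are not from the question.

```php
<?php
// Hypothetical, shortened document; "tu1" is an illustrative id.
$dom = new DOMDocument();
$dom->loadXML(
    '<xliff><body><trans-unit id="tu1">' .
    '<target xml:lang="id">He</target>' .
    '</trans-unit></body></xliff>'
);
$xp = new DOMXPath($dom);

foreach ($xp->query('//trans-unit/target') as $target) {
    // Build <mrk id="1">Banana</mrk> as real nodes instead of a string.
    $mrk = $dom->createElement('mrk');
    $mrk->setAttribute('id', '1');
    $mrk->appendChild($dom->createTextNode('Banana'));

    // Drop the old text, then attach the new element.
    $target->textContent = '';
    $target->appendChild($mrk);
}

echo $dom->saveXML();
```

Because the text goes through `createTextNode()`, characters such as `&` in the translated text are escaped on output, while the `mrk` element itself is written as markup.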

1 Answer

Writing to DOMNode::$nodeValue escapes only partially: < and > are escaped in the output, but & is still read as the start of an entity reference. Use DOMNode::$textContent for text:

$document = new DOMDocument();
$document
  ->appendChild($document->createElement('demo'))
  ->textContent = '<mrk id="1">foo & bar';
echo $document->saveXML();

Output:

<?xml version="1.0"?>
<demo>&lt;mrk id="1"&gt;foo &amp; bar</demo>

To insert markup rather than text, use DOMDocumentFragment. In this case the content has to be well-formed XML: tags have to be closed and & has to be written as &amp;.

$document = new DOMDocument();
$document
  ->appendChild($document->createElement('demo'));

$fragment = $document->createDocumentFragment();
$fragment->appendXML('<mrk id="1"/>foo &amp; bar');    
$document->documentElement->appendChild($fragment);

echo $document->saveXML();

Output:

<?xml version="1.0"?>
<demo><mrk id="1"/>foo &amp; bar</demo>
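
Applied to the code from the question (which has no namespace), the fragment approach might look like the sketch below. It assumes the inserted markup is changed to a closed `<mrk id="1">Banana</mrk>`, and it uses a shortened copy of the question's document with a single `trans-unit`.

```php
<?php
// Shortened copy of the question's document (no namespace).
$strxml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<xliff>
    <body>
        <trans-unit id="NFDBB2FA9-tu4" xml:space="preserve">
            <source xml:lang="en">He</source>
            <target xml:lang="id">He</target>
        </trans-unit>
    </body>
</xliff>
XML;

$dom = new DOMDocument();
$dom->loadXML($strxml);
$xp = new DOMXPath($dom);

foreach ($xp->query('//xliff/body/trans-unit/target') as $target) {
    $fragment = $dom->createDocumentFragment();
    // appendXML() returns false if the string is not well-formed,
    // so the mrk element has to be closed.
    if (!$fragment->appendXML('<mrk id="1">Banana</mrk>')) {
        continue;
    }
    // Remove the existing children, then append the parsed nodes.
    while ($target->firstChild) {
        $target->removeChild($target->firstChild);
    }
    $target->appendChild($fragment);
}

echo $dom->saveXML();
```

With this change the target is serialized as `<target xml:lang="id"><mrk id="1">Banana</mrk></target>` instead of escaped text. In Laravel, the final `echo` would be replaced by `return $dom->saveXML();` as in the question.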

XLIFF 1.2 Translations

The provided XML looks a lot like XLIFF 1.2, but it is missing the namespace. The namespace adds complexity, so the following example assumes that it is needed (otherwise, use the previous document fragment example).

$xliff = <<<'XML'
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
    <body>
        <trans-unit id="NFDBB2FA9-tu4" xml:space="preserve">
            <source xml:lang="en">He</source>
            <target xml:lang="id">He</target>
        </trans-unit>
        <trans-unit id="NFDBB2FA9-tu5" xml:space="preserve">
            <source xml:lang="en">She</source>
            <target xml:lang="id">She</target>
        </trans-unit>
    </body>
</xliff>
XML;

// simulate user input
$_POST = [
  'id' => 'NFDBB2FA9-tu5',
  'text' => '<mrk id="1">Banana</mrk>'
]; 

// bootstrap DOM
$document = new DOMDocument();
$document->loadXML($xliff);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('x', 'urn:oasis:names:tc:xliff:document:1.2');

// fetch a specific target by id
// (the id is interpolated as-is; validate it before using real user input)
$expression = '//x:trans-unit[@id="'.$_POST['id'].'"]/x:target';
foreach($xpath->evaluate($expression) as $target) {
    $fragment = $document->createDocumentFragment();
    // wrap the fragment text to define the default namespace for elements
    $fragment->appendXML(
      '<target xmlns="urn:oasis:names:tc:xliff:document:1.2">'.
        $_POST['text'].'</target>'
    );
    // clear target node content
    $target->textContent = '';
    // append new content
    if ($fragment->firstChild->hasChildNodes()) {
        $target->append(...$fragment->firstChild->childNodes);
    }
}

echo $document->saveXML();  

Output:

<?xml version="1.0"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
    <body>
        <trans-unit id="NFDBB2FA9-tu4" xml:space="preserve">
            <source xml:lang="en">He</source>
            <target xml:lang="id">He</target>
        </trans-unit>
        <trans-unit id="NFDBB2FA9-tu5" xml:space="preserve">
            <source xml:lang="en">She</source>
            <target xml:lang="id"><mrk id="1">Banana</mrk></target>
        </trans-unit>
    </body>
</xliff>
ThW
  • the issue I'm facing is any html tags are not encoded. `<mrk id="1">foo & bar` is my issue actually – mending3 Mar 03 '21 at 14:01
  • So you mean the problem is that they are encoded. Without this your XML output would be invalid (the `mrk` element is not closed). I added an example for XML fragments to my answer. – ThW Mar 03 '21 at 14:08
  • would you implement that to my code so that it's fixed once and for all? – mending3 Mar 03 '21 at 14:11