4

I need to change text in an XML file using PHP. I wrote code to:

1- get the file

2- replace the texts

3- save the file with other name.

The problem is that some replacements in the XML file fail.

I am able to replace simple strings, but I cannot replace text containing characters like '<'. The real code and files are below.

Original XML path: http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml

1) This code just changes the text Inmuebles to xxxxxxxx. This works fine

$xml_external_path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$xml = file_get_contents($xml_external_path);

$response = strtr($xml, array(
    'Inmuebles' => 'xxxxxxxx'
));

$newXml = $response;

$newXml = simplexml_load_string( $newXml );
$newXml->asXml('/home/csainmobiliaria/www/pisos-NEW.xml');

2) Now, if I use this code to change the text <Table Name="Inmuebles"> to <xxxxxxxx>, I get an ERROR 500.

$xml_external_path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$xml = file_get_contents($xml_external_path);

$response = strtr($xml, array(
    '<Table Name="Inmuebles">' => '<xxxxxxxx>'
));

$newXml = $response;

$newXml = simplexml_load_string( $newXml );
$newXml->asXml('/home/csainmobiliaria/www/pisos-NEW.xml');

3) In the same way, if I use this code to remove the text <Publicacion>, I get an ERROR 500.

$xml_external_path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$xml = file_get_contents($xml_external_path);

$response = strtr($xml, array(
    '<Publicacion>' => ''
));

$newXml = $response;

$newXml = simplexml_load_string( $newXml );
$newXml->asXml('/home/csainmobiliaria/www/pisos-NEW.xml');

This is the final result I need to get: http://www.csainmobiliaria.com/imagenes/fotos/pisos-OK.xml


JPashs
  • Changing `<Table Name="Inmuebles">` to `<xxxxxxxx>` makes the closing `</Table>` invalid, and the closing `</xxxxxxxx>` non-existent. Use the parser to do this. Also, when you `get a ERROR 500`, check your error logs; they will tell you what is wrong. If they don't, look at the manual for the error reporting functions. The `<Publicacion>` approach has the same issue. Don't use string functions on structured data (CSVs, JSON, XML, etc.); use the appropriate parsers.
    – user3783243 Jan 25 '19 at 12:48
  • @user3783243 I'm afraid I don't know what 'parsers' are. Do you mean the string int search function? – JPashs Jan 25 '19 at 13:15
  • `simplexml` is a parser. You should bring the file as it is into that, restructure it as needed, then output it. (There are other parsers as well if you don't like that one) – user3783243 Jan 25 '19 at 14:30
  • Possible duplicate of [How do you parse and process HTML/XML in PHP?](https://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) – user3783243 Jan 25 '19 at 14:30
  • XSLT is a template language for just this use case - it transforms one XML into another XML, HTML or Text. PHP has an extension (ext/xsl) for it. – ThW Jan 25 '19 at 18:16
  • @ThW thanks. I understand that I just need to load and save the xml with XSLT instead of using simplexml. I found this https://inviqa.com/blog/transforming-xml-php-and-xsl but it doesn't show how to save it as xml. Can you please help me with this? – JPashs Jan 26 '19 at 16:36
  • Whenever you get a 500 error, the **very first thing you need to do** is find your error log, or turn on error reporting on your dev server. Then, if you don't understand what you found, you can **tell us the exact error message you're getting**. See https://stackoverflow.com/questions/12769982/reference-what-does-this-error-mean-in-php/12772851#12772851 and [mcve]. – IMSoP Jan 28 '19 at 11:05
  • @JPashs The result XML is invalid because it should have a root element. When you remove `<Publicacion>`, you make an XML without a root and with a closing tag that has no opening tag. First, define a correct result. – splash58 Jan 29 '19 at 19:00
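
To illustrate the comments above: a minimal sketch of why the string replacement ends in a 500. The sample XML is made up to stand in for the real feed, and the assumption is that the server hides the fatal error that follows when `simplexml_load_string()` returns `false` and `->asXml()` is then called on it. Collecting the libxml errors makes the cause visible:

```php
<?php
// Collect libxml parse errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Made-up sample standing in for the real feed (assumption).
$xml = '<Publicacion><Table Name="Inmuebles"><Inmueble/></Table></Publicacion>';

// Same replacement as in attempt 2: only the opening tag is changed,
// so the closing </Table> no longer matches anything.
$broken = strtr($xml, array('<Table Name="Inmuebles">' => '<xxxxxxxx>'));

$doc = simplexml_load_string($broken);
$messages = array();

if ($doc === false) {
    // The parser rejected the document; list what it complained about.
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
    echo "Not well-formed XML:\n" . implode("\n", $messages) . "\n";
} else {
    echo $doc->asXML();
}
```

Checking the return value before calling `asXml()` turns the silent 500 into a readable message, which is usually enough to see which tag is unbalanced.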

3 Answers

4

You can copy the nodes you need instead of removing the excess elements. For example, you can copy the Inmuebles node with the help of SimpleXML:

$path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$content = file_get_contents($path);
$sourceXML = new SimpleXMLElement($content);

$targetXML = new SimpleXMLElement("<Inmuebles></Inmuebles>");

$items = $sourceXML->xpath('Table[@Name=\'Inmuebles\']');
foreach ($items as $item) {
    foreach ($item->Inmueble as $inmueble) {
        $node  = $targetXML->addChild('Inmueble');
        $node->addChild('IdInmobiliariaExterna', $inmueble->IdInmobiliariaExterna);
        $node->addChild('IdPisoExterno', $inmueble->IdPisoExterno);
        $node->addChild('FechaHoraModificado', $inmueble->FechaHoraModificado);
        $node->addChild('TipoInmueble', $inmueble->TipoInmueble);
        $node->addChild('TipoOperacion', $inmueble->TipoOperacion);
    }
}

echo $targetXML->asXML();

Also, as @ThW said in the comments, you can use XSLT, for example:

$path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$content = file_get_contents($path);
$sourceXML = new SimpleXMLElement($content);

$xslt='<?xml version="1.0" encoding="ISO-8859-1"?>
         <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
         <xsl:output method="xml" indent="yes"/>

         <xsl:template match="Table[@Name=\'Inmuebles\']">
             <Inmuebles>
                 <xsl:copy-of select="node()"/>
             </Inmuebles>
         </xsl:template>

         <xsl:template match="Table[@Name=\'Agencias\']"/>
</xsl:stylesheet>';


$xsl = new SimpleXMLElement($xslt);

$processor = new XSLTProcessor;
$processor->importStyleSheet($xsl);
$result = $processor->transformToXML($sourceXML);
$targetXML = new SimpleXMLElement($result);
echo $targetXML->asXML();
Maksym Fedorov
  • The first code works great. Just one question: one of the elements contains HTML code, which is not copied (migrated) to the new XML. How can I sort this out? Thanks.
    – JPashs Jan 28 '19 at 15:02
  • @JPashs can you attach an example of XML? – Maksym Fedorov Jan 28 '19 at 18:11
  • Here is the URL of the real XML: http://www.csainmobiliaria.com/imagenes/fotos/pisos.xml And here is a capture where you can see the HTML tags: https://postimg.cc/XrQDw9Xt After I run the code, the HTML tag is removed from the text.
    – JPashs Jan 29 '19 at 08:33
  • @Maxim_Fedorov did you see my last comment. – JPashs Jan 29 '19 at 13:21
  • @JPashs `3 DORMITORIOS,1 CUARTO DE BAÑO` is invalid XML. Therefore SimpleXML truncates the HTML tags. An element must contain HTML in a <![CDATA[]]> block.
    – Maksym Fedorov Jan 29 '19 at 13:29
  • The XML you're talking about is perfectly valid. Although it uses HTML tags, there is nothing wrong with the nesting or the format of the tags. – Nigel Ren Jan 29 '19 at 14:44
  • @Maxim_Fedorov Then, you see no way to get the html tags into the new xml? – JPashs Jan 30 '19 at 08:12
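
On the question in these comments, a hedged sketch of one way to keep nested tags: `addChild()` with a string value copies only text, while a deep copy through DOM (`dom_import_simplexml()` plus `importNode(..., true)`) also copies child elements. The sample document and the element names `Descripcion` and `TextoSolo` are assumptions for illustration, not taken from the real feed:

```php
<?php
// Made-up sample; the element name Descripcion is an assumption.
$source = new SimpleXMLElement(
    '<Inmueble><Descripcion>3 dormitorios,<br/>1 terraza</Descripcion></Inmueble>'
);

$target = new SimpleXMLElement('<Inmuebles/>');
$node = $target->addChild('Inmueble');

// Text-only copy: the string cast keeps the text but drops the <br/> element.
// TextoSolo is a hypothetical element name used only for comparison.
$node->addChild('TextoSolo', (string) $source->Descripcion);

// Deep copy: switch to the DOM view of both elements and import the
// source node together with all of its children.
$targetDom = dom_import_simplexml($node);
$sourceDom = dom_import_simplexml($source->Descripcion);
$targetDom->appendChild($targetDom->ownerDocument->importNode($sourceDom, true));

$result = $target->asXML();
echo $result;
```

In the output only the deep-copied `Descripcion` element still contains `<br/>`. This is the same mechanism the DOMDocument answer relies on.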
4

DOMDocument allows you to copy structures of nodes, so rather than having to copy all the details individually (which can be prone to missing data when the specification changes), you can copy an entire node (such as <Inmueble>) from one document to another using importNode() which has a parameter to indicate that the full content of the element should be copied. This approach also allows you to copy any of the tables using the same function without code changes...

function extractData ( $sourceFile, $table )    {
    // Load source data
    $source = new DOMDocument();
    $source->load($sourceFile);
    $xp = new DOMXPath($source);

    // Create new data document
    $newFile = new DOMDocument();
    $newFile->formatOutput = true;
    // Create base element with the table name in new document
    $newRoot = $newFile->createElement($table);
    $newFile->appendChild($newRoot);

    // Find the records to copy
    $records = $xp->query('//Table[@Name="'.$table.'"]/*');
    foreach ( $records as $record ) {
        // Import the node to copy and append it to new document
        $newRoot->appendChild($newFile->importNode($record, true));
    }
    // Return the source of the XML
    return $newFile->saveXML();
}

echo extractData ($xml_external_path, "Inmuebles");

You could alter the method to return the document as DOMDocument or even a SimpleXML version if you wished to process it further.

For SimpleXML, change the return to...

return simplexml_import_dom($newRoot);

and then you can call it as...

$ret = extractData ($xml_external_path, "Inmuebles");
echo $ret->asXML();

Or if you just want a fixed way of doing this, you can remove the XPath and just use getElementsByTagName() to find the nodes to copy...

$source = new DOMDocument();
$source->load($xml_external_path);

$newFile = new DOMDocument();
$newRoot = $newFile->createElement("Inmuebles");
$newFile->appendChild($newRoot);

// Find the records to copy
foreach ( $source->getElementsByTagName("Inmueble") as $record ) {
    $newRoot->appendChild($newFile->importNode($record, true));
}
echo $newFile->saveXML();

To add the save file name, I've added a new parameter to the function. This new function doesn't return anything at all; it just loads the file and saves the result to the new file name...

function extractData ( $sourceFile, $table, $newFileName )    {
    // Load source data
    $source = new DOMDocument();
    $source->load($sourceFile);
    $xp = new DOMXPath($source);

    // Create new file document
    $newFile = new DOMDocument();
    $newFile->formatOutput = true;
    // Create base element with the table name in new document
    $newRoot = $newFile->createElement($table);
    $newFile->appendChild($newRoot);

    // Find the records to copy
    $records = $xp->query('//Table[@Name="'.$table.'"]/*');
    foreach ( $records as $record ) {
        // Import the node to copy and append it to new document
        $importNode = $newFile->importNode($record, true);
        // Add new content
        $importNode->appendChild($newFile->createElement("Title", "value"));
        $newRoot->appendChild($importNode);
    }

    // Update Foto elements
    $xp = new DOMXPath($newFile);
    $fotos = $xp->query("//*[starts-with(local-name(), 'Foto')]");
    foreach ( $fotos as $foto ) {
        $path = $foto->nodeValue;
        if( substr($path, 0, 5) == "/www/" )    {
            $path = substr($path,4);
        }
        // Replace node with new version
        $foto->parentNode->replaceChild($newFile->createElement("Foto1", $path), 
                  $foto);
    }  

    $newFile->save($newFileName);
}
$xml_external_path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos.xml';
$xml_external_savepath = 'saveFile.xml';

extractData ($xml_external_path, "Inmuebles", $xml_external_savepath);
Nigel Ren
  • @Nige_Ren I'm trying your first code. I need to know how to save the new xml with other name. – JPashs Jan 29 '19 at 17:50
  • If you mean save the XML to a file, you can just save the data using `file_put_contents("outputFileName.xml", extractData ($xml_external_path, "Inmuebles"));` – Nigel Ren Jan 29 '19 at 18:00
  • @Nige_Ren thanks, where exactly do I insert this line. Can you wrap the complete function? – JPashs Jan 29 '19 at 18:13
  • I've added a new version of the function where you can pass the file name to save the result to. – Nigel Ren Jan 29 '19 at 18:18
  • @Nige_Ren thanks, your last code `function extractData ( $sourceFile, $table, $newFileName )...` works fine. – JPashs Jan 30 '19 at 08:25
  • @Nige_Ren I ran your last code to change this XML http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml to this http://www.csainmobiliaria.com/pisos-NEW-2.xml. It worked fine. But I still need to make some text replacements. You will see them in this capture: https://postimg.cc/HcGXMGfW How can I do this? Can you add it to your last code `function extractData ( $sourceFile, $table, $newFileName )...` ? Thanks. – JPashs Jan 30 '19 at 08:38
  • `$table` should be escaped when added to the XPath, for much for the same reason variables in SQL queries should be escaped, see https://stackoverflow.com/a/54436185/1067003 – hanshenrik Jan 30 '19 at 08:51
  • I've added the segment of code starting with `// Update Foto elements` – Nigel Ren Jan 30 '19 at 10:05
  • Have you sorted this yet? – Nigel Ren Feb 02 '19 at 09:38
  • Yes, I get it working. Thanks. Just one more question, is there a way to add a new element? Let's say I need to add a custom element `custom text` inside each `` element. Can I do this? – JPashs Feb 03 '19 at 17:40
  • I've updated the example (around the lines `$importNode = $newFile->importNode` ) which show how you can modify the node before adding the content into the new document. – Nigel Ren Feb 03 '19 at 17:54
  • @NigelRen Thank you. Can you please help with this?: I need to print a single node which is inside the `...` node, this one: `100002`, I need to print the value `100002`. Can I do this? – JPashs Feb 13 '19 at 08:38
  • Probably better to ask a new question as I'm out for a while and you may get help from others. – Nigel Ren Feb 13 '19 at 08:42
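
Regarding the comment above about escaping `$table`: XPath 1.0 has no escape character inside string literals, so a common workaround is to pick the quote type the value does not contain, or to fall back to `concat()` when it contains both. The helper name `xpathLiteral` and the sample document below are hypothetical, a sketch rather than part of the answer's code:

```php
<?php
// Hypothetical helper: turn an arbitrary string into an XPath 1.0 string literal.
function xpathLiteral(string $value): string
{
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    // Both quote types present: split on the double quote and glue the
    // pieces back together with concat(), quoting each " on its own.
    $parts = explode('"', $value);
    return 'concat("' . implode('", \'"\', "', $parts) . '")';
}

// Made-up sample standing in for the real feed (assumption).
$doc = new DOMDocument();
$doc->loadXML(
    '<Publicacion><Table Name="Inmuebles"><Inmueble/></Table>'
    . '<Table Name="Agencias"><Agencia/></Table></Publicacion>'
);
$xp = new DOMXPath($doc);

$table = 'Inmuebles';
$records = $xp->query('//Table[@Name=' . xpathLiteral($table) . ']/*');
echo $records->length, "\n";
```

If the set of table names is fixed, checking `$table` against a short whitelist before building the query is an even simpler option.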
0

Consider again XSLT, the W3C standards-compliant, special-purpose language designed to transform XML files to a user's specification, such as your needs #1-3. Like the other popular declarative language, SQL, XSLT is not limited to PHP but is portable to other application layers (Java, C#, Python, Perl, R) and to dedicated XSLT 1.0, 2.0, and 3.0 .exe processors.

With this approach, XSLT's recursive styling allows you to avoid any foreach looping, if logic, and repeated lines like addChild or appendChild calls at the application layer.

XSLT (save as an .xsl file, a special .xml file, or embedded string; portable to other interfaces beyond PHP)

<?xml version="1.0"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" indent="yes" encoding="ISO-8859-1"/>
     <xsl:strip-space elements="*"/>

     <!-- WALK DOWN TREE FROM ROOT -->
     <xsl:template match="Publicacion">
        <xsl:apply-templates select="Table"/>
     </xsl:template>

     <xsl:template match="Table[@Name='Inmuebles']">
         <Inmuebles>
             <xsl:apply-templates select="*"/>
         </Inmuebles>
     </xsl:template>

     <!-- EMPTY TEMPLATE TO REMOVE SPECIFIED NODES -->
     <xsl:template match="Table[@Name='Agencias']"/>

     <!-- RETURN ONLY FIRST FIVE NODES -->
     <xsl:template match="Table/*">
         <Inmuebles>
             <xsl:copy-of select="*[position() &lt;= 5]"/>
         </Inmuebles>
     </xsl:template>

</xsl:stylesheet>

XSLT Demo

PHP (using the php_xsl library)

// LOAD XML SOURCE
$url = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$web_data = file_get_contents($url);
$xml = new SimpleXMLElement($web_data);

// LOAD XSL SCRIPT
$xsl = simplexml_load_file('/path/to/script.xsl');

// XSLT TRANSFORMATION
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl); 
$newXML = $proc->transformToXML($xml);

// OUTPUT TO CONSOLE
echo $newXML;

// SAVE TO FILE
file_put_contents('Output.xml', $newXML);

And as the great XSLT guru, @Dimitre Novatchev, usually ends his posts: the wanted, correct result is produced:

<?xml version="1.0" encoding="ISO-8859-1"?>
<Inmuebles>
   <Inmuebles>
      <IdInmobiliariaExterna>B45695855</IdInmobiliariaExterna>
      <IdPisoExterno>100002</IdPisoExterno>
      <FechaHoraModificado>30/11/2018</FechaHoraModificado>
      <TipoInmueble>PISO</TipoInmueble>
      <TipoOperacion>3</TipoOperacion>
   </Inmuebles>
   <Inmuebles>
      <IdInmobiliariaExterna>B45695855</IdInmobiliariaExterna>
      <IdPisoExterno>100003</IdPisoExterno>
      <FechaHoraModificado>30/11/2018</FechaHoraModificado>
      <TipoInmueble>CHALET</TipoInmueble>
      <TipoOperacion>4</TipoOperacion>
   </Inmuebles>
</Inmuebles>
Parfait