I am trying to pre-sort and slice a big XML file for later processing via xml_parser:

    function CreateXMLParser($CHARSET, $bareXML = false) {
      $CURRXML = xml_parser_create($CHARSET);
      xml_parser_set_option( $CURRXML, XML_OPTION_CASE_FOLDING, false);
      xml_parser_set_option( $CURRXML, XML_OPTION_TARGET_ENCODING, $CHARSET);
      xml_set_element_handler($CURRXML, 'startElement', 'endElement');
      xml_set_character_data_handler($CURRXML, 'dataHandler');
      xml_set_default_handler($CURRXML, 'defaultHandler');
      if ($bareXML) {
         xml_parse($CURRXML, '<?xml version="1.0"?>', 0);
        }               
      return $CURRXML;
      }

    function ChunkXMLBigFile($file, $tag = 'item', $howmany = 1000) {
         global $CHUNKON, $CHUNKS, $ITEMLIMIT;  

         $CHUNKON   = $tag;
         $ITEMLIMIT = $howmany; 
         $xml = CreateXMLParser('UTF-8', false);

         $fp = fopen($file, "r");
         $CHUNKS  = 0;
         while(!feof($fp)) {
              $chunk = fgets($fp, 10240);                     
              xml_parse($xml, $chunk, feof($fp));
         }
         xml_parser_free($xml);              
         processChunk();
    }
    function processChunk() {
        global $CHUNKS, $PAYLOAD, $ITEMCOUNT;
        if ('' == $PAYLOAD) {
            return;
        }

        $xp = fopen($file = "xmlTemp/slices/slice_".$CHUNKS.".xml", "w");
        fwrite($xp, '<?xml version="1.0" ?>'."\n");
        fwrite($xp, "<producten>");
        fwrite($xp, $PAYLOAD);
        fwrite($xp, "</producten>");
        fclose($xp);
        print "Written ".$file."<br>";
        $CHUNKS++;
        $PAYLOAD    = '';
        $ITEMCOUNT  = 0;
    }



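    function startElement($xml, $tag, $attrs = array()) {
        global $PAYLOAD, $CHUNKS, $ITEMCOUNT, $CHUNKON;

        // Before the first item of the first slice, drop everything collected
        // so far (the source root tag); processChunk() writes its own wrapper.
        if (!($CHUNKS || $ITEMCOUNT) && $CHUNKON == strtolower($tag)) {
            $PAYLOAD = '';
        }
        // Always emit the opening tag, including the chunk tag itself.
        $PAYLOAD .= "<".$tag;
        foreach ($attrs as $k => $v) {
            // Escape attribute values for XML; addslashes() does not do that.
            $PAYLOAD .= " $k=".'"'.htmlspecialchars($v, ENT_QUOTES).'"';
        }
        $PAYLOAD .= '>';
    }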


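    function endElement($xml, $tag) {
        global $CHUNKON, $ITEMCOUNT, $ITEMLIMIT;

        // Append the closing tag to the payload.
        dataHandler(null, "</$tag>");
        if ($CHUNKON == strtolower($tag)) {
            // Flush a slice once $ITEMLIMIT items have been collected.
            if (++$ITEMCOUNT >= $ITEMLIMIT) {
                processChunk();
            }
        }
    }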

    function dataHandler($xml, $data) {
        global $PAYLOAD;
        $PAYLOAD .= $data;
    }

But how can I access the node name?

I have to sort some items (each with n nodes) out before the slice file is saved. The XML is parsed line after line, right? So I would have to store the nodes of a whole item temporarily and then decide whether the item is going to be written to the file. Is there a way to do this?

AdmiralCrunch

1 Answer

Your code is effectively reading the entire source file every time you call the ChunkXMLBigFile function.

After your while loop you have all the elements, which you can then manipulate any way you like.

See the following questions about how to approach this:

If you then process those elements in batches of $howmany, you are where you want to be.
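
A minimal sketch of that idea, under some assumptions: collectItems() and filterIntoBatches() are hypothetical helper names, every item is assumed to be flat (its child nodes contain only text), and attributes are ignored. It also shows where the node name comes from: it is the second argument passed to the element handlers.

```php
<?php
// Sketch only: collectItems() and filterIntoBatches() are hypothetical helpers.

// Read $file and return every <$tag> element as an array of childName => text.
// Assumes flat items (children hold text only) and ignores attributes.
function collectItems($file, $tag = 'item')
{
    $items   = [];    // finished items
    $current = null;  // item being built, or null while outside an item
    $node    = null;  // name of the child node currently open

    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

    xml_set_element_handler(
        $parser,
        // Start handler: $name is the node name.
        function ($p, $name, $attrs) use (&$current, &$node, $tag) {
            if ($name === $tag) {
                $current = [];
            } elseif ($current !== null) {
                $node = $name;
                $current[$node] = '';
            }
        },
        // End handler: a closing item tag means the item is complete.
        function ($p, $name) use (&$items, &$current, &$node, $tag) {
            if ($name === $tag && $current !== null) {
                $items[] = $current;
                $current = null;
            }
            $node = null;
        }
    );
    xml_set_character_data_handler(
        $parser,
        // May fire several times per node, so the text is appended.
        function ($p, $data) use (&$current, &$node) {
            if ($current !== null && $node !== null) {
                $current[$node] .= $data;
            }
        }
    );

    $fp = fopen($file, 'r');
    while (!feof($fp)) {
        $chunk = fread($fp, 10240);
        xml_parse($parser, $chunk, feof($fp));
    }
    fclose($fp);

    return $items;
}

// Drop the items $keep rejects, then split the rest into batches of $howmany.
function filterIntoBatches(array $items, $howmany, callable $keep)
{
    $kept = array_values(array_filter($items, $keep));
    return array_chunk($kept, $howmany);
}
```

Calling filterIntoBatches(collectItems($file), $howmany, $keep) would then give one array per slice. Note that this keeps every item in memory at once, which may not be acceptable for a really big file.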


Tip: there are many examples online where this functionality is presented in an object-oriented programming (OOP) approach, with all the functions inside a class. This would also eliminate the need for global variables, which can cause some (read: a lot of) frustration and confusion.
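
As a rough illustration of the class-based approach, here is a sketch that buffers one item at a time, decides whether to keep it once its closing tag arrives, and writes a slice file every $limit kept items. The class name XmlItemSlicer, the $keep callback and the $outDir parameter are all made up for this example, and it again assumes flat items without attributes.

```php
<?php
// Sketch only: XmlItemSlicer, $keep and $outDir are hypothetical names.
class XmlItemSlicer
{
    private $tag;
    private $limit;
    private $outDir;
    private $keep;          // callable(array $nodes): bool
    private $item = null;   // child nodes of the item currently open
    private $node = null;   // name of the child node currently open
    private $batch = [];    // kept items not yet written
    public $written = [];   // paths of the slice files written so far

    public function __construct($tag, $limit, $outDir, callable $keep)
    {
        $this->tag    = $tag;
        $this->limit  = $limit;
        $this->outDir = $outDir;
        $this->keep   = $keep;
    }

    public function parseFile($file)
    {
        $parser = xml_parser_create('UTF-8');
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
        xml_set_element_handler($parser, [$this, 'startElement'], [$this, 'endElement']);
        xml_set_character_data_handler($parser, [$this, 'characterData']);

        $fp = fopen($file, 'r');
        while (!feof($fp)) {
            $chunk = fread($fp, 10240);
            xml_parse($parser, $chunk, feof($fp));
        }
        fclose($fp);
        $this->flush();   // write the last, possibly partial, slice
    }

    public function startElement($parser, $name, $attrs)
    {
        if ($name === $this->tag) {
            $this->item = [];
        } elseif ($this->item !== null) {
            $this->node = $name;
            $this->item[$name] = '';
        }
    }

    public function endElement($parser, $name)
    {
        $this->node = null;
        if ($name !== $this->tag || $this->item === null) {
            return;
        }
        // The whole item is buffered now, so it can be kept or dropped here.
        if (call_user_func($this->keep, $this->item)) {
            $this->batch[] = $this->item;
            if (count($this->batch) >= $this->limit) {
                $this->flush();
            }
        }
        $this->item = null;
    }

    public function characterData($parser, $data)
    {
        if ($this->item !== null && $this->node !== null) {
            $this->item[$this->node] .= $data;
        }
    }

    private function flush()
    {
        if (!$this->batch) {
            return;
        }
        $xml = '<?xml version="1.0" ?>' . "\n<producten>";
        foreach ($this->batch as $nodes) {
            $xml .= "<{$this->tag}>";
            foreach ($nodes as $name => $text) {
                $xml .= "<$name>" . htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</$name>";
            }
            $xml .= "</{$this->tag}>";
        }
        $xml .= '</producten>';
        $path = $this->outDir . '/slice_' . count($this->written) . '.xml';
        file_put_contents($path, $xml);
        $this->written[] = $path;
        $this->batch = [];
    }
}
```

All state lives in the object, so no globals are needed, and only one slice's worth of items is held in memory at a time.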

moorscode