I want my PHP script to download files from a specific URL based on the ids in an XML file. It should ignore the rest of the XML and look only at the id attribute of each <lib> element.

My xml looks like this:

<lib id="ITEM_I_WANT_TO_DOWNLOAD_1" revision="0000">
    <part id="0000" type="ch"/>
    <part id="0000" type="ls"/>
    <part id="0000" type="rs"/>
    <part id="0000" type="ch"/>
</lib>
<lib id="ITEM_I_WANT_TO_DOWNLOAD_2" revision="0000">
    <part id="0000" type="ch"/>
    <part id="0000" type="ls"/>
    <part id="0000" type="rs"/>
    <part id="0000" type="ch"/>
</lib>

My current PHP script looks like this:

    if (!defined('STDIN'))
    {
        echo 'Please run it as a cmd ({path to your php}/php.exe {path to badges.php} -f)';
        exit;
    }
    define('BASE', 'https://randomtarget.com/');
    $figuremap = get_remote_data('https://random/xmlfile-needed.xml/');

    if (!file_exists('C:/outputfolder/')) {
        mkdir('C:/outputfolder/', 0777, true);
        echo "\n --------------> Output folder has been made... \n";

        sleep(3);

        $fp = fopen("C:/downloaded-xmlfile.xml", "w");
        fwrite($fp, $figuremap);
        fclose($fp);
        echo "\n --------------> XML downloaded and placed into folder \n";

        sleep(3);
    }
    $pos = 0;
    while ($pos = strpos($figuremap, '<lib id="', $pos +1))
    {
        $pos1 = strpos($figuremap, '"', $pos);
        $rule = substr($figuremap, $pos, ($pos1 -$pos));
        $rule = explode(',', $rule);
        $revision = str_replace('">', '', $rule[1]);
        $clothing_file = current(explode('*', str_replace('"', '', $rule[2])));
        if (file_exists('C:/outputfolder/'.$clothing_file.'.swf'))
        {
            echo 'Clothing_file found: '.$clothing_file."\r\n";
            continue;
        }
        echo 'Download clothing_file: '.$clothing_file.' '.$revision."\r\n";

        if (!@copy(BASE.'/'.$revision.'/'.$clothing_file.'.swf', 'C:/outputfolder'.$clothing_file.'.swf'))
        {
            echo 'Error downloading: '.$clothing_file."\r\n";
        }
    }

Besides this code, I wrote a get_remote_data function, and that part works fine. I just want the strpos loop to grab all the id="" values so I can check whether the files exist on the target site.

How can I fix it?

Deko
  • Either use an XML handling library (SimpleXML, for example) or at least a regular expression to parse the XML. – arkascha Oct 28 '18 at 19:41
  • @Amessihel I did not claim that he has to and cannot succeed otherwise. I suggested to use a tool better suited for the task. A question of coding style and maintainability of code. – arkascha Oct 29 '18 at 06:58

2 Answers

There are some easy ways of processing XML files. The easiest (though less flexible) is SimpleXML. The following code should replace the main processing loop...

$xml = simplexml_load_string($figuremap);

foreach ( $xml->lib as $lib )   {
      // Read both attributes from the <lib> element itself; its <part> children are ignored
      $clothing_file = (string) $lib['id'];
      $revision = (string) $lib['revision'];

      if (file_exists('C:/outputfolder/'.$clothing_file.'.swf'))
      {
          echo 'Clothing_file found: '.$clothing_file."\r\n";
          continue;
      }
      echo 'Download clothing_file: '.$clothing_file.' '.$revision."\r\n";

      // The target path needs the same trailing slash as the file_exists() check above
      if (!@copy(BASE.'/'.$revision.'/'.$clothing_file.'.swf', 'C:/outputfolder/'.$clothing_file.'.swf'))
      {
          echo 'Error downloading: '.$clothing_file."\r\n";
      }
}

The starting point is to load the XML you have in $figuremap into SimpleXML and then loop over the <lib> elements. This assumes an XML structure something like...

<lib1>
    <lib id="ITEM_I_WANT_TO_DOWNLOAD_1" revision="0000">
        <part id="0000a" type="ch" />
        <part id="0000" type="ls" />
        <part id="0000" type="rs" />
        <part id="0000" type="ch" />
    </lib>
    <lib id="ITEM_I_WANT_TO_DOWNLOAD_2" revision="0000">
        <part id="00001" type="ch" />
        <part id="0000" type="ls" />
        <part id="0000" type="rs" />
        <part id="0000" type="ch" />
    </lib>
</lib1>

The actual name of the base element doesn't matter. As long as the <lib> elements are one level below it, you can use $xml->lib to loop over them.
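As a self-contained illustration of reading only the <lib> attributes, here is a minimal sketch. The helper name libRevisions and the <map> root element are made up for this example; the real file may use a different root.

```php
<?php
// Hypothetical helper (not from the original script): returns [id => revision]
// for every <lib> element directly below the root.
function libRevisions(string $xmlString): array
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        throw new InvalidArgumentException('XML could not be parsed');
    }

    $result = [];
    foreach ($xml->lib as $lib) {
        // Attributes are read with array syntax; the <part> children are never visited.
        $result[(string) $lib['id']] = (string) $lib['revision'];
    }
    return $result;
}

// Sample data modelled on the question; <map> is an assumed root element.
$sample = <<<XML
<map>
  <lib id="ITEM_I_WANT_TO_DOWNLOAD_1" revision="0000">
    <part id="0000" type="ch"/>
  </lib>
  <lib id="ITEM_I_WANT_TO_DOWNLOAD_2" revision="0000">
    <part id="0000" type="ls"/>
  </lib>
</map>
XML;

foreach (libRevisions($sample) as $id => $revision) {
    echo $id . ' ' . $revision . "\n";
}
```

Because only $xml->lib is iterated, the id values on the <part> elements never show up in the result.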

Nigel Ren
  • Thank you for responding; it is working as it should. But I want it to grab only id="ITEM_I_WANT_TO_DOWNLOAD_1" and ignore all the parts. Is this also possible? I get the idea of how this works, but I can't find a way to grab the id of the first lib and then continue to id="ITEM_I_WANT_TO_DOWNLOAD_2". – Deko Oct 28 '18 at 20:11

Your posted XML string is actually invalid because it has no single root element. It needs to be wrapped in a parent element to be repaired. I'm not sure whether you posted your exact XML string or just a section of it.

$xml = '<lib id="ITEM_I_WANT_TO_DOWNLOAD_1" revision="0000">
    <part id="0000" type="ch"/>
    <part id="0000" type="ls"/>
    <part id="0000" type="rs"/>
    <part id="0000" type="ch"/>
  </lib>
<lib id="ITEM_I_WANT_TO_DOWNLOAD_2" revision="0000">
    <part id="0000" type="ch"/>
    <part id="0000" type="ls"/>
    <part id="0000" type="rs"/>
    <part id="0000" type="ch"/>
  </lib>';

$xml = '<mydocument>' . $xml . '</mydocument>';  // repair invalid xml
// see https://stackoverflow.com/q/4544272/2943403

$doc = new DOMDocument();
$doc->loadXml($xml);
$xpath = new DOMXpath($doc);
foreach ($xpath->evaluate('//lib/@id') as $attr) {
    $clothing_file = $attr->value;
    // perform your conditional actions ...
}

The XPath expression //lib/@id selects the id attribute of every <lib> element, anywhere in the document.
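To connect this back to the download step, the following sketch collects one URL per <lib> element. The helper name libDownloadUrls and the <root> wrapper are assumptions for illustration; the URL layout (base/revision/id.swf) is taken from the copy() call in the question.

```php
<?php
// Hypothetical helper: builds one download URL per <lib>, found at any depth.
function libDownloadUrls(string $fragment, string $base): array
{
    // Wrap the fragment in a throwaway root so rootless input still parses.
    // Assumes the input does not start with its own XML declaration.
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);   // collect parse errors quietly
    $loaded = $doc->loadXML('<root>' . $fragment . '</root>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        throw new InvalidArgumentException('XML could not be parsed');
    }

    $xpath = new DOMXPath($doc);
    $urls = [];
    foreach ($xpath->query('//lib') as $lib) {
        // Both values come from the <lib> element itself, never from <part>.
        $id = $lib->getAttribute('id');
        $revision = $lib->getAttribute('revision');
        $urls[$id] = rtrim($base, '/') . '/' . $revision . '/' . $id . '.swf';
    }
    return $urls;
}

// Rootless sample, as posted in the question.
$fragment = <<<XML
<lib id="ITEM_I_WANT_TO_DOWNLOAD_1" revision="0000">
    <part id="0000" type="ch"/>
</lib>
<lib id="ITEM_I_WANT_TO_DOWNLOAD_2" revision="0000">
    <part id="0000" type="ls"/>
</lib>
XML;

foreach (libDownloadUrls($fragment, 'https://randomtarget.com/') as $id => $url) {
    echo $id . ' -> ' . $url . "\n";
}
```

Selecting //lib rather than //lib/@id keeps the revision attribute within reach in the same loop, and trimming the trailing slash from the base avoids a doubled slash in the URL.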

mickmackusa