I solved my initial issue with the code below. I now need to learn how to limit the returned data to the first 5 rows. How do I limit a foreach loop?

I am scraping data from a site. I am able to traverse the DOM to get the table I want, "LAST 1 MONTH (11/20/2017-12/19/2017)", which is the third table on the page (index 2). However, I can't quite get the output correct. I need to wrap it in a table, with each row containing the td's specified in the code. Here is the code I am using with limited success:

<?php
    $html = file_get_contents('https://ninjatrader.isystems.com/Systems/TopStrategies');
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    echo "<table>";
    foreach ($xpath->query('//table')->item(2)->getElementsByTagName('tr') as $rows) {
        $cells = $rows->getElementsByTagName('td');

        echo "<tr>
                <td>" . $cells->item(1)->textContent . "</td>
                <td>" . $cells->item(2)->textContent . "</td>
                <td>" . $cells->item(3)->textContent . "</td>
                <td>" . $cells->item(5)->textContent . "</td>
            </tr>";
    }
    echo "</table>";
?>

OK, I've pretty much solved my issue with the above. Is there a better way to do this?

Dirty Bird Design
  • @ suppresses errors. So if loadHTML were to have any issues, it wouldn't push them to the browser (or to the screen if you are working on the command line). See https://stackoverflow.com/questions/2002610/character-before-a-function-call – DragonYen Dec 20 '17 at 22:37
  • Thanks @DragonYen, so is it a simplified way to do libxml_use_internal_errors(TRUE)? – Dirty Bird Design Dec 20 '17 at 22:54

1 Answer

You can read each element's tag name from its nodeName property, and then add the remaining parts of the opening and closing tags to the output as strings.

echo "<" . $cells->item(1)->nodeName . ">";
echo $cells->item(1)->textContent;
echo "</" .  $cells->item(1)->nodeName . ">";
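
To try this without hitting the live site, here is a minimal self-contained sketch. The one-row table in $html is a made-up stand-in for the scraped page, not the real markup.

```php
<?php
// Hypothetical one-row table standing in for the scraped page.
$html = '<table><tr><td>Rank</td><td>Alpha</td></tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Second <td> in the document (index 1).
$cell = $doc->getElementsByTagName('td')->item(1);

// nodeName holds the element's tag name, so the tags can be rebuilt from it.
$out = "<" . $cell->nodeName . ">" . $cell->textContent . "</" . $cell->nodeName . ">";
echo $out, "\n";
```

This rebuilds the cell as a string with the same tag name the parser reported for it.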

A more elegant approach for "td" elements:

for($i = 1; $i < 6; $i++)
{
    // item(5) only exists when the row has at least 6 cells
    if ($i != 4 && $cells->length > 5) {
         echo "<td>" . $cells->item($i)->textContent . "</td>";
    }
}
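
If the set of columns might change, one possible variation is to list the wanted column indexes explicitly instead of skipping index 4 inside the loop. This is only a sketch: the $wanted array and the sample row are assumptions for illustration, not part of the original page.

```php
<?php
// Hypothetical row standing in for one <tr> of the scraped table.
$html = '<table><tr><td>c0</td><td>c1</td><td>c2</td>'
      . '<td>c3</td><td>c4</td><td>c5</td></tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$cells = $doc->getElementsByTagName('td');

$wanted = [1, 2, 3, 5];   // column indexes to keep; 4 is simply left out
$out = '';

// The row must have more cells than the largest wanted index.
if ($cells->length > max($wanted)) {
    foreach ($wanted as $i) {
        $out .= "<td>" . $cells->item($i)->textContent . "</td>";
    }
}
echo $out, "\n";
```

Deriving the length check from max($wanted) keeps it in step with the column list if the list is edited later.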

For the main loop I'd write it like this, outputting each element on a new line (remove the "\n" if new lines are not required). You can limit the foreach loop by using the iteration key as $index. As row 0 is empty in this case, use $index < 6 to get the first 5 rows. If row 0 had data, you could use $index < 5.

$rows = $xpath->query('//table')->item(2)->getElementsByTagName('tr');
echo "<table>\n";
foreach($rows as $index => $row) {
  $cells = $row->getElementsByTagName('td');
  // item(5) only exists when the row has at least 6 cells
  if ($cells->length > 5 && $index < 6) {
    echo "<tr>\n";
    for($i = 1; $i < 6; $i++)
    {
      if ($i != 4) {
        echo "<td>" . $cells->item($i)->textContent . "</td>\n";
      }
    }
    echo "</tr>\n";
  }
}
echo "</table>\n";
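
An alternative, if you would rather not depend on how many leading rows are empty, is to count the rows actually printed and break out of the loop once the limit is reached. The sketch below is self-contained and runs against a made-up table (one header row, seven data rows). The $limit and $printed names are illustrative only.

```php
<?php
// Hypothetical table: one header row without <td> cells, then seven data rows.
$html = '<table><tr><th>h0</th><th>h1</th></tr>';
for ($n = 1; $n <= 7; $n++) {
    $html .= "<tr><td>$n</td><td>row $n</td></tr>";
}
$html .= '</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$limit   = 5;   // how many data rows to print
$printed = 0;   // data rows printed so far
$out     = "<table>\n";

foreach ($xpath->query('//table')->item(0)->getElementsByTagName('tr') as $row) {
    $cells = $row->getElementsByTagName('td');
    if ($cells->length < 2) {
        continue;   // skip header or empty rows
    }
    $out .= "<tr><td>" . $cells->item(1)->textContent . "</td></tr>\n";
    if (++$printed >= $limit) {
        break;      // stop iterating once the limit is reached
    }
}
$out .= "</table>\n";
echo $out;
```

Because the counter only advances on rows that are printed, the same code works whether or not the table starts with a header row, and break avoids walking the remaining rows.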

References:

http://php.net/manual/en/class.domxpath.php

http://php.net/manual/en/control-structures.for.php

http://php.net/manual/en/control-structures.foreach.php

Matts