<tr class=" odd">

<td class="pos">DATA 1</td>
<td><a href="..." target="_top">DATA 2</a></td>
<td>DATA 3</td>
<td><a href="...">DATA 4</a></td>
<td>DATA 5</td>
<td>DATA 6</td>
<td>DATA 7</td>
<td><a href="...">DATA 8</a></td>
<td>DATA 9</td>
<td>DATA 10</td>
<td style="min-width:48px"><a href="...">
<img alt="..." src="..." style="padding-right: 4px; height: 20px; width: 20px">
</a></td>

</tr>

The code above shows one row of data; I have around 7500 rows in this format. I would like to select specific columns, mainly DATA 2, DATA 3, DATA 6 and DATA 7.

I've written code that grabs DATA 2 and DATA 3 and stores them in arrays, which works OK. Code below:

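$array_data2 = array();
$array_data3 = array();
$array_data6 = array();

$pos = 0; // search offset, advanced on every iteration so the loop terminates

// Compare with !== false: strpos() can return 0, which != would treat as false.
while (($data2_pos = strpos($output, "_top", $pos)) !== false) {
        // DATA 2: the text between target="_top"> and the closing </a>
        $pos2 = substr($output, $data2_pos + 6);
        $pos3 = strpos($pos2, "</a>");
        $data2 = trim(substr($pos2, 0, $pos3));
        $array_data2[] = $data2;

        // DATA 3: the contents of the first plain <td> after the DATA 2 link
        $data3_pos = strpos($output, "<td>", $data2_pos);
        $pos22 = substr($output, $data3_pos + 4);
        $pos33 = strpos($pos22, "</td>");
        $data3 = trim(substr($pos22, 0, $pos33));
        $array_data3[] = $data3;

        $pos = $data2_pos + 1; // continue searching after this match
}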

I am looking for a way to select DATA 6. The code above works for DATA 2 because it searches for target="_top", and for DATA 3 because it searches for the first plain <td> tag. Nothing in the markup seems to distinguish DATA 6 from the other columns.

One approach I have tried is skipping a fixed number of characters, as below:

        $data6_pos = strpos($output, "<td>", $pos);
        $pos222 = substr($output, $data6_pos + 100); // fixed 100-character skip
        $pos333 = strpos($pos222, "</td>");
        $data6 = trim(substr($pos222, 0, $pos333));
        $array_data6[] = $data6;

The disadvantage with this is that DATA 6 is not always 100 characters after the first <td> tag position.

I would be very grateful for any ideas on how to separate the columns of this data. Many thanks!

Based on the comments, I have started using DOM, as in the code below:

$dom = new DOMDocument;
$dom->loadHTML($output);
foreach ($dom->getElementsByTagName('tr') as $node) {
    echo $node->nodeValue;
    echo '<br>';
}

This code is much shorter and prints the values on a new line for each <tr> tag: DATA 1 DATA 2 DATA 3 DATA 4 DATA 5 DATA 6 DATA 7 DATA 8 DATA 9 DATA 10

Next, I'm going to look into how to pull individual columns, such as DATA 2 and DATA 3, out of each row so I can store them in separate database fields.
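One possible way to get at individual columns, sketched here under some assumptions: each <tr> holds its cells as <td> elements in a fixed order, so DATA 2, DATA 3, DATA 6 and DATA 7 sit at zero-based positions 1, 2, 5 and 6. The helper name `extractColumns` and the trimmed sample row are hypothetical and only for illustration; `libxml_use_internal_errors()` is used to keep `loadHTML()` quiet about imperfect markup.

```php
<?php
// Hypothetical helper (not from the original post): for each <tr>, return
// the trimmed text of the <td> cells at the given zero-based indexes.
function extractColumns(string $html, array $indexes): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $cells = $tr->getElementsByTagName('td');
        $row = [];
        foreach ($indexes as $i) {
            $cell = $cells->item($i); // null when the row has fewer cells
            $row[] = $cell === null ? null : trim($cell->textContent);
        }
        $rows[] = $row;
    }
    return $rows;
}

// Sample row modelled on the markup in the question (shortened, wrapped in <table>).
$output = '<table><tr class=" odd">
<td class="pos">DATA 1</td>
<td><a href="..." target="_top">DATA 2</a></td>
<td>DATA 3</td>
<td><a href="...">DATA 4</a></td>
<td>DATA 5</td>
<td>DATA 6</td>
<td>DATA 7</td>
</tr></table>';

// DATA 2, 3, 6 and 7 are the 2nd, 3rd, 6th and 7th cells: indexes 1, 2, 5, 6.
foreach (extractColumns($output, [1, 2, 5, 6]) as $row) {
    echo implode(' | ', $row), PHP_EOL;
}
```

Each inner array then holds one row's selected values in the order requested, which should map fairly directly onto separate database fields. Selecting by position assumes the column order never changes; if the page contains nested tables, the cell lookup would need to be restricted to direct children.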

Jeanclaude
  • Is there a specific reason that you are not actually [parsing the HTML](http://php.net/manual/en/domdocument.loadhtml.php) and traversing the nodes? – Just a student Apr 20 '17 at 09:16
  • Don't try to parse it like that. Any change in the HTML (added/changed class name etc) will mess your code up. Use some proper tool that was built for just this instead. Here's a good SO post about it: http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php – M. Eriksson Apr 20 '17 at 09:16
  • use `preg_match()` it is more effective for parsing – diavolic Apr 20 '17 at 09:20
  • Always use `strpos` with `!==`, not with `!=`, because `!=` casts position 0 to false :-) – JustOnUnderMillions Apr 20 '17 at 09:22
  • @diavolic - Using regular expressions to parse HTML should be avoided unless absolutely necessary (can't think of any situation, though). Regex is for _regular_ expressions, while HTML can be very much _irregular_ (since it isn't super strict). It _can_ be done but it usually ends up with a lot of hours just fixing errors and a massive expression you won't understand yourself after a week or two. – M. Eriksson Apr 20 '17 at 09:22
  • Hello everyone, thank you for your comments. Reading these I understand I should not be parsing like this. I am going to look into traversing the nodes. I've just read the recommended SO post. I have seen this help on the w3schools website, could I use this approach to make a simple script? https://www.w3schools.com/xml/dom_nodes_traverse.asp – Jeanclaude Apr 20 '17 at 09:43
  • The idea is the same, even though that is JavaScript and not PHP. Check out PHP's [DOMDocument class](http://php.net/manual/en/class.domdocument.php); – M. Eriksson Apr 20 '17 at 09:45
  • Thank you Magnus, ah yes, that is JavaScript; I would rather work in PHP. Thank you for the link, I'm looking at PHP's DOMDocument class now. Quick question: the PHP I was writing was looking at the data as HTML, so do I need to do anything special to work with the DOM? I am very new to this, so sorry if this seems like a stupid question. – Jeanclaude Apr 20 '17 at 09:50
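
The `strpos` advice in the comments above can be illustrated with a small sketch. The string and variable names here are made up for the example; the point is only that a match at offset 0 compares loosely equal to false, so a strict comparison is needed to tell "found at the start" from "not found".

```php
<?php
// Hypothetical input where the needle sits at the very first character.
$haystack = '_top at the start';
$pos = strpos($haystack, '_top'); // returns int 0, not false

// Loose comparison: 0 == false is true, so a real match looks like a miss.
$looseSaysMissing = ($pos == false);

// Strict comparison: 0 === false is false, so the match is recognised.
$strictSaysMissing = ($pos === false);

var_dump($looseSaysMissing, $strictSaysMissing);
```

With a loose check, a loop such as `while (FALSE != strpos(...))` would stop early whenever the match happens to be at position 0.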

0 Answers