I have a script that curls a webpage and pulls out a table. I have gotten it to the point where
echo "<table>";
echo $table;
echo "</table>";
will output the table I am looking for, but what I need is for it to be in an array, so I can look at every row individually. The first row of the table has the column names, if that makes things any easier. Below is the format of the table:
<pre>
<table>
<tbody id="sortable1">
<tr id="skip_coloring" class="nosort">
<tr>
<td class="border_even" style="white-space:nowrap">06/20/2011 4:33 PM </td>
<td class="border_even">
<strong>user_name, ext</strong>
</td>
<td class="border_even"> outside_num </td>
<td class="border_even"> outgoing </td>
<td class="border_even"> 12m, 14s </td>
<td class="border_even"> 12m, 5s </td>
</tr>
<tr>
</tbody>
</table>
</pre>
The row with id=skip_coloring has the column names. All other rows are data. I'm using preg_match to get the table; if there is a better way to do it, let me know. Right now, I am using the following preg_match to get this table:
preg_match('#<table[^>]*id="row1"[^>]*>(.+?)<\\/table>#is', $cres_data, $matches);
but $matches is an array with 2 indexes, one for each page of results that the table creates. Maybe it would be better to try to match against each row within the table? I seem to recall that this could be done with SimpleXML or something similar, but I haven't gotten there yet. Any help is appreciated.
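A note on the two entries: preg_match stores the full match at index 0 and each capture group after it, so a pattern with one group yields two entries whatever the page count. A minimal sketch, where the sample markup is a made-up stand-in for $cres_data:

```php
<?php
// Hypothetical sample markup standing in for the real $cres_data.
$cres_data = '<div><table class="x" id="row1"><tr><td>a</td></tr></table></div>';

// With # as the delimiter, the slash in </table> needs no escaping.
preg_match('#<table[^>]*id="row1"[^>]*>(.+?)</table>#is', $cres_data, $matches);

// $matches[0] is the entire matched text, including the <table> tags.
// $matches[1] is only what the (.+?) group captured: the table's inner HTML.
echo count($matches), "\n"; // 2
echo $matches[1], "\n";     // <tr><td>a</td></tr>
```

If the page really does contain more than one such table, preg_match_all would be the function that collects all of them.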
edit
Ended up using DOM; here's what I've got now:
$dom = new DomDocument();
$dom->loadHTML($cres_data);
$xpath = new DOMXPath($dom);
//get the first row of values
$arr = array();
foreach ($xpath->query('//tbody[@id="sortable1"]/tr/td') as $node)
{
$arr[] = $node->nodeValue;
}
echo '<pre>';
print_r($arr);
The output, however, isn't quite right:
Array
(
[0] =>
Call Date
[1] =>
Call From
.
.
.
[7] =>
06/20/2011 4:33 PM
[8] =>
user_name <ext>
Is there some way to remove all the whitespace, and get the column names as index labels? I assume I'd need to do this twice, or embed an additional foreach...it will always be formatted the same, if that matters.
edit
Used this function on both the labels and data to properly format it:
$label_arr = array_filter(array_map('trim',$label_arr));
The output was exactly what I needed:
Array
(
[Call Date] => 06/20/2011 4:33 PM
[Call From] => user_name <ext>
[Call To] => outside_num
[Call Type] => outgoing
[Call Time] => 12m, 14s
[Talk Time] => 12m, 5s
)
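For anyone landing here later, this is a rough sketch of how the pieces might fit together for more than one data row. It is not my exact code: the sample markup, the row_cells helper name, and the assumption that the header cells are td or th elements are all made up for illustration.

```php
<?php
// Hypothetical sample input standing in for $cres_data.
// Assumption: the header row holds its labels in <td> (or <th>) cells.
$cres_data = <<<HTML
<table>
<tbody id="sortable1">
<tr id="skip_coloring" class="nosort">
<td> Call Date </td><td> Call From </td>
</tr>
<tr>
<td class="border_even">06/20/2011 4:33 PM </td>
<td class="border_even">
<strong>user_name, ext</strong>
</td>
</tr>
</tbody>
</table>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($cres_data);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Hypothetical helper: the trimmed text of every cell in one row.
function row_cells(DOMXPath $xpath, DOMNode $tr)
{
    $cells = array();
    foreach ($xpath->query('./td|./th', $tr) as $cell) {
        $cells[] = trim($cell->nodeValue);
    }
    return $cells;
}

$labels = array();
$rows = array();
foreach ($xpath->query('//tbody[@id="sortable1"]/tr') as $tr) {
    $cells = row_cells($xpath, $tr);
    if ($tr->getAttribute('id') === 'skip_coloring') {
        $labels = $cells; // the header row supplies the keys
    } elseif ($labels && count($cells) === count($labels)) {
        $rows[] = array_combine($labels, $cells); // one keyed array per data row
    }
}
print_r($rows);
```

This version only trims and does not call array_filter, because array_filter would also drop a cell that is legitimately empty and shift the remaining values under the wrong labels.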