
XML structure:

<channel>
<title>
</title> 
<item>
<description>
<tbody>
<tr>
<td class="chart_stock_name">NUGT</td>
<td class="chart_stock_price">5.86</td>
<td class="chart_stock_change">+1.02</td>
<td class="chart_stock_prc">+(21.07%)</td>
</tr>
</description> 
</item> 
</channel> 

CODE:

$xml = simplexml_load_file($url);
for($i = 0; $i <= 6; $i++) {
$variablename = $xml->channel->item->description;
}

--

I need to get each of the values within the td classes. The best I have been able to manage is to echo out the description, which has all the rows.

What is the correct way to get the value inside a td with a given class, e.g. "NUGT"?

UPDATE ( FULL XML ):

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="/res/preview.xsl"?>
<rss version="2.0">
  <channel>
    <title>Price &#37; Gainers</title>
    <link>http://finance.yahoo.com/gainers?e=nq</link>
    <description><![CDATA[The list of gainers by % in the NYSE]]></description>
    <lastBuildDate>Mon, 01 Jul 2013 13:46:05 GMT</lastBuildDate>
    <generator>Feed43 Proxy/1.0 (www.feed43.com)</generator>
    <ttl>360</ttl>

<item>
<guid isPermaLink="false">e5a4b37b270cb65b3039e1d7c0152a88</guid>
<pubDate>Mon, 01 Jul 2013 13:46:05 GMT</pubDate>
<title>Stock Gainers</title>
<link>http://finance.yahoo.com/q?s=REMX</link>
<description><![CDATA[<table class="chart_stock">
<thead>
<tr>
<th>Name</th>
<th>Price</th>
<th>Change</th>
<th>% Chg</th>
</tr>
</thead>
<tbody><tr><td class="chart_stock_name">REMX</td><td class="chart_stock_price">39.00</td><td class="chart_stock_change">+29.51</td><td class="chart_stock_prc">+(310.96%)</td></tr><tr><td class="chart_stock_name">GDXJ</td><td class="chart_stock_price">36.99</td><td class="chart_stock_change">+27.83</td><td class="chart_stock_prc">+(303.82%)</td></tr><tr><td class="chart_stock_name">GEX</td><td class="chart_stock_price">51.45</td><td class="chart_stock_change">+36.23</td><td class="chart_stock_prc">+(238.04%)</td></tr><tr><td class="chart_stock_name">LVB</td><td class="chart_stock_price">35.04</td><td class="chart_stock_change">+4.61</td><td class="chart_stock_prc">+(15.15%)</td></tr><tr><td class="chart_stock_name">NUGT</td><td class="chart_stock_price">6.36</td><td class="chart_stock_change">+0.50</td><td class="chart_stock_prc">+(8.48%)</td></tr><tr><td class="chart_stock_name">FWDI</td><td class="chart_stock_price">23.88</td><td class="chart_stock_change">+0.00</td><td class="chart_stock_prc">+(0.00%)</td></tr><tr><td class="chart_stock_name">LAS</td><td class="chart_stock_price">5.11</td><td class="chart_stock_change">+0.31</td><td class="chart_stock_prc">+(6.46%)</td></tr><tr><td class="chart_stock_name">WWAV-B</td><td class="chart_stock_price">16.10</td><td class="chart_stock_change">+0.90</td><td class="chart_stock_prc">+(5.92%)</td></tr><tr><td class="chart_stock_name">LPLT</td><td class="chart_stock_price">31.78</td><td class="chart_stock_change">+0.00</td><td class="chart_stock_prc">+(0.00%)</td></tr><tr><td class="chart_stock_name">P</td><td class="chart_stock_price">19.43</td><td class="chart_stock_change">+1.03</td><td class="chart_stock_prc">+(5.59%)</td></tr><tr><td class="chart_stock_name">MUX</td><td class="chart_stock_price">1.78</td><td class="chart_stock_change">+0.10</td><td class="chart_stock_prc">+(5.95%)</td></tr><tr><td class="chart_stock_name">NOK</td><td class="chart_stock_price">3.94</td><td class="chart_stock_change">+0.20</td><td 
class="chart_stock_prc">+(5.21%)</td></tr><tr><td class="chart_stock_name">AUQ</td><td class="chart_stock_price">4.59</td><td class="chart_stock_change">+0.22</td><td class="chart_stock_prc">+(5.03%)</td></tr><tr><td class="chart_stock_name">UWTI</td><td class="chart_stock_price">30.81</td><td class="chart_stock_change">+1.44</td><td class="chart_stock_prc">+(4.90%)</td></tr><tr><td class="chart_stock_name">DRD</td><td class="chart_stock_price">5.69</td><td class="chart_stock_change">+0.26</td><td class="chart_stock_prc">+(4.79%)</td></tr><tr><td class="chart_stock_name">AUO</td><td class="chart_stock_price">3.62</td><td class="chart_stock_change">+0.16</td><td class="chart_stock_prc">+(4.62%)</td></tr><tr><td class="chart_stock_name">SLVP</td><td class="chart_stock_price">11.88</td><td class="chart_stock_change">+0.52</td><td class="chart_stock_prc">+(4.58%)</td></tr><tr><td class="chart_stock_name">IAG</td><td class="chart_stock_price">4.38</td><td class="chart_stock_change">+0.18</td><td class="chart_stock_prc">+(4.16%)</td></tr><tr><td class="chart_stock_name">EGO</td><td class="chart_stock_price">6.42</td><td class="chart_stock_change">+0.24</td><td class="chart_stock_prc">+(3.88%)</td></tr><tr><td class="chart_stock_name">GNK</td><td class="chart_stock_price">1.63</td><td class="chart_stock_change">+0.00</td><td class="chart_stock_prc">+(0.00%)</td></tr><tr><td class="chart_stock_name">TKC</td><td class="chart_stock_price">14.98</td><td class="chart_stock_change">+0.61</td><td class="chart_stock_prc">+(4.24%)</td></tr><tr><td class="chart_stock_name">TAHO</td><td class="chart_stock_price">14.73</td><td class="chart_stock_change">+0.58</td><td class="chart_stock_prc">+(4.10%)</td></tr><tr><td class="chart_stock_name">BALT</td><td class="chart_stock_price">3.86</td><td class="chart_stock_change">+0.15</td><td class="chart_stock_prc">+(4.04%)</td></tr><tr><td class="chart_stock_name">TGEM</td><td class="chart_stock_price">19.01</td><td 
class="chart_stock_change">+0.72</td><td class="chart_stock_prc">+(3.94%)</td></tr><tr><td class="chart_stock_name">MMD</td><td class="chart_stock_price">18.69</td><td class="chart_stock_change">+0.70</td><td class="chart_stock_prc">+(3.89%)</td></tr></tbody>
</table>]]></description>
</item>


  </channel>
</rss>
dev

1 Answer


I'll note first that your XML isn't entirely valid as you have it above - the closing </tbody> tag is missing.

An easy method is to use an XPath query which returns the <td> nodes having a class that contains chart_stock. From there, you may loop over them and retrieve each node value, constructing an array whose keys are the chart_stock_* classes and values are the corresponding node values.

The guts here happen in the XPath query.

  • //td selects all <td> nodes...
  • contains(@class, "chart_stock") ... which have "chart_stock" in their class attribute.

// Load your file
$xml = simplexml_load_file($url);
// Get all the <td> nodes via xpath, 
// only those containing chart_stock in the class
$tds = $xml->xpath('//td[contains(@class, "chart_stock")]');

// An array to hold your values...
$output = array();

// Loop over them and build an array of key => value pairs
// based on the class attribute
foreach ($tds as $td) {
  $attr = $td->attributes();
  // Cast attribute and node as strings and assign to your array
  $class = (string)$attr['class'];
  $output[$class] = (string)$td;
}

print_r($output);

Output:

Array
(
    [chart_stock_name] => NUGT
    [chart_stock_price] => 5.86
    [chart_stock_change] => +1.02
    [chart_stock_prc] => +(21.07%)
)
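
One caveat: because the array is keyed by class alone, a second <tr> would overwrite the values from the first. A minimal sketch of grouping the cells per row instead, assuming the table really is inline XML as in the original sample. The $xmltext string and its second row are made up for illustration, and the missing </tbody> has been added:

```php
<?php
// Hypothetical inline sample modeled on the question's first snippet,
// with the closing </tbody> added and a second row for illustration.
$xmltext = <<<XML
<channel>
  <item>
    <description>
      <tbody>
        <tr>
          <td class="chart_stock_name">NUGT</td>
          <td class="chart_stock_price">5.86</td>
        </tr>
        <tr>
          <td class="chart_stock_name">REMX</td>
          <td class="chart_stock_price">39.00</td>
        </tr>
      </tbody>
    </description>
  </item>
</channel>
XML;

$xml = simplexml_load_string($xmltext);

$rows = array();
foreach ($xml->xpath('//tr') as $tr) {
    $row = array();
    // Relative XPath: only the <td> children of this <tr>
    foreach ($tr->xpath('td[contains(@class, "chart_stock")]') as $td) {
        $row[(string)$td['class']] = (string)$td;
    }
    // Skip rows that had no matching <td>
    if ($row) {
        $rows[] = $row;
    }
}

print_r($rows);
```

Each element of $rows is then one table row, keyed by the chart_stock_* class names.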

After seeing your real XML, it is quite different from what your original sample implied. Each <description> node contains a CDATA block of HTML, which the XML parser treats as plain text, so the XPath query above finds no <td> nodes. SimpleXML has no HTML loader, while DOMDocument does (loadHTML). Using DOMDocument, retrieve the <description> nodes, then load their contents as HTML. Using DOM API calls like getElementsByTagName, loop over each <tr> and its <td> children and load the rows onto $output, a new sub-array for each <tr>.

// An array to hold your values...
$output = array();

// DOMDocument for the outer XML
// ($xmltext is the raw feed XML as a string, e.g. from file_get_contents($url))
$maindom = new DOMDocument();
$maindom->loadXML($xmltext);

// Loop over description nodes
$desc = $maindom->getElementsByTagName('description');

foreach ($desc as $d) {
    // get the cdata block
    $cdata = $d->nodeValue; 
    // and load it as HTML into DOMDocument
    $dom = new DOMDocument();
    $dom->loadHTML($cdata);
    // Get its descendant <tr>
    $trs = $dom->getElementsByTagName("tr");

    // Loop over each <tr> and get its child <td> to retrieve your values
    foreach ($trs as $tr) {
        // New output sub-array per tr
        $row = array();
        $tds = $tr->getElementsByTagName('td');
        foreach ($tds as $td) {
            // Load each <td> onto the current row array by class
            $class = $td->getAttribute('class');
            $row[$class] = $td->nodeValue;
        }
        // Append to $output, skipping the <thead> row (it has <th> but no <td>)
        if ($row) {
            $output[] = $row;
        }
    }
}
echo '<pre>';
print_r($output);
echo '</pre>';

Here is the whole thing in action

Michael Berkowski
  • It looks great, but I'm not getting any data in the $output array – dev Jul 01 '13 at 04:21
  • @dev If your XML resembles what you have above, it will work - see http://codepad.viper-7.com/hJOeI6 I did have to add the closing `</tbody>` to make it valid though. – Michael Berkowski Jul 01 '13 at 11:25
  • The XML output above was a summary of the xml file. More specifically, I am using this file: feed43.com/5787633116607318.xml. My print_r($output) is "Array". I am using the exact same code. – dev Jul 01 '13 at 12:11
  • @dev Please paste a more complete sample of your XML into your question above. When I try to retrieve the URL in your comment, I get a mostly empty XML structure having only an error message in the `` – Michael Berkowski Jul 01 '13 at 13:26
  • @dev Updated. Your XML is actually very different than your original sample implies. – Michael Berkowski Jul 01 '13 at 15:26
  • That is perfect. Thank you for taking the time to do that – dev Jul 01 '13 at 15:35
  • Although SimpleXML doesn't have a function to load HTML, it can work on a DOM object loaded from HTML: `$dom->loadHTML($html); $sxml = simplexml_import_dom($dom);` (This doesn't re-parse the input, it just re-wraps the parsed structure in a different PHP API.) – IMSoP Jul 01 '13 at 15:42
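
The approach in the last comment can be sketched roughly as follows: parse the HTML with DOMDocument, then hand the tree to SimpleXML via simplexml_import_dom so the XPath query from the first example can be reused. The $html string here is a hypothetical stand-in for one CDATA payload from the feed, not the real feed content:

```php
<?php
// Hypothetical HTML fragment standing in for one CDATA payload from the feed.
$html = '<table class="chart_stock"><tbody>'
      . '<tr><td class="chart_stock_name">NUGT</td>'
      . '<td class="chart_stock_price">6.36</td></tr>'
      . '</tbody></table>';

// Parse the fragment as HTML; DOMDocument wraps it in <html><body>.
$dom = new DOMDocument();
$dom->loadHTML($html);

// Re-wrap the already-parsed DOM tree in the SimpleXML API (no re-parsing).
$sxml = simplexml_import_dom($dom);

// Same XPath idea as before, now run against the HTML tree.
$row = array();
foreach ($sxml->xpath('//td[contains(@class, "chart_stock")]') as $td) {
    $row[(string)$td['class']] = (string)$td;
}

print_r($row);
```

With more than one <tr>, the cells would need to be grouped per row, as in the DOMDocument loop in the answer.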