I'm trying to get the value of "Internet Data Volume Balance" from the HTML below. The script should echo 146.30 MB.

I'm new to all of this and am working through the tutorials.

How can this be done?

<tr >
    <td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Account Status</FONT></B></div></td>
    <td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text">You exceeded your allowed credit.</FONT></div></td>
</tr> 

<tr >
    <td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Period Free Time Remaining</FONT></B></div></td>
    <td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text">0:00:00 hours</FONT></div></td>
</tr> 

<tr >
    <td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Internet Data Volume Balance</FONT></B></div></td>
    <td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text" style="text-transform:none;">146.30 MB</FONT></div></td>
</tr> 
JJJ
Parvesh
  • I think you'll find that while you **can** use regex to parse HTML, it's **not usually advisable**. `DOM` or `SimpleXML` will likely be much better options in this situation. – Apr 24 '12 at 12:49
  • Could you point me to a good resource? – Parvesh Apr 24 '12 at 12:50
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags Simple HTML DOM Parser is exactly what the name suggests, a simple way to parse HTML: http://simplehtmldom.sourceforge.net/. There are quite a few other HTML parsing options for PHP too. – dm03514 Apr 24 '12 at 12:51

2 Answers

PHP can interact with the DOM much as JavaScript can. This is far more robust than parsing the markup with regular expressions, which most people will tell you is the wrong approach anyway:

Loading from an HTML File

// Start by creating a new document
$doc = new DOMDocument();
// I've loaded the table into an external file, and am loading it into the $doc
$doc->loadHTMLFile( 'htmlpage.html' );
// Since you have six table cells, I'm calling up all of them
$cells = $doc->getElementsByTagName("td");
// I'm grabbing the sixth cell's textContent property
echo $cells->item(5)->textContent;

This code will output "146.30 MB" to the screen.

Loading from a String

If you have the HTML stored in a string, you can load that into your document as well. Simply replace the call to loadHTMLFile() with a call to loadHTML():

$str = "<table><tr><td>Foo</td></tr>...</table>";
$doc->loadHTML( $str );

We would then proceed with the same code as above to select the cells, and show their textContent in the output.
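
Pages fetched from the web are often not well-formed, and loadHTML() reports each problem it finds as a PHP warning. The sketch below puts the string-loading steps together and uses libxml_use_internal_errors() to collect those warnings instead of printing them. The sample markup in `$str` is an assumption standing in for the real page, so the index 5 only holds for this six-cell sample.

```php
<?php
// Sample markup standing in for the fetched page (an assumption for this sketch).
$str = <<<HTML
<table>
<tr>
    <td>Account Status</td>
    <td>You exceeded your allowed credit.</td>
</tr>
<tr>
    <td>Period Free Time Remaining</td>
    <td>0:00:00 hours</td>
</tr>
<tr>
    <td>Internet Data Volume Balance</td>
    <td>146.30 MB</td>
</tr>
</table>
HTML;

// Collect parser warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($str);

// Discard whatever was collected and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Same selection as before: the sixth cell holds the balance.
$cells = $doc->getElementsByTagName('td');
$balance = trim($cells->item(5)->textContent);
echo $balance, PHP_EOL;
```

If the warnings are worth inspecting rather than discarding, libxml_get_errors() returns them as an array before libxml_clear_errors() is called.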

Check out the DOMDocument Class.
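
Selecting a cell by position breaks as soon as the page gains or loses a row. As a possible alternative, here is a minimal sketch that finds the value by its label using DOMXPath. The helper name `findValueByLabel` and the trimmed sample markup are assumptions made for illustration, and the helper assumes the label contains no double quotes.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the text of
// the cell that follows the cell whose text equals $label, or null if no
// such cell exists. Assumes $label contains no double quotes.
function findValueByLabel(DOMDocument $doc, string $label): ?string
{
    $xpath = new DOMXPath($doc);
    // Find the <td> whose whitespace-normalized text equals the label,
    // then step to the next <td> in the same row.
    $query = sprintf('//td[normalize-space(.) = "%s"]/following-sibling::td[1]', $label);
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// One row taken from the question's markup, used here as sample input.
$html = '<table><tr>'
      . '<td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Internet Data Volume Balance</FONT></B></div></td>'
      . '<td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text">146.30 MB</FONT></div></td>'
      . '</tr></table>';

// Keep any parser warnings out of the output.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$value = findValueByLabel($doc, 'Internet Data Volume Balance');
echo $value === null ? 'not found' : $value, PHP_EOL;
```

Returning null for a missing label lets the caller tell "label not on the page" apart from an empty cell.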

Sampson
  • I am using curl to get the content of the page, since it is a protected page. Can I load the output from curl directly into loadHTMLFile? – Parvesh Apr 24 '12 at 13:19
  • @devilived Yes, you can load the HTML from a string too. Use `$doc->loadHTML($str)` for that, where `$str` is your HTML. – Sampson Apr 24 '12 at 13:24
  • It works, thanks. But I am getting a few warnings:
    Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 295 in /home/premiu59/public_html/phpcurl/scraper.php on line 27
    
    Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Opening and ending tag mismatch: tr and table in Entity, line: 484 in /home/premiu59/public_html/phpcurl/scraper.php on line 27
    146.30 MB
    – Parvesh Apr 24 '12 at 13:35
If you already have phpQuery installed, or are willing to install it, you can use that instead.

phpQuery::newDocumentFileHTML('htmlpage.html');
echo pq('td:eq(5)')->text(); // :eq() is zero-based, so 5 selects the sixth cell
Explosion Pills