How to parse contents from an HTML file using cURL?

Question

I want to parse XHTML content using cURL. How can I scrape the transaction number, weight, height, and width from between the <table> tags? How can I scrape only the contents of this HTML document and get them as an array using cURL?

transactions.php

 <table border=0 cellspacing=0 width=100%>
       <tr> 
        <td colspan="2">&nbsp;</td>
      </tr>
      <tr> 
        <td width="30%" class="Mellemrubrikker">Transaction Number::</td>
        <td width="70%">24752734576547IN</td>
      </tr>
      <tr> 
        <td width="30%" class="Mellemrubrikker">Weight:</td>
        <td width="70%">0.85 kg</td>
      </tr>
      <tr> 
        <td width="30%" class="Mellemrubrikker">Length:</td>
        <td width="70%">543 mm.</td>
      </tr>
      <tr> 
        <td width="30%" class="Mellemrubrikker">Height:</td>
        <td width="70%">156 mm.</td>
      </tr>
      <tr> 
        <td width="30%" class="Mellemrubrikker">Width:</td>
        <td width="70%">61 mm.</td>
      </tr>
      <tr> 
         <td colspan="2">&nbsp;</td>
      </tr>    
    </table>

index.php

<?php
$url = "http://localhost/htmlparse/transactions.php";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
//print_r($output);
echo $output;
?>

This code gets the whole HTML content from transactions.php. How can I get the data between the <table> tags as an array?

This is not a do-my-work-for-me site. What have you tried, and what didn't work as you expected? – Randy, Jul 26 '11 at 12:09
Yes, I tried using cURL, but I am not familiar with preg_match. – Balaji Kandasamy, Jul 26 '11 at 12:16
Regarding parsing HTML using regexes, see ["RegEx match open tags except XHTML self-contained tags"](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – outis, Jul 28 '11 at 05:44
Thanks for your question! However, this is not really the kind of question that Stack Overflow is here to answer. [Read this for more information](http://stackoverflow.com/faq#dontask). Once you have a specific question about a specific problem you are having with code you are writing, feel free to return. – Andrew Barber, Mar 24 '13 at 02:02
@Andrew Barber: Hi, I've added the cURL code that I used to parse the HTML. It retrieves the whole content, with tags, from the file. I want to get the data only. How can I retrieve it as an array? – Balaji Kandasamy, Mar 25 '13 at 07:37

score 3 · Answer 1 · answered Jul 26 '11 at 12:12

Try Simple HTML DOM from http://simplehtmldom.sourceforge.net/

If you don't mind using Python or Perl, you can use Beautiful Soup or WWW::Mechanize.

answered Jul 26 '11 at 12:12

Poomalairaj

came here to suggest the same. :) – iHaveacomputer Jul 27 '11 at 04:46

score 1 · Answer 2 · edited May 23 '17 at 12:22

I would use the Document Object Model rather than writing your own parsing code or (God forbid!) regular expressions.

Here's an example in PHP: PHP Parse HTML code
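
As a rough sketch of the DOM approach applied to the table in the question: this assumes PHP's bundled DOM extension is enabled, and the function name `parseTransactionTable` and the shortened sample markup are made up for illustration. It treats every row with exactly two cells as a label/value pair and skips the spacer rows.

```php
<?php
// Hypothetical helper (not from the question): converts two-column
// label/value table rows into an associative array.
function parseTransactionTable(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them;
    // real-world HTML (unquoted attributes etc.) is rarely clean.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query('//table//tr') as $row) {
        $cells = $xpath->query('./td', $row);
        if ($cells->length !== 2) {
            continue; // spacer rows have a single <td colspan="2">
        }
        // Strip whitespace and the trailing colon(s) from the label.
        $label = rtrim(trim($cells->item(0)->textContent), ':');
        $value = trim($cells->item(1)->textContent);
        $result[$label] = $value;
    }
    return $result;
}

// Shortened sample markup in the same shape as transactions.php.
$html = '<table border=0 cellspacing=0 width=100%>
  <tr><td colspan="2">&nbsp;</td></tr>
  <tr><td width="30%" class="Mellemrubrikker">Transaction Number::</td>
      <td width="70%">24752734576547IN</td></tr>
  <tr><td width="30%" class="Mellemrubrikker">Weight:</td>
      <td width="70%">0.85 kg</td></tr>
</table>';

print_r(parseTransactionTable($html));
```

In index.php, the `$output` string returned by `curl_exec()` could be passed to the function in place of `$html`. The values keep their units (for example "0.85 kg"), so any numeric conversion would be a separate step.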

answered Jul 26 '11 at 12:11

Philip
