
I have an HTML table that I would like to parse in PHP to store into a MySQL Database. The HTML looks like this:

<tr><td>DATE</td><td>LOCATION</td><td><a href="URL">NAME</a></td></tr>

I would like to create a PHP function that returns the fields shown in capital letters as an array. Does anyone know of any PHP libraries that can do this, or should I be using a different language, since this may be complex? I don't know exactly how to do this with many tables on the page, but I am trying to parse the VEX events on RobotEvents. The table that I want to parse starts at line 465.

John
  • I am downloading the HTML file. – John Dec 21 '13 at 23:51
  • Have you looked at this, it might be helpful. http://stackoverflow.com/questions/8816194/how-to-parse-html-table-using-php – Ali Gajani Dec 21 '13 at 23:51
  • @Smith: See my updated answer for finding a specific table. – TheCarver Dec 22 '13 at 00:39
  • @Smith. Noticed you switched the accepted answer from mine to another. Is there a reason why? Did you find a problem with the library I suggested? Just curious to know what went wrong in case I have to recommend it to somebody else in future. – TheCarver Dec 22 '13 at 20:58
  • Your code didn't work as well as the other libraries suggested. – John Dec 22 '13 at 22:42

2 Answers

Take a look at the PHP HTML DOM Parser library.

To use it, you can do something like this (not my example):

require('simple_html_dom.php');

$table = array();

$html = file_get_html('http://flow935.com/playlist/flowhis.HTM');
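foreach($html->find('tr') as $row) {
    // Skip header or malformed rows that do not have three <td> cells
    if (count($row->find('td')) < 3) {
        continue;
    }

    // The second argument to find() is a zero-based index
    $time = $row->find('td',0)->plaintext;
    $artist = $row->find('td',1)->plaintext;
    $title = $row->find('td',2)->plaintext;

    $table[$artist][$title] = true;
}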

echo '<pre>';
print_r($table);
echo '</pre>';

There are some tutorials, SO questions and interesting reads about the library, and it seems to be pretty popular.

UPDATE FOR FINDING SPECIFIC TABLE IN HTML USING ABOVE LIBRARY

To find a particular table amongst many:

1. By class:

On line 465 of your scraped HTML, the table starts with a class catalog-listing, so:

foreach ($html->find('table.catalog-listing', 0)->find('tr') as $row) {
   // extract TD data
}

2. By position (find the 2nd table in the HTML; the index is zero-based)

foreach ($html->find('table', 1)->find('tr') as $row) {
   // extract TD data
}
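
For comparison, here is a minimal sketch of the same idea using PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of the library above. It returns the fields the question asks for (date, location, name, URL) as an array per row. The function name parseEventRows(), the array keys, and the sample markup are hypothetical; the catalog-listing class is taken from the scraped page mentioned above, and the real page may need a different query.

```php
<?php
// Sketch only: parseEventRows() and the array keys are illustrative names,
// not part of any library. Requires PHP's bundled DOM extension.

function parseEventRows(string $html, string $tableClass): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole token among possibly several classes.
    // Assumes $tableClass is a trusted value containing no quotes.
    $query = sprintf(
        '//table[contains(concat(" ", normalize-space(@class), " "), " %s ")]//tr',
        $tableClass
    );

    $events = [];
    foreach ($xpath->query($query) as $row) {
        $cells = $xpath->query('./td', $row);
        if ($cells->length < 3) {
            continue; // header rows (<th>) or rows with too few cells
        }
        // The first link in the third cell, if there is one
        $link = $xpath->query('.//a', $cells->item(2))->item(0);
        $events[] = [
            'date'     => trim($cells->item(0)->textContent),
            'location' => trim($cells->item(1)->textContent),
            'name'     => trim($cells->item(2)->textContent),
            'url'      => $link instanceof DOMElement ? $link->getAttribute('href') : null,
        ];
    }
    return $events;
}

// Sample markup modelled on the row shown in the question
$sample = '<table class="catalog-listing">'
        . '<tr><th>Date</th><th>Location</th><th>Event</th></tr>'
        . '<tr><td>DATE</td><td>LOCATION</td><td><a href="URL">NAME</a></td></tr>'
        . '</table>';

$events = parseEventRows($sample, 'catalog-listing');
print_r($events);
```

Each returned element is a flat associative array, so it could be passed to a prepared INSERT statement one row at a time when storing the results in MySQL.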
TheCarver

As you're prepared to look beyond PHP, Nokogiri (Ruby) and Beautiful Soup (Python) are well-established libraries that parse HTML very well.

That doesn't imply that there are no suitable PHP libraries.

joews