0

So I am trying to parse an HTML table, pass it to a PHP array, then send to MySQL.

This is what I have so far:

<?php
$url = 'website here';
$html = file_get_contents($url);
libxml_use_internal_errors(true);
// a new dom object
$dom = new domDocument;
 
// load the html into the object
$dom->loadHTML($html); 
 
// discard white space
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('table');
 
//foreach($tables as $table)
//{
//    echo $dom->saveHTML($table);
//}
$a = array();
foreach($tables as $k => $table) {
    foreach($table->getElementsByTagName('tr') as $td) {
        $a['table_' . $k][] = array_values(array_filter(explode(' ', str_replace(array("\n", "\r"), "", $td->nodeValue))));
    }
} 
$cell_1 = $a['table_0'][1][0]; // read first cell [0] of first row of data [1]
$value= preg_replace('/[\$,]/', '', $cell_1); //Remove dollar sign and commas
//$test = $value * 5;
echo $value
?>

After parsing the table in entirety looks like this:

| Header 1   | Header 2    | Header 3   |
|:-----------|------------:|:----------:|
| $20,000    | $50,000     | $50,000    |
| $40,000    | $20,000     | $90,000    |
| $60,000    | $30,000     | $580,000   |
| $80,000    | $10,000     | $410,000   |

....you get the idea. All data is space delimited, although the headers could have multiple words.

Running the following allows me to access the first cell in the first row of actual data:

$cell_1 = $a['table_0'][1][0]; // read first cell [0] of first row of data [1]
$value= preg_replace('/[\$,]/', '', $cell_1); //Remove dollar sign and commas
echo $value

....but this could prove tedious considering some tables have different number of rows (all have the same number of columns, however). Side note: only one table per HTML page.

The question is, how can I simplify this and loop through each cell for each row? Ultimately, the end goal will be the MySQL database having a carbon copy of the table. And each value needs to be stripped of commas etc. so I can perform calculations later.

Edit:

Okay, this is what I have to send it to MySQL (appended to bottom of code above). Any recommendations for improvement would be welcomed. I am getting a deprecation error obviously but I am clueless with PDO.

unset($a['table_0'][0]); // remove table header
$json = json_encode($a['table_0'], JSON_UNESCAPED_UNICODE);
// Make the queries
$arr = json_decode($json, true);

$dbhost = 'localhost';
$dbuser = 'root';
$dbpass = 'password';
$conn = mysql_connect($dbhost, $dbuser, $dbpass);
if(! $conn )
{
  die('Could not connect: ' . mysql_error());
}
echo 'Connected successfully<br />';
$sql = "CREATE TABLE tutorials_tbl( ".
       "row_id INT NOT NULL AUTO_INCREMENT, ".
       "new INT NOT NULL , ".
       "old INT NOT NULL , ".
       "done INT NOT NULL , ".
       "PRIMARY KEY ( row_id )); ";
mysql_select_db( 'tutorial' );
$retval = mysql_query( $sql, $conn );
if(! $retval )
{
  die('Could not create table: ' . mysql_error());
}
echo "Table created successfully\n";

foreach($arr as $item){
$new= $item[0]; // column 1 of data
$old = $item[1]; // column 2 of data
$done = $item[2]; // column 3 of data

mysql_query("INSERT INTO tutorials_tbl (new, old, done) VALUES('$new', '$old', '$done')") or die (mysql_error());
}

mysql_close($conn);
mickmackusa
  • 43,625
  • 12
  • 83
  • 136
  • For each row, use an insert statement at the end of foreach loop – Metabolic Nov 10 '14 at 01:49
  • Do you want to do calculations via mysql or just to store table? – skobaljic Nov 10 '14 at 01:53
  • I think some of my calcs will have to be done by PHP so I would be happy just to get it stored for now. – Peeping Tom Nov 10 '14 at 02:03
  • Than do your preg_replace when you create array, at the end just do json_encode onto array and store into database. – skobaljic Nov 10 '14 at 02:07
  • add more foreach loops – SergeDirect Nov 10 '14 at 02:12
  • no need to use regex for this, just use combination of trim and filter input – Kevin Nov 10 '14 at 02:47
  • Ghost, could you elaborate? – Peeping Tom Nov 10 '14 at 03:36
  • This question has too many moving parts. It is asking for a review of a script that is doing webscraping and storing the parsed data in a database. There are many considerations and warnings to bake into a good answer. This task should be split, at least, in half to be a more narrow question on Stack Overflow. Related: [Parse HTML table into multidimensional array](https://stackoverflow.com/q/5685327/2943403) – mickmackusa May 11 '23 at 23:13

0 Answers0