So I am trying to parse an HTML table, pass it to a PHP array, then send to MySQL.
This is what I have so far:
<?php
$url = 'website here';
$html = file_get_contents($url);
libxml_use_internal_errors(true);
// a new dom object
$dom = new domDocument;
// load the html into the object
$dom->loadHTML($html);
// discard white space
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('table');
//foreach($tables as $table)
//{
// echo $dom->saveHTML($table);
//}
$a = array();
foreach($tables as $k => $table) {
foreach($table->getElementsByTagName('tr') as $td) {
$a['table_' . $k][] = array_values(array_filter(explode(' ', str_replace(array("\n", "\r"), "", $td->nodeValue))));
}
}
$cell_1 = $a['table_0'][1][0]; // read first cell [0] of first row of data [1]
$value= preg_replace('/[\$,]/', '', $cell_1); //Remove dollar sign and commas
//$test = $value * 5;
echo $value
?>
After parsing the table in entirety looks like this:
| Header 1 | Header 2 | Header 3 |
|:-----------|------------:|:----------:|
| $20,000 | $50,000 | $50,000 |
| $40,000 | $20,000 | $90,000 |
| $60,000 | $30,000 | $580,000 |
| $80,000 | $10,000 | $410,000 |
....you get the idea. All data is space delimited, although the headers could have multiple words.
Running the following allows me to access the first cell in the first row of actual data:
$cell_1 = $a['table_0'][1][0]; // read first cell [0] of first row of data [1]
$value= preg_replace('/[\$,]/', '', $cell_1); //Remove dollar sign and commas
echo $value
....but this could prove tedious considering some tables have different number of rows (all have the same number of columns, however). Side note: only one table per HTML page.
The question is, how can I simplify this and loop through each cell for each row? Ultimately, the end goal will be the MySQL database having a carbon copy of the table. And each value needs to be stripped of commas etc. so I can perform calculations later.
Edit:
Okay, this is what I have to send it to MySQL (appended to bottom of code above). Any recommendations for improvement would be welcomed. I am getting a deprecation error obviously but I am clueless with PDO.
unset($a['table_0'][0]); // remove table header
$json = json_encode($a['table_0'], JSON_UNESCAPED_UNICODE);
// Make the queries
$arr = json_decode($json, true);
$dbhost = 'localhost';
$dbuser = 'root';
$dbpass = 'password';
$conn = mysql_connect($dbhost, $dbuser, $dbpass);
if(! $conn )
{
die('Could not connect: ' . mysql_error());
}
echo 'Connected successfully<br />';
$sql = "CREATE TABLE tutorials_tbl( ".
"row_id INT NOT NULL AUTO_INCREMENT, ".
"new INT NOT NULL , ".
"old INT NOT NULL , ".
"done INT NOT NULL , ".
"PRIMARY KEY ( row_id )); ";
mysql_select_db( 'tutorial' );
$retval = mysql_query( $sql, $conn );
if(! $retval )
{
die('Could not create table: ' . mysql_error());
}
echo "Table created successfully\n";
foreach($arr as $item){
$new= $item[0]; // column 1 of data
$old = $item[1]; // column 2 of data
$done = $item[2]; // column 3 of data
mysql_query("INSERT INTO tutorials_tbl (new, old, done) VALUES('$new', '$old', '$done')") or die (mysql_error());
}
mysql_close($conn);