22

I am just in need to convert a this html table automatically in csv using PHP. Can someone provide any idea how to do this? Thanks.

$table = '<table border="1">
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>';

Guys, I just need $table to convert in only .csv file, which could be automatically generated using some PHP function. We can define path for that csv file to /test/home/path_to_csv

Thompson
  • 1,954
  • 10
  • 33
  • 58
  • I searched the Google, but didn't find relevant – Thompson May 08 '12 at 12:29
  • I realize this isn't quite what you are looking for but you can save it as an xls. and excel will open it as expected. – dm03514 May 08 '12 at 12:32
  • Have you tried this? http://davidvielmetter.com/tricks/howto-convert-an-html-table-to-csv-using-php/ – Ozair Kafray May 08 '12 at 13:02
  • What exactly are you trying to achieve? Are you looking to parse the HTML and extract data or is it one-off analysis? For one-off, I would recommend information found [here](http://stackoverflow.com/questions/259091/how-can-i-scrape-an-html-table-to-csv). For more universal and automatic work, HTML needs to be parsed - try [simple parse](http://simplehtmldom.sourceforge.net/). Or, if the HTML is of a fixed format, you may have luck with regular expression to scrape the tags out and leave only the data. (e.g. replace the td with some field separators) Then, CSV output is a trivial matter. – petr May 08 '12 at 12:32
  • I am not trying to parse anything. I do have my own variable **$table**, which has the html table data in it. Now I want to convert that data to .csv file, using PHP – Thompson May 08 '12 at 12:41
  • in order to get the structure from HTML string, you do need to parse the data (stored in the variable in this case). Then, you can extract the information you are interested in and store it as a CSV – petr May 08 '12 at 12:44
  • I am getting an error that my csv is blank ..... i have a file which there is title tag and 2 3 more tags are there with the table now how i will read it and parse it into it. $file= file_get_contents($file_txt."/D_LOTMAN.HTM"); print_r($file); include './simple_html_dom.php'; $html = str_get_html($file); $fpc = fopen("downloads/me.csv", "w"); foreach($html->find('tr') as $element) { $td = array(); foreach( $element->find('td') as $row) { $td [] = $row->plaintext; } fputcsv($fpc, $td); } fclose($fpc); – Vivek May 27 '14 at 13:13

8 Answers8

26

You can use str_get_html http://simplehtmldom.sourceforge.net/

include "simple_html_dom.php";
$table = '<table border="1">
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>';

$html = str_get_html($table);



header('Content-type: application/ms-excel');
header('Content-Disposition: attachment; filename=sample.csv');

$fp = fopen("php://output", "w");

foreach($html->find('tr') as $element)
{
        $th = array();
        foreach( $element->find('th') as $row)  
        {
            $th [] = $row->plaintext;
        }

        $td = array();
        foreach( $element->find('td') as $row)  
        {
            $td [] = $row->plaintext;
        }
        !empty($th) ? fputcsv($fp, $th) : fputcsv($fp, $td);
}


fclose($fp);
Baba
  • 94,024
  • 28
  • 166
  • 217
  • Baba, how can we define some path for that csv file to get automatically saved to e.g. **/test/home/path_to_csv** – Thompson May 08 '12 at 16:35
  • 1
    Yes you can ..... all you need to do is remove the headers .... and change `"php://output"` to your path – Baba May 08 '12 at 16:36
  • Thanks for your method. I will try it on my PC (at present I'm on mobile), and let you know. – Thompson May 08 '12 at 16:39
  • Oops! Baba, I changed the **php://output** to **/mohan/path_to_csv**, and when I am loading the page it's giving several warnings! like **Warning: fopen(/mohan/path_to_csv) [function.fopen]: failed to open stream: No such file or directory in C:\wamp\www\mohan\TableToCsv.php on line 29**. Line 29 is **$fp = fopen("/mohan/path_to_csv", "w");** – Thompson May 09 '12 at 04:34
  • Hi, i am getting the CSV output as in this photo http://i.imgur.com/x2mWb.png, which inserts extra blank lines in between the data. Is Can it be fixed? I tried trim(). But it didn't work. – Thompson May 09 '12 at 06:16
  • `fputcsv` has additional parameters that you can experiment with and make sure you filter out empty array – Baba May 09 '12 at 09:57
  • In the lib code has `MAX_FILE_SIZE` which set to `600000` and if function `file_get_html` and `str_get_html` get the string size over than `600000` won't allow to progress and return false instead. I wish to work with a large table which contains string length up to 1 million and over, how should I do, can you suggest me that how important of the `MAX_FILE_SIZE` number. – Natta Wang Aug 16 '18 at 06:07
  • I tried this method, The csv is getting created but the table column content is not present in the csv. Instead, the content is written twice. Any idea why? Im using 1.8.1 version – Bej Apr 08 '19 at 14:02
17

You can use this function in separate js file:

function exportTableToCSV($table, filename) {

        var $rows = $table.find('tr:has(td)'),

            // Temporary delimiter characters unlikely to be typed by keyboard
            // This is to avoid accidentally splitting the actual contents
            tmpColDelim = String.fromCharCode(11), // vertical tab character
            tmpRowDelim = String.fromCharCode(0), // null character

            // actual delimiter characters for CSV format
            colDelim = '","',
            rowDelim = '"\r\n"',

            // Grab text from table into CSV formatted string
            csv = '"' + $rows.map(function (i, row) {
                var $row = $(row),
                    $cols = $row.find('td');

                return $cols.map(function (j, col) {
                    var $col = $(col),
                        text = $col.text();

                    return text.replace('"', '""'); // escape double quotes

                }).get().join(tmpColDelim);

            }).get().join(tmpRowDelim)
                .split(tmpRowDelim).join(rowDelim)
                .split(tmpColDelim).join(colDelim) + '"',

            // Data URI
            csvData = 'data:application/csv;charset=utf-8,' + encodeURIComponent(csv);

        $(this)
            .attr({
            'download': filename,
                'href': csvData,
                'target': '_blank'
        });
    }

Now, to initiate this function, you can use:

$('.getfile').click(
            function() { 
    exportTableToCSV.apply(this, [$('#thetable'), 'filename.csv']);
             });

where 'getfile' should be the class assigned to button, where you want to add call to action. (On clicking this button, the download popup will appear) and "thetable" should be the ID assigned to table you want to download.

You can also change to the custom file name to download in code.

avilporwal
  • 171
  • 1
  • 5
6

You can do this with arrays and regular expressions... See below

$csv = array();
preg_match('/<table(>| [^>]*>)(.*?)<\/table( |>)/is',$table,$b);
$table = $b[2];
preg_match_all('/<tr(>| [^>]*>)(.*?)<\/tr( |>)/is',$table,$b);
$rows = $b[2];
foreach ($rows as $row) {
    //cycle through each row
    if(preg_match('/<th(>| [^>]*>)(.*?)<\/th( |>)/is',$row)) {
        //match for table headers
        preg_match_all('/<th(>| [^>]*>)(.*?)<\/th( |>)/is',$row,$b);
        $csv[] = strip_tags(implode(',',$b[2]));
    } elseif(preg_match('/<td(>| [^>]*>)(.*?)<\/td( |>)/is',$row)) {
        //match for table cells
        preg_match_all('/<td(>| [^>]*>)(.*?)<\/td( |>)/is',$row,$b);
        $csv[] = strip_tags(implode(',',$b[2]));
    }
}
$csv = implode("\n", $csv);
var_dump($csv);

Then you can use file_put_contents() to write the csv string to file..

JamesBond
  • 312
  • 2
  • 17
Jacob Cruz
  • 435
  • 5
  • 9
5

To expand on the accepted answer I did this which allows me to ignore columns by class name and also deals with blank rows/columns.

You can use str_get_html http://simplehtmldom.sourceforge.net/. Just include it and away you go! :)

$html = str_get_html($html); // give this your HTML string

header('Content-type: application/ms-excel');
header('Content-Disposition: attachment; filename=sample.csv');

$fp = fopen("php://output", "w");

foreach($html->find('tr') as $element) {
  $td = array();
  foreach( $element->find('th') as $row) {
    if (strpos(trim($row->class), 'actions') === false && strpos(trim($row->class), 'checker') === false) {
      $td [] = $row->plaintext;
    }
  }
  if (!empty($td)) {
    fputcsv($fp, $td);
  }

  $td = array();
  foreach( $element->find('td') as $row) {
    if (strpos(trim($row->class), 'actions') === false && strpos(trim($row->class), 'checker') === false) {
      $td [] = $row->plaintext;
    }
  }
  if (!empty($td)) {
    fputcsv($fp, $td);
  }
}

fclose($fp);
exit;
lukeocodes
  • 1,192
  • 1
  • 16
  • 31
1

If anyone is using Baba's answer but scratching their head over extra white spaces being added, this will work:

include "simple_html_dom.php";
$table = '<table border="1">
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>';

$html = str_get_html($table);   

$fileName="export.csv";
header('Content-type: application/ms-excel');
header("Content-Disposition: attachment; filename=$fileName");

$fp = fopen("php://output", "w");
$csvString="";

$html = str_get_html(trim($table));
foreach($html->find('tr') as $element)
{

    $td = array();
    foreach( $element->find('th') as $row)
    {
        $row->plaintext="\"$row->plaintext\"";
        $td [] = $row->plaintext;
    }
    $td=array_filter($td);
    $csvString.=implode(",", $td);

    $td = array();
    foreach( $element->find('td') as $row)
    {
        $row->plaintext="\"$row->plaintext\"";
        $td [] = $row->plaintext;
    }
    $td=array_filter($td);
    $csvString.=implode(",", $td)."\n";
}
echo $csvString;
fclose($fp);
exit;

}

kmoney12
  • 4,413
  • 5
  • 37
  • 59
1

I've adapted a simple class based on the code found on this thread that now handles colspan and rowspan. Not heavily tested and I'm sure it could be optimized.

Usage:

require_once('table2csv.php');

$table = '<table border="1">
    <tr>
    <th colspan=2>Header 1</th>
    </tr>
    <tr>
    <td>row 1, cell 1</td>
    <td>row 1, cell 2</td>
    </tr>
    <tr>
    <td>row 2, cell 1</td>
    <td>row 2, cell 2</td>
    </tr>
    <tr>
    <td rowspan=2>top left row</td>
    <td>top right row</td>
    </tr>
    <tr>
    <td>bottom right</td>
    </tr>
    </table>';

table2csv($table,"sample.csv",true);

table2csv.php

<?php

    //download @ http://simplehtmldom.sourceforge.net/
    require_once('simple_html_dom.php');
    $repeatContentIntoSpannedCells = false;


    //--------------------------------------------------------------------------------------------------------------------

    function table2csv($rawHTML,$filename,$repeatContent) {

        //get rid of sups - they mess up the wmus
        for ($i=1; $i <= 20; $i++) { 
            $rawHTML = str_replace("<sup>".$i."</sup>", "", $rawHTML);
        }

        global $repeatContentIntoSpannedCells;

        $html = str_get_html(trim($rawHTML));
        $repeatContentIntoSpannedCells = $repeatContent;

        //we need to pre-initialize the array based on the size of the table (how many rows vs how many columns)

        //counting rows is easy
        $rowCount = count($html->find('tr'));

        //column counting is a bit trickier, we have to iterate through the rows and basically pull out the max found
        $colCount = 0;
        foreach ($html->find('tr') as $element) {

            $tempColCount = 0;

            foreach ($element->find('th') as $cell) {
                $tempColCount++;
            }

            if ($tempColCount == 0) {
                foreach ($element->find('td') as $cell) {
                    $tempColCount++;
                }
            }

            if ($tempColCount > $colCount) $colCount = $tempColCount;
        }

        $mdTable = array();

        for ($i=0; $i < $rowCount; $i++) { 
            array_push($mdTable, array_fill(0, $colCount, NULL));
        }

        //////////done predefining array

        $rowPos = 0;
        $fp = fopen($filename, "w");

        foreach ($html->find('tr') as $element) {

            $colPos = 0;

            foreach ($element->find('th') as $cell) {
                if (strpos(trim($cell->class), 'actions') === false && strpos(trim($cell->class), 'checker') === false) {
                    parseCell($cell,$mdTable,$rowPos,$colPos);
                }
                $colPos++;
            }

            foreach ($element->find('td') as $cell) {
                if (strpos(trim($cell->class), 'actions') === false && strpos(trim($cell->class), 'checker') === false) {
                    parseCell($cell,$mdTable,$rowPos,$colPos);
                }
                $colPos++;
            }   

            $rowPos++;
        }


        foreach ($mdTable as $key => $row) {

            //clean the data
            array_walk($row, "cleanCell");
            fputcsv($fp, $row);
        }
    }


    function cleanCell(&$contents,$key) {

        $contents = trim($contents);

        //get rid of pesky &nbsp's (aka: non-breaking spaces)
        $contents = trim($contents,chr(0xC2).chr(0xA0));
        $contents = str_replace("&nbsp;", "", $contents);
    }


    function parseCell(&$cell,&$mdTable,&$rowPos,&$colPos) {

        global $repeatContentIntoSpannedCells;

        //if data has already been set into the cell, skip it
        while (isset($mdTable[$rowPos][$colPos])) {
            $colPos++;
        }

        $mdTable[$rowPos][$colPos] = $cell->plaintext;

        if (isset($cell->rowspan)) {

            for ($i=1; $i <= ($cell->rowspan)-1; $i++) {
                $mdTable[$rowPos+$i][$colPos] = ($repeatContentIntoSpannedCells ? $cell->plaintext : "");
            }
        }

        if (isset($cell->colspan)) {

            for ($i=1; $i <= ($cell->colspan)-1; $i++) {

                $colPos++;
                $mdTable[$rowPos][$colPos] = ($repeatContentIntoSpannedCells ? $cell->plaintext : "");
            }
        }
    }

?>
capikaw
  • 12,232
  • 2
  • 43
  • 46
1

Baba's answer contains extra space. So, I updated the code to this:

include "simple_html_dom.php";
$table = '<table border="1">
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>';

$html = str_get_html($table);



header('Content-type: application/ms-excel');
header('Content-Disposition: attachment; filename=sample.csv');

$fp = fopen("php://output", "w");

foreach($html->find('tr') as $element)
{
    $td = array();
foreach( $element->find('th') as $row)
{
    $td [] = $row->plaintext;
}

foreach( $element->find('td') as $row)
{
    $td [] = $row->plaintext;
}
fputcsv($fp, $td);
}


fclose($fp);
Sudin Manandhar
  • 145
  • 1
  • 2
  • 11
0

Assuming that out_str has your html table data

$csv = $out_str;
        $csv = str_replace("<table class='gradienttable'>","",$csv);
        $csv = str_replace("</table>","",$csv);
        $csv = str_replace("</td><td>",",",$csv);
        $csv = str_replace("<td>","",$csv);
        $csv = str_replace("</td>","",$csv);
        $csv = str_replace("</font>","",$csv);
        $csv = str_replace("</tr>","",$csv);
        $csv = str_replace("<tr>","\n",$csv);

Remove any CSS from the file

        $csv = str_replace("<font color='yellow'>","",$csv);
        $csv = str_replace("<font color='red'>","",$csv);
        $csv = str_replace("<font color='green'>","",$csv);
        $csv = str_replace("</th><th>",",",$csv);
        $csv = str_replace("<th>","",$csv);
        $csv = str_replace("</th>","",$csv);

        file_put_contents('currentFile.csv',$csv);

Output the file currentFile.csv to the user

Hope it helps!

Ashwin Balani
  • 745
  • 7
  • 19