0

I have a file "import.php" in which html data is written under table tags. Now i want to parse that data and save that data in an excel sheet. Format is undermentioned and first tr contains the heading and then the data

<html>
  <body>
  <table>
  <tr>
  <th>Name</th>
  <th>Email</th>
  <th>Addr</th>
  <th>City</th>
  </tr>

  <tr>
  <td>Jack</td>
  <td>a@b.com</td>
  <td>xyz Road</td>
  <td>LOS ANGELES</td>
  </tr>

  <tr>
  <td>Sam</td>
  <td>sam@b.com</td>
  <td>pr Road</td>
  <td>TUSTIN</td>
  </tr>
    </table>
  </body>
  </html>
swapnesh
  • 26,318
  • 22
  • 94
  • 126
  • Where are you having problems? With parsing the data? or with saving it as Excel? – Mark Baker Apr 17 '12 at 06:43
  • saving the data in an excel sheet :( – swapnesh Apr 17 '12 at 06:45
  • 1
    Have you already parsed the HTML? what's the product of parsing? do you have an array? JSON object? Do you know how excel is structured in general? these two links will help you: http://bit.ly/2Ltpr , http://bit.ly/a8lS74 – Adi Apr 17 '12 at 07:19
  • Actually i ned to know how to parse that html data to create and excel doc – swapnesh Apr 17 '12 at 07:39
  • @Adnan - A Tab-Separated Value file isn't actually an Excel file, it's just a format that Excel is capable of reading in addition to Excel files – Mark Baker Apr 17 '12 at 08:00
  • @swapnesh - To save as Excel, have a look at some of the Excel writer libraries listed http://stackoverflow.com/questions/3930975/alternative-for-php-excel such as PHPExcel – Mark Baker Apr 17 '12 at 08:01
  • 1
    @swapnesh - For parsing the HTML, look at DOM... I do have a basic HTML to PHPExcel parser, but it isn't yet production ready – Mark Baker Apr 17 '12 at 08:04
  • @MarkBaker yeah i too found something DOM relating to parse that HTML..but i think some well formatted examples are not given..plz let me know if u have any link – swapnesh Apr 17 '12 at 08:18
  • @swapnesh - I don't have any useful links, just my current code (which is designed as a PHPExcel reader class), so may be too specific unless you plan on using PHPExcel itself for the writing – Mark Baker Apr 17 '12 at 08:24
  • @MarkBaker yesterday I downloaded and follow some sample examples on PHPExcel but its too heavy have as i have only around 500-600 to parse ..thats why im looking for a short script as i already converted xlsx data format to html :( – swapnesh Apr 17 '12 at 08:32
  • You'll be lucky to find a "short script" that can do this... there's various lightweight Excel writers available in the link I gave above, but I think if you want to parse the HTML you'll need to roll your own parser – Mark Baker Apr 17 '12 at 08:58

1 Answers1

1

Maybe you've better look at this: http://phpexcel.codeplex.com/

and this: http://www.easyxls.com/

Another trick is to save your data as a CSV file: http://www.homeandlearn.co.uk/php/php10p6.html

UPDATE:

There is no simpler way for saving data into an Excel file directly but saving as CSV. Try this code:

<?php 
$data = array( array("firstname" => "Mary", "lastname" => "Johnson", "age" => 25), 
    array("firstname" => "Amanda", "lastname" => "Miller", "age" => 18), 
    array("firstname" => "James", "lastname" => "Brown", "age" => 31), 
    array("firstname" => "Patricia", "lastname" => "Williams", "age" => 7), 
    array("firstname" => "Michael", "lastname" => "Davis", "age" => 43), 
    array("firstname" => "Sarah", "lastname" => "Miller", "age" => 24), 
    array("firstname" => "Patrick", "lastname" => "Miller", "age" => 27) ); 

    # filename for download 
    $filename = "website_data_" . date('Ymd') . ".xls"; 
    header("Content-Disposition: attachment; filename=\"$filename\""); 
    header("Content-Type: application/vnd.ms-excel");
    $flag = false; 
    foreach($data as $row) 
    { 
        if(!$flag) 
        { # display field/column names as first row echo 
            implode("\t", array_keys($row)) . "\r\n"; $flag = true; 
        } 
        array_walk($row, 'cleanData');
        print implode("\t", array_values($row)) . "\r\n"; 
    }


    function cleanData(&$str) 
    { 
        $str = preg_replace("/\t/", "\\t", $str); 
        $str = preg_replace("/\r?\n/", "\\n", $str); 
        if(strstr($str, '"')) 
            $str = '"' . str_replace('"', '""', $str) . '"';
    }
?>

As for the seperation of the values from the html tags, you could read your html code from import.php, remove all unnecessary tags, put a seperator to the end of each or tag and then put your values into a single dimension array:

<?php 
$htmldata = "";
$htmldata .= "<html>";
$htmldata .= "<body>";
$htmldata .= "<tr>";
$htmldata .= "<th>header1</th>";
$htmldata .= "<th>header2</th>";
$htmldata .= "<th>header3</th>";
$htmldata .= "<th>header4</th>";
$htmldata .= "</tr>";
$htmldata .= "<tr>";
$htmldata .= "<td>data1</td>";
$htmldata .= "<td>data2</td>";
$htmldata .= "<td>data3</td>";
$htmldata .= "<td>data4</td>";
$htmldata .= "</tr>";
$htmldata .= "</body>";
$htmldata .= "</html>";

//Remove the unecessary tags like <html>, </html>, <body>, </body>, <th>, </th>, <td>, </td>
$searchfor = array("<html>", "</html>", "<body>", "</body>", "<tr>", "</tr>", "<th>", "</th>", "<td>", "</td>");
$replacewith = array("", "", "", "", "", "", "", "**SEPERATOR**", "", "**SEPERATOR**"); // Replace </th> & </td> with **SEPERATOR** text
$htmldata = str_replace($searchfor, $replacewith, $htmldata);

$values = explode("**SEPERATOR**", $htmldata); ;
print_r($values);

?>

The first 4 values of the array $values contain your header values. Hope that it helps...

dimmat
  • 195
  • 1
  • 3
  • 10