0

I need to convert a HTML table with styling to an .XLSX file.

I managed to do that using the free version of GemBox.Spreadsheet, for now I don't mind the limitation of the free version but the problem is that numbers are considered as text when I open the Excel file. Is there any solution to that without manually opening the Excel file and converting them myself ? Or even a free alternative to GemBox library ?

File.WriteAllText("Table.html", html);
ExcelFile.Load("Table.html", LoadOptions.HtmlDefault).Save("Test.xlsx");

My HTML looks like this

<html>
  <body>
    <center>
      <table border="1" cellpadding="0" cellspacing="0">
        <tr>
          <td>Crimson Witch</td>
          <td>HP</td>
          <td>ATK</td>
          <td>DEF</td>
        </tr>
        <tr>
          <td>Flower</td>
          <td>10</td>
          <td style="background-color: #808080"></td>
          <td style="background-color: #808080"></td>
        </tr>
        <tr>
          <td>Plume</td>
          <td style="background-color: #808080"></td>
          <td>10</td>
          <td style="background-color: #808080"></td>
        </tr>
      </table>
    </center>
  </body>
</html>

Solution :

File.WriteAllText("Table.html", html.Replace("<center>", string.Empty).Replace("</center>", string.Empty));
ExcelFile.Load("Table.html", LoadOptions.HtmlDefault).Save("Test.xlsx");
Yox
  • 313
  • 2
  • 6
  • 18
  • If you're going to mention a tool you're working with, it's a good idea to make it a link so people can try and figure out what you're referring to. Anyway, I have no idea what GemBox.Spreadsheet does but I'd start by looking at the raw CSV file (you should post it here). If that tool is not inserting the right data, I'd use a lightweight HTML parser (such as my own [SoftCircuits.HtmlMonkey](https://github.com/SoftCircuits/HtmlMonkey)) and extract the table columns that way. – Jonathan Wood Oct 28 '21 at 16:17
  • @JonathanWood I don't have any CSV file. Do you mean the XSLX file ? – Yox Oct 28 '21 at 16:57
  • 1
    There are several public domain Excel spreadsheet libraries, like the ubiquitous NPOI for example, which allow you cell-level access to do whatever you wish. – Blindy Oct 28 '21 at 16:57
  • 1
    @Yox: Sorry, you're creating an XSLX file directly. If I was to do it, I'd parse the HTML with my HTML parser, and then either write it to a CSV file, or use an Excel builder library (I have an open source one of those too--as well as a CSV writer) to create the Excel file. That's the only way you can have fine-tuned control over the process. – Jonathan Wood Oct 28 '21 at 17:02

2 Answers2

1

The problem occurs because the <table> is inside the <center>.

Try removing the <center> element.

EDIT:

This issue is now resolved in the current latest bugfix version: https://www.gemboxsoftware.com/spreadsheet/nightlybuilds/GBS47v1336.zip

And in the current latest NuGet package:
Install-Package GemBox.Spreadsheet -Version 47.0.1336-hotfix

Mario Z
  • 4,328
  • 2
  • 24
  • 38
0

I agree with @Jonathan Wood.

It would be best to do this in two steps:

  1. Parse the HTML table with the HTML Agility Pack NuGet package.

    var query = from table in
                doc.DocumentNode.SelectNodes("//table").Cast<HtmlNode>()
                from row in table.SelectNodes("tr").Cast<HtmlNode>()
                from cell in row.SelectNodes("th|td").Cast<HtmlNode>()
                select new {Table = table.Id, CellText = cell.InnerText};
    
  2. Use EPPlus and its LoadFromArrays method.

string firstName = "Bob";
string lastName = "Burton";
DateTime dateOfBirth = new DateTime(2000, 1, 1);
var testData = new List<object[]>()
                {
                    new object[] {"First name", firstName},
                    new object[] {"Last name", lastName},
                    new object[] {"Date of birth", dateOfBirth}
                };

using (var excelPackage = new ExcelPackage())
{
    ExcelWorksheet excelWorksheet = excelPackage.Workbook.Worksheets.Add("Test worksheet");
    //Load the summary data into the sheet, starting from cell A1. Print the column names on row 1
    excelWorksheet.Cells["A1"].LoadFromArrays(testData);
}
Jim G.
  • 15,141
  • 22
  • 103
  • 166