7

I'm using Epplus to render an Excel spreadsheet into HTML. So far it's going very, very well, except for one thing... spanning merged cells. I just can't seem to get the logic right. I thought I would throw it out there to see how the community would deal with it. Here is my code so far.

public String ParseExcelStamps(String FileName)
{
    FileInfo theFile = new FileInfo(FileName);
    String html = "";
    using (ExcelPackage xlPackage = new ExcelPackage(theFile))
    {
        var workbook = xlPackage.Workbook;
        if (workbook != null)
        {
            for (int j = 1; j <= workbook.Worksheets.Count; j++)
            {
                Tab tab = new Tab();
                html+= "<table style='border-collapse: collapse;font-family:arial;'><tbody>";
                var worksheet = workbook.Worksheets[j];
                tab.Title = worksheet.Name;
                if (worksheet.Dimension == null) { continue; }
                int rowCount = 0;
                int maxColumnNumber = worksheet.Dimension.End.Column;
                var convertedRecords = new List<List<string>>(worksheet.Dimension.End.Row);
                var excelRows = worksheet.Cells.GroupBy(c => c.Start.Row).ToList();
                excelRows.ForEach(r =>
                {
                    rowCount++;
                    html += String.Format("<tr>");
                    var currentRecord = new List<string>(maxColumnNumber);
                    var cells = r.OrderBy(cell => cell.Start.Column).ToList();
                    Double rowHeight = worksheet.Row(rowCount).Height;
                    for (int i = 1; i <= maxColumnNumber; i++)
                    {
                        var currentCell = cells.Where(c => c.Start.Column == i).FirstOrDefault();

                        //look aheads for colspan and rowspan
                        ExcelRangeBase previousCellAbove = null;
                        ExcelRangeBase previousCell = null;
                        ExcelRangeBase nextCell = null;
                        ExcelRangeBase nextCellBelow = null;
                        try { previousCellAbove = worksheet.Cells[rowCount-1, i]; }catch (Exception) { }
                        try { previousCell = worksheet.Cells[rowCount, (i - 1)]; }catch (Exception) { }
                        try { nextCell = worksheet.Cells[rowCount, (i + 1)]; }catch (Exception) { }
                        try { nextCellBelow = worksheet.Cells[rowCount+1, i]; }catch (Exception) { }

                        if ((previousCell != null) && (previousCell.Merge) && (currentCell != null) && (currentCell.Merge)){continue;}
                        if ((previousCellAbove != null) && (previousCellAbove.Merge) && (currentCell != null)) {continue; }

                        if (currentCell == null)
                        {
                            html += String.Format("<td>{0}</td>", String.Empty);
                        }
                        else
                        {
                            int colSpan = 1;
                            int rowSpan = 1;
                            if ((nextCell != null) && (nextCell.Merge) && (currentCell.Merge)) {
                                colSpan = 2;
                               // Console.WriteLine(String.Format("{0} - {1}", currentCell.Address, nextCell.Address));
                            }

                            if ((nextCellBelow != null) && (nextCellBelow.Merge) && (currentCell.Merge)) {
                                Console.WriteLine(String.Format("{0} - {1}", currentCell.Address, nextCellBelow.Address));
                            }

                            html += String.Format("<td colspan={0} rowspan={1}>{2}</td>", colSpan, rowSpan, currentCell.Value);
                        }
                    }
                    html += String.Format("</tr>");
                });
                html += "</tbody></table>";
            }//worksheet loop
        }
    }
    return html;
}
Agustin Meriles
  • 4,866
  • 3
  • 29
  • 44
BigBadOwl
  • 669
  • 2
  • 9
  • 22
  • The first question I would ask is whether it's possible to edit the worksheet to unmerge cells. Worth a try? – andy holaday Jul 11 '12 at 00:24
  • I would really like to solve this without having to edit the source, plus Im not in control of the source documents. – BigBadOwl Jul 11 '12 at 20:46
  • The trick with this is that Epplus's cell.merge only returns a Boolean. When you check a cell to see if it's merged, you cannot tell if it's merged with the cell before, after, below, or above and are therefore unable to determine if it's a colspan or a rowspan. – BigBadOwl Jul 11 '12 at 20:48

1 Answers1

13

As far as I can tell this is exactly what you need. What you were missing was the MergedCells property on the worksheet which lists all merged cells in the sheet.

My code handles row spans, column spans, and both at the same time. I did some testing with a spreadsheet that included both row, column and row/column spanning. In all cases they worked perfectly.

Code

int colSpan = 1;
int rowSpan = 1;

//check if this is the start of a merged cell
ExcelAddress cellAddress = new ExcelAddress(currentCell.Address);

var mCellsResult = (from c in worksheet.MergedCells 
                let addr = new ExcelAddress(c)
                    where cellAddress.Start.Row >= addr.Start.Row &&
                    cellAddress.End.Row <= addr.End.Row &&
                    cellAddress.Start.Column >= addr.Start.Column &&
                    cellAddress.End.Column <= addr.End.Column 
                select addr);

if (mCellsResult.Count() >0)
{
    var mCells = mCellsResult.First();

    //if the cell and the merged cell do not share a common start address then skip this cell as it's already been covered by a previous item
    if (mCells.Start.Address != cellAddress.Start.Address)
        continue;

    if(mCells.Start.Column != mCells.End.Column) {
        colSpan += mCells.End.Column - mCells.Start.Column;
    }

    if (mCells.Start.Row != mCells.End.Row)
    {
        rowSpan += mCells.End.Row - mCells.Start.Row;
    }
}

//load up data
html += String.Format("<td colspan={0} rowspan={1}>{2}</td>", colSpan, rowSpan, currentCell.Value);
Peter
  • 9,643
  • 6
  • 61
  • 108
  • This is brilliant, works like a charm. It's so cool to finally be able to show an Excel document as HTML. I will put the full code up on GitHub soon and post the link here. Thanks Peter, your solution is worth 50 points and more. – BigBadOwl Jul 13 '12 at 06:46
  • 1
    Make sure to put a HtmlEncode around your currentCell.Value. I'm fairly sure that as it stands a < or > sign would cause some issues with the rendering. – Peter Jul 13 '12 at 13:08
  • 2
    +1, a minor improvement: always use `Enumerable.Any()` instead of `Enumerable.Count()`. The former just checks if any element exists, the latter iterates all elements. Better: `if (mCellsResult.Any())` – Tim Schmelter Jul 13 '12 at 20:45
  • @TimSchmelter - I would disagree with always using Any, depending on the underlying type of the collection, however in this case I would agree. ( http://stackoverflow.com/questions/305092/which-method-performs-better-any-vs-count-0 ) – Peter Jul 13 '12 at 20:49
  • @Peter: In case where `Count` is more efficient the difference is negligible but `Count()` could execute a query which yields millions of elements whereas `Any()` always only one, plus the naming is more meaningful. – Tim Schmelter Jul 13 '12 at 20:57