4

I am using DotNetCore.NPOI (1.2.1) in order to read an MS Excel file.

Some of the cells are of type text and contain formatted strings (e.g. some words in bold).

How do I get the formatted cell value? My final goal: Retrieve the cell text as HTML.

I tried

var cell = row.GetCell(1);
var richStringCellValue = cell.RichStringCellValue;

But this won't let me access the formatted string (just the plain string without formattings).

Does anybody have an idea or solution?

Ingmar
  • 1,525
  • 6
  • 34
  • 51

2 Answers2

3

I think you'll have to take longer route in this case. First you'll have to maintain the formatting of cell value like date, currency etc and then extract the style from cell value and embed the cell value under that style.

best option is to write extenstion method to get format and style value.

To get the fomat Please see this link How to get the value of cell containing a date and keep the original formatting using NPOI

For styling first you'll have to check and find the exact style of running text and then return the value inside the html tag , below method will give you idea to extract styling from cell value. Code is untested , you may have to include missing library.

 public void GetStyleOfCellValue()
        {

            XSSFWorkbook wb = new XSSFWorkbook("YourFile.xlsx");
            ISheet sheet = wb.GetSheetAt(0);
            ICell cell = sheet.GetRow(0).GetCell(0);  
            XSSFRichTextString richText = (XSSFRichTextString)cell.RichStringCellValue;
            int formattingRuns = cell.RichStringCellValue.NumFormattingRuns;

            for (int i = 0; i < formattingRuns; i++)
            {
                int startIdx = richText.GetIndexOfFormattingRun(i);
                int length = richText.GetLengthOfFormattingRun(i);
                Console.WriteLine("Text: " + richText.String.Substring(startIdx, startIdx + length));
                if (i == 0)
                {
                    short fontIndex = cell.CellStyle.FontIndex;
                    IFont font = wb.GetFontAt(fontIndex);
                    Console.WriteLine("Bold: " + (font.IsBold)); // return string <b>my string</b>.
                    Console.WriteLine("Italics: " + font.IsItalic + "\n"); // return string <i>my string</i>. 
                    Console.WriteLine("UnderLine: " + font.Underline + "\n"); // return string <u>my string</u>. 
                }
                else
                {
                    IFont fontFormat = richText.GetFontOfFormattingRun(i);
                    Console.WriteLine("Bold: " + (fontFormat.IsBold)); // return string <b>my string</b>.
                    Console.WriteLine("Italics: " + fontFormat.IsItalic + "\n");// return string <i>my string</i>. 
                }
            }
        }
kumar chandraketu
  • 2,232
  • 2
  • 20
  • 25
  • Ohhhhhhhhh. THIS is how this is supposed to work. The whole string is divided into "FormattingRuns" and I can then check the basic formattings of a single formattingRun. Horrible. ;) But awesome, that you made this clear to me. Thanks, Kumar. This helps me a lot! – Ingmar Oct 19 '18 at 12:00
1

Font formatting in XLSX files are stored according to schema http://schemas.openxmlformats.org/spreadsheetml/2006/main which has no direct relationship to HTML tags. Therefore your task is not that much straight forward.

style = cell.getCellStyle();
font = style.getFont(); // or style.getFont(workBook);
// use Font object to query font attributes. E.g. font.IsItalic

Then you will have to build the HTML by appending relevant HTML tags.

TRiNE
  • 5,020
  • 1
  • 29
  • 42
  • Hello TRINE, thank you for your answer. However, I am not sure how this is going to help me. The link you provided does not seem to exist (any more). And the code snippet (I tried: var style = row.GetCell(1).CellStyle; var font = style.GetFont(workbook);) doesn't help me at all. By exploring the font object I couldn't find any properties that would help me analyzing the different formattings inside cell's RichStringCellValue. Am I missing the obvious? – Ingmar Oct 18 '18 at 17:48
  • @Ingmar It is not a link. See what are schema namespaces https://www.oracle.com/technetwork/articles/srivastava-namespaces-092580.html – TRiNE Oct 19 '18 at 03:54
  • @Ingmar Btw, this library has limited implementation. Moving from this library will work if it is possible. Can use excel instance; https://forums.asp.net/t/2097015.aspx?Import+Excel+with+Cell+Styles+In+Asp+Net+C+ – TRiNE Oct 19 '18 at 04:06
  • 1
    Thanks again for your help, TRiNE. Kumar (above) gave me the idea/information that I needed. And yes, I already figured, that NPOI is a bit limited and somehow uncomfortable to use. However, it seems to be the only solution that is available for free for ASP.NET CORE so far. – Ingmar Oct 19 '18 at 12:02