2

I am using OpenXml to parse text from xlsx files.

Straightforward text and numbers extract as expected, but anything with cell formatting such as a date, currency or custom format are not extracting as per the displayed text in Excel. This wouldn't be a problem if I could easily identify the cell format in code and then take the relevant action, but I cannot find it anywhere!

Example.xlsx file has:


Cell A1 - Date Format *14/03/2001

Inputting text '14/11/2018' into cell A1 displays as '14/11/2018'

Local Variables:

c.CellValue = "43418"
c.CellValue.InnerText = "43418"
c.Prefix = "x"
c.StyleIndex = "2"
c.DataType Nothing
c.CellFormula Nothing

Required value = '14/11/2018'


Cell A2 - Currency Format (Symbol £, 2 decimal places)

Inputting text '2000' into cell A2 displays as '£2,000.00'

Local Variables:

c.CellValue = "2000"
c.CellValue.InnerText = "2000"
c.Prefix = "x"
c.StyleIndex = "3"
c.DataType Nothing
c.CellFormula Nothing

Required value = '£2,000.00'


Cell A3 - Custom Format "ABC-"@

Inputting text 'P-100' into cell A3 displays as 'ABC-P-100'

Local Variables:

OpenXml Cell.CellValue = "P-100"
OpenXml Cell.CellValue.InnerText = "P-100"
OpenXml Cell.Prefix = "x"
OpenXml Cell.StyleIndex = "1"
OpenXml Cell.DataType = "s"
c.CellFormula Nothing

Required value = 'ABC-P-1000'


Here is my code:

Public Shared Sub parseXLS(strFileName As String, sbTxtFromFile As StringBuilder)
    Call fncParseXLSXorXLSM(strFileName, sbTxtFromFile)
End Sub

Public Shared Function fncParseXLSXorXLSM(strFileName As String, sbTxtFromFile As StringBuilder) As StringBuilder

        sbTxtFromFile.Length = 0
        Dim intFirst As Integer = 1

        Try
            Using spreadsheetDocument__1 As SpreadsheetDocument = SpreadsheetDocument.Open(strFileName, False)
                Dim workbookPart As WorkbookPart = spreadsheetDocument__1.WorkbookPart
                Dim sharedStringItemsArray As SharedStringItem() = workbookPart.SharedStringTablePart.SharedStringTable.Elements(Of SharedStringItem)().ToArray()
                Dim sheets As DocumentFormat.OpenXml.Spreadsheet.Sheets = spreadsheetDocument__1.WorkbookPart.Workbook.Sheets

                ' For each sheet, display the sheet information.
                For Each sheet As DocumentFormat.OpenXml.OpenXmlElement In sheets
                    For Each attr As DocumentFormat.OpenXml.OpenXmlAttribute In sheet.GetAttributes()
                        Debug.Print("{0}: {1}", attr.LocalName, attr.Value)
                        If attr.LocalName = "name" Then
                            sbTxtFromFile.Append(attr.Value)
                        End If
                    Next

                Next

                For Each worksheetPart As WorksheetPart In workbookPart.WorksheetParts
                    Dim reader As OpenXmlReader = OpenXmlReader.Create(worksheetPart)
                    While reader.Read()
                        If reader.ElementType Is GetType(Cell) Then
                            Do
                                Dim c As Cell = DirectCast(reader.LoadCurrentElement(), Cell)

                                If c.DataType IsNot Nothing AndAlso c.DataType.Value.ToString = "SharedString" Then
                                    Dim ssi As SharedStringItem = sharedStringItemsArray(Integer.Parse(c.CellValue.InnerText))

                                    If Not ssi.Text Is Nothing Then
                                        If Not ssi.Text.Text Is Nothing Then
                                            If intFirst = 1 Then
                                                sbTxtFromFile.Append(ssi.Text.Text)
                                                intFirst = 2
                                            Else
                                                sbTxtFromFile.Append(Environment.NewLine & ssi.Text.Text)
                                            End If
                                        End If
                                    Else
                                        If Not ssi.InnerText Is Nothing Then
                                            If intFirst = 1 Then
                                                sbTxtFromFile.Append(ssi.InnerText)
                                                intFirst = 2
                                            Else
                                                sbTxtFromFile.Append(Environment.NewLine & ssi.InnerText)
                                            End If
                                        End If
                                    End If
                                Else
                                    If Not c.CellValue Is Nothing Then
                                        If intFirst = 1 Then
                                            sbTxtFromFile.Append(c.CellValue.InnerText)
                                            intFirst = 2
                                        Else
                                            sbTxtFromFile.Append(Environment.NewLine & c.CellValue.InnerText)
                                        End If
                                    End If
                                End If
                            Loop While reader.ReadNextSibling()
                        End If
                        If sbTxtFromFile.Length > 0 Then
                            sbTxtFromFile.Append(Environment.NewLine)
                        End If
                    End While
                Next

            End Using

        Catch ex As Exception
            If ex.Message Like "The process cannot access the file '*" Then 'File in use
                sbTxtFromFile.Append("|11readonly11|")
            ElseIf ex.Message Like "Could not find*" Then 'File in use
                sbTxtFromFile.Append("|11notfound11|")
            Else
                sbTxtFromFile.Append("|11cannotread11|")
            End If
        End Try

        Return sbTxtFromFile

    End Function

Any ideas?

GoodJuJu
  • 1,296
  • 2
  • 16
  • 37
  • Possible duplicate of [How to get cell value with applied formatting (formatted cell value) with OpenXML SDK](https://stackoverflow.com/q/8929799/11683) – GSerg Nov 19 '18 at 15:22

1 Answers1

0

You can read cell style based on StyleIndex property, which will help you to determine formatting applied on the cell.

Use following method as an example and work around based on your need:

Private Shared Sub ReadCellFormat(cell As Cell, stylesheet As Stylesheet)
    If cell.StyleIndex.HasValue Then
        Debug.Print("Style found for the cell:")
        If stylesheet.CellFormats.Count.Value > cell.StyleIndex.Value Then
            Dim format As CellFormat = stylesheet.CellFormats.ElementAt(cell.StyleIndex.Value)
            If format.NumberFormatId.HasValue AndAlso format.NumberFormatId.Value > 0 Then
                Dim numberFormat As NumberingFormat = stylesheet.NumberingFormats.Single(Function(x As NumberingFormat) x.NumberFormatId.Value = format.NumberFormatId.Value)
                Debug.Print($"Format code: {numberFormat.FormatCode}")
            End If
        End If
    End If
End Sub

P.S: This method just reads the number format for the given cell.

Dipen Shah
  • 25,562
  • 1
  • 32
  • 58
  • Thanks for your reply Dipen. As you can see in my example above, I already know the StyleIndex value, but it's just a meaningless number. What I need are the actual values and prefixes, not just the number format. – GoodJuJu Nov 19 '18 at 21:15
  • @GoodJuJu What do you mean by `actual values`? Do you mean calculated values, after applying format/prefix? – Dipen Shah Nov 19 '18 at 21:32
  • ideally I want the calculated value, after applying the prefix, as it appears in excel. If I cannot get that, I want to return something that tells me what the cell format is, e.g. Date, Currency, Custom etc. – GoodJuJu Nov 19 '18 at 21:43
  • @GoodJuJu I don't think it is possible to get calculated values. However, you can determine cell format based on `NumberFormatId` property. – Dipen Shah Nov 19 '18 at 21:47