0

I am using some great code taken from StackOverflow and converted to VB NET to:

  • Extract all rows of text from all speadsheets in an excel .xlsx file
  • Append each row to a StringBuilder.

The code runs really well, but my stringbuilder entries are split every 1026 characters. i.e. Text reads ..This is Sta
ckOverflow.. instead of
'..This is StackOverflow..' and splitting at a natural row break.

Imports System.Text
Imports System.Linq
Imports DocumentFormat.OpenXml
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Spreadsheet

Public Class clsParseXLS

    Public Shared Sub parseXLSX(strFileName As String, sbTxtFromFile As StringBuilder)

        Dim intFirst As Integer = 1

        Try
            Using spreadsheetDocument__1 As SpreadsheetDocument = SpreadsheetDocument.Open(strFileName, False)
                Dim workbookPart As WorkbookPart = spreadsheetDocument__1.WorkbookPart

                For Each worksheetPart As WorksheetPart In workbookPart.WorksheetParts
                    Dim reader As OpenXmlReader = OpenXmlReader.Create(worksheetPart)

                    While reader.Read()
                        If reader.ElementType Is GetType(Row) Then
                            reader.ReadFirstChild()

                            Do
                                If reader.ElementType Is GetType(Cell) Then

                                    Dim c As Cell = DirectCast(reader.LoadCurrentElement(), Cell)

                                    If c.DataType IsNot Nothing AndAlso c.DataType.Value.ToString = "SharedString" Then

                                        Dim ssi As SharedStringItem = workbookPart.SharedStringTablePart.SharedStringTable.Elements(Of SharedStringItem)().ElementAt(Integer.Parse(c.CellValue.InnerText))

                                        If Not ssi Is Nothing Then
                                            If intFirst = 1 Then
                                                sbTxtFromFile.Append(ssi.Text.Text)
                                                intFirst = 2
                                            Else
                                                sbTxtFromFile.Append(Environment.NewLine & reader.GetText())
                                                intFirst = 2
                                            End If
                                        End If
                                    Else
                                        If Not c.CellValue Is Nothing Then
                                            If intFirst = 1 Then
                                                sbTxtFromFile.Append(c.CellValue.InnerText)
                                                intFirst = 2
                                            Else
                                                sbTxtFromFile.Append(Environment.NewLine & c.CellValue.InnerText)
                                                intFirst = 2
                                            End If
                                        End If
                                    End If

                                    'Console.Out.Write("{0}: {1} ", c.CellReference, cellValue)
                                End If
                            Loop While reader.ReadNextSibling()
                            'Console.Out.WriteLine()
                        End If
                    End While
                Next
            End Using

        Catch ex As Exception
            MsgBox(ex.ToString)
        End Try
    End Sub

End Class

Does anyone know what I can do to remedy the situation and split on the rows?

Any help anyone can offer is most gratefully appreciated.

Many thanks to @Hans for posting the original code I am using. Using OpenXmlReader

Community
  • 1
  • 1
GoodJuJu
  • 1,296
  • 2
  • 16
  • 37

1 Answers1

0

Must be when the StringBuilder resets its buffer, do you have to use String Builder?

Jack Miller
  • 325
  • 1
  • 16
  • Would have commented, but not enough rep – Jack Miller Sep 22 '16 at 16:34
  • Hi Jack, thanks for your reply. The StringBuilder doesn't have a buffer, it is the way SAX is reading in chunks of 1026 chars. I have tried using a StringBuilder or just a plain old string and the results are the same. – GoodJuJu Sep 23 '16 at 07:26