4

I'm looking for a regex that will parse a CSV file one line at a time. Basically, it should do what string.readline() does, but allow line breaks when they fall within double quotes.

Or is there an easier way to do this?

mike
  • 103
  • 2
  • 8
  • I don't understand why people are obsessed with regular expressions on Stack Overflow. I understand their utility, but I don't see why you can't just use a CSV parser. – avpx Apr 09 '10 at 22:41
  • There are so many 3rd party CSV parsing libraries out and none of them uses regex. Just because that isn't the *right tool* for it. – BalusC Apr 09 '10 at 22:46
  • I understand completely, because it offers the lure of an easy fix. If you don't know regex well, it sometimes seems like any text processing problem can be solved in a single line of regex. Whereas finding, hooking up, and testing a parser can seem intimidating by comparison. – tloflin Apr 09 '10 at 22:47
  • Possible duplicate of [Regular Expression (C#) For CSV by RFC 4180](http://stackoverflow.com/questions/34132392/regular-expression-c-for-csv-by-rfc-4180) – David Woodward Oct 10 '16 at 14:24

3 Answers

5

Using regex to parse CSV is fine for simple applications with well-controlled CSV data, but there are many gotchas, such as escaped embedded quotes and commas inside quoted strings. These often make regex tricky and risky for this task.
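To make the gotchas concrete, here is a minimal sketch of how the two obvious shortcuts break. The sample record and the module name are made up for illustration; the record is meant to hold three fields, with a quoted second field that contains a comma and a line break.

```vb
Imports System
Imports Microsoft.VisualBasic

Module NaiveSplitDemo
    Sub Main()
        ' Hypothetical sample: one logical record with three fields.
        ' The quoted second field contains a comma and a line break.
        Dim record As String = "1,""Smith," & vbCrLf & "Bob"",42"

        ' Reading a line at a time cuts the record inside the quoted field.
        Dim lines As String() = record.Split({vbCrLf}, StringSplitOptions.None)
        Console.WriteLine(lines.Length)   ' 2 physical lines for 1 logical record

        ' Splitting on every comma cuts the quoted field as well.
        Dim pieces As String() = record.Split(","c)
        Console.WriteLine(pieces.Length)  ' 4 pieces for 3 fields
    End Sub
End Module
```

Neither shortcut knows about the quotes, so both counts come out wrong for this record.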

I recommend a well-tested CSV module for your purpose.

--Edit:-- See this excellent article, Stop Rolling Your Own CSV Parser!

Mark Rejhon
  • 869
  • 7
  • 14
1

The FileHelpers library is pretty good for this purpose.

http://www.filehelpers.net/

Marcos Meli
  • 3,468
  • 24
  • 29
Robert Harvey
  • 178,213
  • 47
  • 333
  • 501
0

Rather than relying on error-prone regular expressions, oversimplified "split" logic, or third-party components, use the .NET Framework's built-in functionality:

Using Reader As New Microsoft.VisualBasic.FileIO.TextFieldParser("C:\MyFile.csv")

    Reader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited

    ' Treat double-quoted text as one field, so embedded commas and
    ' line breaks do not split the record.
    Reader.HasFieldsEnclosedInQuotes = True
    Reader.SetDelimiters(",")

    Dim currentRow As String()
    While Not Reader.EndOfData
        Try
            currentRow = Reader.ReadFields()
            For Each currentField As String In currentRow
                MsgBox(currentField)
            Next
        Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
            MsgBox("Line " & ex.LineNumber &
            " is not valid and will be skipped.")
        End Try
    End While
End Using
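
For the case in the question (line breaks inside double quotes), here is a minimal sketch that feeds TextFieldParser from an in-memory string instead of a file. The sample data and the module name are made up for illustration. It assumes HasFieldsEnclosedInQuotes is set to True, in which case each ReadFields call should return one logical record even when a quoted field spans several physical lines.

```vb
Imports System
Imports System.IO
Imports Microsoft.VisualBasic
Imports Microsoft.VisualBasic.FileIO

Module QuotedNewlineDemo
    Sub Main()
        ' Hypothetical sample data: two records. The first one has a quoted
        ' field that spans two physical lines.
        Dim csv As String = "1,""first line" & vbCrLf & "second line"",x" & vbCrLf &
                            "2,plain,y"

        ' TextFieldParser also accepts a TextReader, so no file is needed here.
        Using parser As New TextFieldParser(New StringReader(csv))
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            parser.HasFieldsEnclosedInQuotes = True

            Dim recordCount As Integer = 0
            While Not parser.EndOfData
                ' One call per logical record, not per physical line.
                Dim fields As String() = parser.ReadFields()
                recordCount += 1
                Console.WriteLine("Record " & recordCount & ": " & fields.Length & " fields")
            End While
        End Using
    End Sub
End Module
```

If the parser behaves as assumed, the three physical lines come back as two records, which is the readline-with-quotes behaviour the question asks for.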
Chad
  • 23,658
  • 51
  • 191
  • 321