There are lots of search results on the web (and in SO) for something similar to what I need to do, but I haven't run into a solution yet for my particular situation.
I have a comma-delimited file in which only columns that have commas within them have double quotes around them. Other fields that have no comma in them are simply separated by the comma.
Take this example:
123,"box,toy",phone,"red,car,cat,dog","bike,pencil",man,africa,yellow,"jump,rope"
The output for that line needs to be:
123|box,toy|phone|red,car,cat,dog|bike,pencil|man|africa|yellow|jump,rope
I have this code currently:
Using sr As New StreamReader(csvFilePath)
Dim line As String = ""
Dim strReplacerQuoteCommaQuote As String = Chr(34) & "," & Chr(34)
Dim strReplacerQuoteComma As String = Chr(34) & ","
Dim strReplacerCommaQuote As String = "," & Chr(34)
Do While sr.Peek <> -1
line = sr.ReadLine
line = Replace(line, strReplacerQuoteCommaQuote, "|")
line = Replace(line, strReplacerQuoteComma, "|")
line = Replace(line, strReplacerCommaQuote, "|")
line = Replace(line, Chr(34), "")
Console.WriteLine("line: " & line)
Loop
End Using
The problem with that process is when I get to the fourth Replace() line the string looks like this:
123|box,toy|phone|red,car,cat,dog|bike,pencil|man,africa,yellow|jump,rope
So man and africa need pipes after them, but I can't just do a Replace on all commas, obviously.
How can I do this? Is there a RegEx statement that could handle this?
UPDATE With Working Code
The link in Avinash's comment had the answer that I went with. I Imported System.Text.RegularExpressions and used the following:
Using sr As New StreamReader(csvFilePath)
Dim line As String = ""
Dim strReplacerQuoteCommaQuote As String = Chr(34) & "," & Chr(34)
Dim strReplacerQuoteComma As String = Chr(34) & ","
Dim strReplacerCommaQuote As String = "," & Chr(34)
Do While sr.Peek <> -1
line = sr.ReadLine
Dim pattern As String = "(,)(?=(?:[^""]|""[^""]*"")*$)"
Dim replacement As String = "|"
Dim regEx As New Regex(pattern)
Dim newLine As String = regEx.Replace(line, replacement)
newLine = newLine.Replace(Chr(34), "")
Console.WriteLine("newLine: " & newLine)
Loop
End Using