13

I am using .NET's String.Split method to break up a string using commas, but I want to ignore strings enclosed in double quotes for the string. I have read that a

For example, the string below.

Fruit,10,"Bananas, Oranges, Grapes"

I would like to get the following

Fruit
10
"Bananas, Oranges, Grapes"

Currently I am getting the following output

Fruit
10
"Bananas
 Oranges
 Grapes"
enter code here

After following suggestions and the answers provided, here is a sample of what I ended up with. (It worked for me obviously)

Imports Microsoft.VisualBasic.FileIO

Dim fileReader As New TextFieldParser(fileName)

fileReader.TextFieldType = FieldType.Delimited
fileReader.SetDelimiters(",")
fileReader.HasFieldsEnclosedInQuotes = True

While fileReader.EndOfData = False


Dim columnData() As String = fileReader.ReadFields

' Processing of field data

End While
Jerry
  • 70,495
  • 13
  • 100
  • 144
Tachi
  • 1,416
  • 4
  • 12
  • 15
  • 4
    Use an available csv-parser like [`TextFieldParser`](http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.hasfieldsenclosedinquotes(v=vs.110).aspx) or [this](http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader) which support quoting characters. – Tim Schmelter Jan 21 '14 at 14:46
  • [`FileHelpers`](http://filehelpers.sourceforge.net/) is a good CSV library. – Tim S. Jan 21 '14 at 14:49
  • Might be helpful: http://stackoverflow.com/questions/18144431/regex-to-split-a-csv – Duncan Finney Jan 21 '14 at 14:53
  • @Time Schmelter Followed your suggestion and it worked thanks. – Tachi Jan 21 '14 at 16:22

4 Answers4

13

You are better off with a parser, like those mentioned in the comments. That said, it's possible to do it with regex in the following way:

,(?=(?:[^"]*"[^"]*")*[^"]*$)

The positive lookahead ((?= ... )) ensures that there is an even number of quotes ahead of the comma to split on (i.e. either they occur in pairs, or there are none).

[^"]* matches non-quote characters.

Jerry
  • 70,495
  • 13
  • 100
  • 144
  • 1
    This is very slow! Reading in a 300k line CSV goes from ~2 seconds to ~111 seconds on my computer (4s to 40s in parallel). I played around with alternatives, and found the old school VisualBasic TextFieldParser is actually faster (33 seconds, not parallel) and automatically handles the "commas in quotes" problem. Of course TextFieldParser is slower than say StreamReader. It seems like in the absence of a third party CSV library, the TextFieldParser class is the way to go for dirty CSV files if performance is important. – jspinella May 22 '20 at 10:23
  • @jspinella Of course a dedicated parser would be faster, because it has been optimised for doing so. The issue is there can be edge cases (data source you cannot control and know there are structural issues) for which those parses may simply throw errors, and one has to resort to other hand crafted methods to be able to work with the data. It's not ideal, but at least one can get some progress. – Jerry May 22 '20 at 14:22
3

I found below is the easiest way, we can do it

string fruits = "Fruit,10,"Bananas, Oranges, Grapes"";
string[] fruitsArr = Regex.Split(fruits, ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

Output:

fruitsArr[0] = "Fruit"
fruitsArr[1] = "10"
fruitsArr[2] = "\"Bananas, Oranges, Grapes\""

If you need pure string data so you can do it something like,

fruitsArr[2].Replace("\"", "")

Jitendra G2
  • 1,196
  • 7
  • 14
1

A quick workaround could be pre-parse the commas inside the quotes and replace them with another delimiter, split the values and post-parse the values with the delimiter replacing it with the original commas.

SimonGiubs
  • 11
  • 2
0

if using c#, you can use

        string searchQuery = "Fruit,10,\"Bananas, Oranges, Grapes\"";
        List<string> list1 = Regex.Matches(searchQuery, @"(?<match>\w+)|\""(?<match>[\w\s,]*)""").Cast<Match>().Select(m => m.Groups["match"].Value).ToList();
        foreach(var v in list1)
        Console.WriteLine(v);

Output :

Fruit

10

Bananas, Oranges, Grapes

Praveen Singh
  • 29
  • 1
  • 3