I am not good at regex. Can someone help me write a regex for this?

While reading a CSV file, I may get values like these:

"Artist,Name",Album,12-SCS
"val""u,e1",value2,value3

Expected output:

Artist,Name
Album
12-SCS
val"u,e1
value2
value3

Update: I like the idea of using the OleDb provider. We have a file upload control on the web page, and I read the content of the uploaded file with a StreamReader without ever saving it to the file system. Is there any way to use the OleDb provider in that case? It needs the file name in the connection string, and I don't have the file saved on the file system.

shailesh

7 Answers

Just adding the solution I worked on this morning.

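using System.Text.RegularExpressions;

// A field is either a quoted string (with "" as an escaped quote) or a run of
// non-commas, and it must start at the beginning of the line or right after a comma.
var regex = new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");

foreach (Match m in regex.Matches(line)) // line: one line of the CSV input
{
    var s = m.Value; // raw field text; quoted fields keep their quotes
}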

As you can see, you need to call regex.Matches() once per line. It returns a MatchCollection with one item per column. The Value property of each match is the raw field text: quoted fields keep their surrounding quotes and their doubled "" escapes, so strip and unescape those yourself if you need the plain value.

This is still a work in progress, but it happily parses CSV strings like:

2,3.03,"Hello, my name is ""Joshua""",A,B,C,,,D
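The matched values of quoted fields still include the surrounding quotes and the doubled "" escapes. A minimal sketch of the post-processing step, assuming you want the plain values the question asks for: it applies the same regex, then removes the outer quotes of each quoted field and collapses "" to ". `ParseCsvLine` is a hypothetical helper name, and the sketch uses C# top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: applies the regex from this answer to one CSV line, then
// removes the surrounding quotes from quoted fields and collapses "" to ".
static string[] ParseCsvLine(string line)
{
    var regex = new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");
    return regex.Matches(line)
        .Cast<Match>()
        .Select(m => m.Value)
        // Only values that both start and end with a quote are treated as quoted.
        .Select(v => v.Length >= 2 && v.StartsWith("\"") && v.EndsWith("\"")
            ? v.Substring(1, v.Length - 2).Replace("\"\"", "\"")
            : v)
        .ToArray();
}

foreach (var field in ParseCsvLine("\"val\"\"u,e1\",value2,value3"))
    Console.WriteLine(field);
```

Run on the question's second line, this prints `val"u,e1`, `value2` and `value3`, one per line. Empty fields such as the `,,,` in the sample above come back as empty strings.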
Joshua

Actually, it's pretty easy to match CSV lines with a regex. Try this one:

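using System;
using System.Collections.Specialized;
using System.Text.RegularExpressions;

StringCollection resultList = new StringCollection();
try {
    Regex pattern = new Regex(@"
        # Parse CSV line. Capture next value in named group: 'val'
        \s*                      # Ignore leading whitespace.
        (?:                      # Group of value alternatives.
          ""                     # Either a double quoted string,
          (?<val>                # Capture contents between quotes.
            [^""]*(""""[^""]*)*  # Zero or more non-quotes, allowing 
          )                      # doubled "" quotes within string.
          ""\s*                  # Ignore whitespace following quote.
        |  (?<val>[^,]*)         # Or... zero or more non-commas.
        )                        # End value alternatives group.
        (?:,|$)                  # Match end is comma or EOS", 
        RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
    // subjectString holds a single CSV line.
    Match matchResult = pattern.Match(subjectString);
    while (matchResult.Success) {
        string val = matchResult.Groups["val"].Value;
        // In a quoted value the capture still holds doubled "" quotes; collapse them to ".
        if (matchResult.Value.TrimStart().StartsWith("\"")) val = val.Replace("\"\"", "\"");
        resultList.Add(val);
        // A value that ends at end of string (not at a comma) is the last one. Without
        // this check the loop also adds a spurious empty value from a zero-length match at EOS.
        if (!matchResult.Value.EndsWith(",")) break;
        matchResult = matchResult.NextMatch();
    }
} catch (ArgumentException) {
    // Syntax error in the regular expression
}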

Disclaimer: The regex has been tested in RegexBuddy (which generated this snippet), and it correctly matches the OP test data, but the C# code logic is untested. (I don't have access to C# tools.)

ridgerunner
  • @viggity - Glad to help. You may also want to look at a more involved regex solution for parsing CSV lines. See: [How can I parse a CSV string with Javascript?](http://stackoverflow.com/a/8497474/433790) – ridgerunner Jul 31 '12 at 15:21

Regex is not the right tool for this. Use a CSV parser, either the built-in one or a 3rd party one.

BalusC
  • Agreed, regex is the wrong tool. I have used the CsvReader you linked to on Codeproject and found it to be great for handling csv files. – quentin-starin Jul 16 '10 at 20:43
  • I like the idea of using the OleDb provider. We have a file upload control on the web page, and I read the content of the file with a StreamReader without actually saving it to the file system. Is there any way I can use the OleDb provider, given that it needs the file name in the connection string and I don't have the file saved on the file system? – shailesh Jul 16 '10 at 22:01
  • That's a new question. Try asking a **new** question with the right title, context and tags. – BalusC Jul 19 '10 at 14:26
  • The built-in one forces you to convert the values to .NET types. If it guesses a column's type wrong, it will lose the data. The 3rd party one has lots of bugs. The `CsvReader` class in the 3rd party code is 2500 lines long and has lots of poorly written functions, so debugging is a chore as well. Have fun! – Jake Sep 02 '10 at 00:14
  • +1. But why don't you post that spiffy regex library on an OSS host (e.g. GitHub, Google Code)? I can't download the source without a CodeProject account. – Evan Plaice May 02 '12 at 04:23

Give the TextFieldParser class a look. It's in the Microsoft.VisualBasic assembly and does both delimited and fixed-width parsing.
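A minimal sketch of that, assuming .NET Core 3.0 or later (where `Microsoft.VisualBasic.FileIO.TextFieldParser` is available without extra packages) and C# top-level statements. Because `TextFieldParser` also has a constructor that takes a `Stream`, it can read uploaded content directly, which may also cover the question's update about not saving the file first. `ReadCsv` is a hypothetical helper name, and the `MemoryStream` only stands in for the uploaded file's stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper: reads every record from a Stream, so the upload
// never has to be written to the file system first.
static List<string[]> ReadCsv(Stream input)
{
    var records = new List<string[]>();
    using var parser = new TextFieldParser(input);
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true; // handles "a,b" and "" escapes
    while (!parser.EndOfData)
    {
        var fields = parser.ReadFields();
        if (fields != null) records.Add(fields);
    }
    return records;
}

// A MemoryStream stands in for the uploaded file's stream here.
var csv = "\"Artist,Name\",Album,12-SCS\n\"val\"\"u,e1\",value2,value3\n";
using var upload = new MemoryStream(Encoding.UTF8.GetBytes(csv));
foreach (var record in ReadCsv(upload))
    Console.WriteLine(string.Join(" | ", record));
```

For the question's data this prints `Artist,Name | Album | 12-SCS` and `val"u,e1 | value2 | value3`. Note that disposing the parser also closes the stream passed to it.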

Brian Surowiec
  • +1 for TextFieldParser. It's one of the hidden gems of .NET - Possibly because it's hidden in the VisualBasic namespace for some reason. (P.S. *Always* follow the advice of a Brian S. Those guys are really smart!) – Brian Schroer Jul 17 '10 at 02:13

Give CsvHelper a try (a library I maintain). It's available via NuGet.

You can easily read a CSV file into a custom class collection. It's also very fast.

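using System.Globalization;
using System.IO;
using CsvHelper;
using var streamReader = new StreamReader("file.csv"); // placeholder: path to your CSV file
using var csvReader = new CsvReader(streamReader, CultureInfo.InvariantCulture); // newer CsvHelper versions require a culture
var myObjects = csvReader.GetRecords<MyObject>(); // MyObject: your class; records are read lazily as you enumerate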
Josh Close

Regex might get overly complex here. Split the line on commas, then iterate over the resulting pieces and concatenate consecutive pieces (re-inserting the comma) for as long as the number of double quotes in the concatenated string is odd.

"hello,this",is,"a ""test"""

...split...

"hello | this" | is | "a ""test"""

...iterate and merge until you have an even number of double quotes...

"hello,this" - even number of quotes (note that the comma removed by the split is re-inserted between the pieces)

is - even number of quotes

"a ""test""" - even number of quotes

...then strip off the leading and trailing quote if present, and replace "" with ".
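The split-and-merge steps above can be sketched in C# as follows. This is a minimal sketch that assumes one CSV record per line (no line breaks inside quoted fields) and uses C# top-level statements; `SplitAndMerge` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits on every comma, then re-joins pieces while a
// quoted field is still open (odd number of double quotes so far).
static List<string> SplitAndMerge(string line)
{
    var result = new List<string>();
    string? pending = null; // pieces merged so far for the current field

    foreach (var piece in line.Split(','))
    {
        // Re-insert the comma that Split removed between consecutive pieces.
        pending = pending == null ? piece : pending + "," + piece;

        // Odd quote count: the quoted field continues in the next piece.
        if (pending.Count(c => c == '"') % 2 != 0) continue;

        var field = pending;
        // Strip the leading/trailing quote if present, then turn "" back into ".
        if (field.Length >= 2 && field.StartsWith("\"") && field.EndsWith("\""))
            field = field.Substring(1, field.Length - 2).Replace("\"\"", "\"");
        result.Add(field);
        pending = null;
    }

    // Unbalanced quotes at the end of the line: keep the leftover text as-is.
    if (pending != null) result.Add(pending);
    return result;
}

foreach (var f in SplitAndMerge("\"hello,this\",is,\"a \"\"test\"\"\""))
    Console.WriteLine(f);
```

For the example line this prints `hello,this`, `is` and `a "test"`. Counting quotes per merged string keeps the logic simple, at the cost of re-scanning the pending text each time a piece is appended.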

Will A

It can be done with the following code:

using System.IO;
using Microsoft.VisualBasic.FileIO;

// Quotes inside a quoted field are escaped by doubling them ("").
string csv = "1,2,3,\"4,3\",\"a,\"\"b\"\",c\",end";
// To read from a file instead: new TextFieldParser("csvfile.csv")
using (TextFieldParser parser = new TextFieldParser(new StringReader(csv)))
{
    parser.HasFieldsEnclosedInQuotes = true;
    parser.SetDelimiters(",");
    while (!parser.EndOfData)
    {
        // ReadFields returns the fields of one line; process them here,
        // since the next iteration overwrites the variable.
        var fields = parser.ReadFields();
    }
}
Nirupam