I am using a ParseCSV function to parse a CSV file in C#.

The last column in each row of the CSV file contains: NM 120922C00002500 (followed by many trailing spaces)

I pass ParseCSV an input string that I get from reading the CSV file.

A part of the inputstring is:

"1",000066,"07/30/2012","53193315D4","B ","99AAXXPB0"," "," "," ","CALL NM 09/22/12 00002.500 ","MG",100.00,1.050000,310,32550.00,25530.70,360,37800.00,30477.78,"C",2.50000,09/22/2012,"NM","NM 120922C00002500".

In the ParseCSV function, I am doing the following:

string csvParsingRegularExpressionOld = Prana.Global.ConfigurationHelper.Instance.GetAppSettingValueByKey("CSVParsingRegularExpression");
string csvParsingRegularExpression = csvParsingRegularExpressionOld.Replace("\\\\", "\\");

The value of csvParsingRegularExpression comes out as:

((?<field>[^",\r\n]*)|"(?<field>([^"]|"")*)")(,|(?<rowbreak>\r\n|\n|$))

Then I follow up with:

Regex re = new Regex(csvParsingRegularExpression);

MatchCollection mc = re.Matches(inputString);

foreach (Match m in mc) 
{

   field = m.Result("${field}").Replace("\"\"", "\"");
}

But field contains an empty string when it reaches the last value, "NM 120922C00002500". What could be a solution to this problem?

I don't know whether the problem is with the CSV file or with the Regex method "Matches".

  • what is `csvParsingRegularExpressionOld` – Ria Aug 01 '12 at 10:06
  • Also, this is not the real code... `"\"` does not compile... – digEmAll Aug 01 '12 at 10:10
  • @digEmall "\"\"" compiles fine. It means a string that contains two double quotes. The two middle ones are escaped, thus part of the string. The last quote is not escaped, thus will mark the end of the string. – Tormod Aug 01 '12 at 10:26
  • @Tormod - @digEmAll means the `"\"` on the second line. – Alex Humphrey Aug 01 '12 at 10:39
  • Is it really _really_ necessary to use regular expressions to parse this csv? `string.split(',')`? [filehelpers](http://filehelpers.sourceforge.net/)? jet provider? Why, of all possibilities, regular expressions? – Daniel Aug 01 '12 at 11:57

3 Answers

Don't use Regex to read CSV.

http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader

codekaizen

You're not matching the last group because it ends with a period outside the quotes. If you add the period to the terminating group of your regex it works:

(\"?(?<field>[^",\r|\n]*)\"?\,?)*\.?(?<rowbreak>[\r|\n]*)

Although, as other comments have pointed out, it's not a great idea to roll your own parser if the data is really valid CSV (I didn't bother to check whether the given sample matches the spec). There are plenty of parsers available, and a hand-rolled one is likely to miss some edge cases.
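As one illustration of an existing parser, here is a minimal sketch using TextFieldParser from the Microsoft.VisualBasic.FileIO namespace, which ships with .NET and can be called from C#. The class name CsvSketch and the method name ParseCsv are hypothetical, chosen only for this example, and I am assuming the input is ordinary comma-delimited text with optional double-quoted fields. This is not a drop-in replacement for the ParseCSV function in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvSketch
{
    // Hypothetical helper: splits CSV text into rows, each row an array of fields.
    public static List<string[]> ParseCsv(string inputString)
    {
        var rows = new List<string[]>();

        using (var parser = new TextFieldParser(new StringReader(inputString)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            // Keep padding such as "B " as it is; callers can Trim() if they want.
            parser.TrimWhiteSpace = false;

            while (!parser.EndOfData)
            {
                // ReadFields consumes one line and removes the surrounding quotes.
                rows.Add(parser.ReadFields());
            }
        }

        return rows;
    }
}
```

With this approach the trailing spaces in the last column stay part of the field, so calling Trim() on it afterwards is up to you. A line the parser cannot make sense of should surface as a MalformedLineException rather than as a silently empty field.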

Tyson
  • Try the updated version, it should work. I tested it using http://regexpal.com/ by removing group names, since the tool doesn't support them. If you plug in (\"?([^",\r|\n]*)\"?\,?)*\.?([\r|\n]*) it should work. – Tyson Aug 06 '12 at 15:07

If you aren't set on using regex, here is a small class I made, followed by its usage:

public class ParseHelper
{
    public char TextDelimiter { get; set; }
    public char TextQualifier { get; set; }
    public char EscapeCharacter { get; set; }

    public List<string> Parse(string str, bool keepTextQualifiers = false)
    {
        List<string> returnedValues = new List<string>();

        bool inQualifiers = false;
        string currentWord = "";

        for (int i = 0; i < str.Length; i++)
        {
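            //Looking for EscapeCharacter.
            if (str[i] == EscapeCharacter)
            {
                //A trailing escape character has nothing left to escape.
                if (i + 1 >= str.Length)
                    throw new FormatException("The input string, 'str', ends with a dangling escape character.");

                //Skip the escape character and take the next one literally.
                i++;
                currentWord += str[i];
                continue;
            }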

            //Looking for TextQualifier.
            if (str[i] == TextQualifier)
            {
                if (keepTextQualifiers)
                    currentWord += TextQualifier;

                inQualifiers = !inQualifiers;
                continue;
            }

            //Looking for TextDelimiter.
            if (str[i] == TextDelimiter && !inQualifiers)
            {
                returnedValues.Add(currentWord);
                currentWord = "";
                continue;
            }

            currentWord += str[i];
        }

        if (inQualifiers)
            throw new FormatException("The input string, 'str', is not properly formatted.");

        returnedValues.Add(currentWord);
        currentWord = "";

        return returnedValues;
    }
}

Usage, based on your case:

ParseHelper ph = new ParseHelper() {
    TextDelimiter = ',',
    TextQualifier = '"',
    EscapeCharacter = '\\' };
List<string> parsedLine = ph.Parse(unparsedLine);
Tipx