I am using a regular expression to split a string on commas, where a comma may be followed by a space. The expression handles every case except one. The code I have tried is:

        List<string> strNewSplit = new List<string>();
        Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
        foreach (Match match in csvSplit.Matches(input))
        {
            strNewSplit.Add(match.Value.TrimStart(','));
        }
        return strNewSplit;

CASE1: "MYSQL,ORACLE","C#,ASP.NET"

ExpectedOutput:

"MYSQL,ORACLE"

"C#,ASP.NET"

RESULT: PASS

CASE2: "MYSQL,ORACLE", "C#,ASP.NET"

ExpectedOutput:

"MYSQL,ORACLE"

"C#,ASP.NET"

Actual Output:

"MYSQL,ORACLE"

"C#

ASP.NET"

RESULT: FAIL.

If there is a space after the comma that separates two double-quoted items, I do not get the expected output. Am I missing anything? Is there a better solution?

Matt Tester
  • Maybe you can adapt this solution http://stackoverflow.com/questions/9169514/regular-expression-to-split-by-comma-ignores-comma-within-double-quotes-vb-ne – Miguel Ribeiro Jul 06 '12 at 13:49

1 Answer

I normally start by writing down the EBNF of the input I want to parse.

In your case I would say:

List = ListItem {Space* "," Space* ListItem}*;

ListItem = '"' Identifier '"'; // Identifier is anything that does not contain "

Space = [\t ]+;

Which means a List consists of a ListItem that is followed by zero or more (*) ListItems, each separated from the previous one by optional spaces, a comma, and optional spaces again.

That led me to the following (you are searching for ListItems):

static void Main(string[] args)
{
    matchRegex("\"MYSQL,ORACLE\",\"C#,ASP.NET\"").ForEach(Console.WriteLine);
    matchRegex("\"MYSQL,ORACLE\", \"C#,ASP.NET\"").ForEach(Console.WriteLine);
}
static List<string> matchRegex(string input)
{
    List<string> strNewSplit = new List<string>();
    Regex csvSplit = new Regex(
        "(\"(?:[^\"]*)\")"
        , RegexOptions.Compiled);
    foreach (Match match in csvSplit.Matches(input))
    {
        strNewSplit.Add(match.Value.TrimStart(','));
    }
    return strNewSplit;
}

Which returns what you wanted. Hope I understood you correctly.
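
The grammar above can also be written as one anchored pattern to check whether a whole input is a well-formed list before the items are extracted. This is only a sketch under that grammar: the names `ListGrammarCheck` and `IsValidList` are made up for illustration, and unquoted items are deliberately rejected because the grammar does not allow them.

```csharp
using System;
using System.Text.RegularExpressions;

static class ListGrammarCheck
{
    // List     = ListItem { Space* "," Space* ListItem }
    // ListItem = '"' followed by any characters except '"', then '"'
    private static readonly Regex ListPattern = new Regex(
        "^\"[^\"]*\"(?:[\\t ]*,[\\t ]*\"[^\"]*\")*$");

    // Hypothetical helper: true when the whole input matches the grammar.
    public static bool IsValidList(string input)
    {
        return ListPattern.IsMatch(input);
    }

    static void Main()
    {
        Console.WriteLine(IsValidList("\"MYSQL,ORACLE\",\"C#,ASP.NET\""));  // True
        Console.WriteLine(IsValidList("\"MYSQL,ORACLE\", \"C#,ASP.NET\"")); // True
        Console.WriteLine(IsValidList("C#, \"asp.net,SQLSERVER\""));        // False: C# is not quoted
    }
}
```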

FSaccilotto
  • Thanks, it works well. I have another scenario: if my input is C#, "asp.net,SQLSERVER", I expect C# and "asp.net,SQLSERVER" as separate outputs, but your suggestion returns only "asp.net,SQLSERVER" and omits C#. Thanks for your continued support. – Venkat Narayan May 17 '12 at 06:28
  • What I actually want is to split on commas and spaces, while treating anything enclosed in double quotes as a single item. For example, if my input is C#, asp.net, "sqlserver,Linux java", the output should be C#, then asp.net, then "sqlserver,Linux java", each separately. I hope I have described my problem clearly. – Venkat Narayan May 17 '12 at 06:35
  • As I said, try to write down an EBNF that covers all variants of your input. If you cannot come up with an EBNF, you probably will not be able to do it with a regex in one step. You could also replace all commas inside " " with semicolons, and then split first on commas and afterwards on semicolons. – FSaccilotto May 17 '12 at 06:37
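
For the mixed input described in the comments, one possible approach is to extend the item rule so that an item is either a double-quoted string or a run of characters containing no comma, whitespace, or double quote. The sketch below assumes exactly that rule. `MixedListSplitter` and `Split` are illustrative names, escaped quotes inside items are not handled, and an unquoted item that contains spaces is split into several pieces.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MixedListSplitter
{
    // Item = quoted string | run of characters that are not ',', whitespace, or '"'
    private static readonly Regex ItemPattern = new Regex("\"[^\"]*\"|[^,\\s\"]+");

    // Hypothetical helper: returns every item found in the input, in order.
    public static List<string> Split(string input)
    {
        var items = new List<string>();
        foreach (Match match in ItemPattern.Matches(input))
        {
            items.Add(match.Value);
        }
        return items;
    }

    static void Main()
    {
        // Prints three lines: C#, asp.net, and "sqlserver,Linux java" (quotes kept).
        Split("C#, asp.net, \"sqlserver,Linux java\"").ForEach(Console.WriteLine);
    }
}
```

Because the quoted alternative comes first in the pattern, a comma or space inside double quotes never ends an item. The separators between items are simply skipped, since neither alternative can match them.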