We want to validate the records of an input .DAT file against the provided delimiters.

In our .NET application we can parse the input file with the provided delimiter as long as every field is properly qualified, e.g. "Test","data","CaseInforation"

The record above parses successfully. The problem is with records formatted like this:

"Test",data,"CaseInforation" (the value data is not surrounded by a text qualifier, and this breaks parsing of the file).

So we decided to use a regular expression to find the problematic values that are not surrounded by the text qualifier.

We created the following regex to find them: \x2C([^\x22].*?[^\x22])\x2C

This regular expression only works for fields that sit between the first and last fields.

"Test",data,"CaseInforation" -> the regular expression matches this record and returns data as the problematic field.

"Test","data",CaseInforation -> the regular expression does not match the last value of this record.

Can anyone help us correct the regular expression so that it also matches the first or last value?
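
For reference, a minimal C# sketch that reproduces the behaviour described above. The class and method names (OriginalPatternDemo, FindUnquoted) are made up for illustration, and the sketch assumes each record has already been read into its own string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OriginalPatternDemo
{
    // The pattern from the question: comma, unquoted value, comma.
    private static readonly Regex Pattern =
        new Regex(@"\x2C([^\x22].*?[^\x22])\x2C");

    // Returns the captured value, or an empty string when nothing matches.
    public static string FindUnquoted(string record)
    {
        Match m = Pattern.Match(record);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        string[] records =
        {
            "\"Test\",data,\"CaseInforation\"",   // unquoted middle field
            "\"Test\",\"data\",CaseInforation",   // unquoted last field
            "Test,\"data\",\"CaseInforation\""    // unquoted first field
        };

        foreach (string record in records)
        {
            string found = FindUnquoted(record);
            Console.WriteLine(record + " -> " + (found.Length > 0 ? found : "(no match)"));
        }
    }
}
```

Only the first record yields a value (data). The pattern demands a literal comma on both sides of the value, so a field at the start or end of the record can never satisfy it.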

Thanks.

SirDarius
Annaya

3 Answers

^(?:(?:"((?:""|[^"])+)"|([^,]*))(?:$|,))+$ will match the whole line; you can then use match.Groups[1].Captures to get your data out (without the quotes). I also allow "My name is ""in quotes""" as a valid string.

string mystring = "\"Test\",\"data\",\"CaseInforation\"";
MatchCollection matches = Regex.Matches(mystring, "^(?:(?:\"((?:\"\"|[^\"])+)\"|([^,]*))(?:$|,))+$");
// Groups[0] is the whole match; Groups[1] holds one capture per quoted field.
// matches[0].Value                       => "Test","data","CaseInforation"
// matches[0].Groups[0].Value             => "Test","data","CaseInforation"
// matches[0].Groups[0].Captures[0].Value => "Test","data","CaseInforation"
// matches[0].Groups[1].Value             => CaseInforation (the last capture)
// matches[0].Groups[1].Captures[0].Value => Test
// matches[0].Groups[1].Captures[1].Value => data
// matches[0].Groups[1].Captures[2].Value => CaseInforation
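
In the pattern above, the second capturing group, ([^,]*), is the branch taken by values that are not quoted, so its captures can be read in the same way as those of Groups[1]. A minimal sketch, assuming one record per string: UnquotedFieldFinder and UnquotedValues are hypothetical names, and empty captures are skipped on the assumption that an empty field is not of interest here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UnquotedFieldFinder
{
    // Same pattern as above: group 1 = quoted values, group 2 = unquoted values.
    private static readonly Regex Line = new Regex(
        "^(?:(?:\"((?:\"\"|[^\"])+)\"|([^,]*))(?:$|,))+$");

    // Returns every non-empty value that was not wrapped in quotes.
    public static List<string> UnquotedValues(string record)
    {
        var result = new List<string>();
        Match m = Line.Match(record);
        if (!m.Success)
        {
            return result; // the record does not fit the pattern at all
        }

        foreach (Capture c in m.Groups[2].Captures)
        {
            if (c.Length > 0) // assumption: ignore empty captures
            {
                result.Add(c.Value);
            }
        }
        return result;
    }

    public static void Main()
    {
        string record = "\"Test\",\"data\",CaseInforation";
        foreach (string value in UnquotedValues(record))
        {
            Console.WriteLine(value);
        }
    }
}
```

Because the pattern is anchored to the whole record, this approach does not depend on where the unquoted value sits, so first and last fields are reported as well.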
agent-j

Something along these lines?

/^"\w+","?(.+?)"?,"\w+"$/
ian

A simple [^\",]+ should give you one match for each value, as long as any "s and ,s appear only between values. If there are any inside a value, that value will simply be split in two.

So something like this:

foreach(Match match in Regex.Matches(data, "[^\",]+"))
{
    Console.WriteLine(match.Value);//or whatever
}

Though if you have "Test",data,"CaseIn"foration" you would get Test, data, CaseIn and foration out.
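
If only the values without a text qualifier are wanted, one possible variation on the same character class is to pin it between field boundaries with lookarounds. This is a sketch under assumptions rather than a full CSV parser: the pattern (?<=^|,)[^",]+(?=,|$) and the names BoundaryDemo and FindUnquoted are illustrative, and each record is assumed to be a single string without a trailing line break.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // A run of non-quote, non-comma characters that starts at the beginning
    // of the record or right after a comma, and ends right before a comma
    // or at the end of the record.
    private static readonly Regex Unquoted =
        new Regex("(?<=^|,)[^\",]+(?=,|$)");

    public static List<string> FindUnquoted(string record)
    {
        var result = new List<string>();
        foreach (Match m in Unquoted.Matches(record))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        string[] records =
        {
            "Test,\"data\",\"CaseInforation\"",
            "\"Test\",data,\"CaseInforation\"",
            "\"Test\",\"data\",CaseInforation"
        };

        foreach (string record in records)
        {
            Console.WriteLine(record + " -> " + string.Join("|", FindUnquoted(record)));
        }
    }
}
```

The same caveat applies as above: the pattern does not track whether it is inside quotes, so a quoted value that contains two or more commas, such as "a,b,c", would make it report b.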

Martin Brenden
  • The given regex works fine for matching quoted strings, but is there any way to directly get the values that were not matched? For example, with "ID","File Name","Subject",From the regex matches all the quoted values ("ID","File Name","Subject"). Is there any way to get just the unmatched string, i.e. From in the example above? – Annaya Jun 22 '11 at 09:39