I am trying to read strings from files in C#. A typical line looks like this:
17401 "71690090" "HSG" "3384656" 0 "condensate pipe for boiler leaking" "avoid school run 2.30 -4 call route 07777766777" "" "07777766777" "07777766777 " 0 "0" "24 Hour" "YYYYYN" "YYYYYN" ? "00:00" 0 "H1" "Domestic-Repair" "HR" 0 "" ? "00:00" "Tom Timmy" 22/03/2017 "08:16" 22/03/2017 "08:18" 23/03/2017 "08:18" "2010" "Some Company" 2010 "Some Company Lot4" "" "Miss L Burton||90||Mount Pleasant" "Mount Pleasant|Pleasantville||XX1 1XX" "" "" ""
It contains a mixture of:
1 - alphanumeric tokens with special characters (grouped, no spaces)
2 - strings enclosed in double quotes (which may contain spaces and special characters); some are just empty quotes.
I am trying to split the above string into its fields (42 by my count) and put them into a string array. I have come up with:
("[a-zA-Z0-9 .:+*,#/'~@;=+_)(&^%$£!`¬|-]+")|("")|([?])|((\d{2}\/\d{2}\/\d{4})|(\d+))
which I built on Regex101.com. However, when I put it into C# as:
string[] test1 = Regex.Split(line, @"(""[a-zA-Z0-9 .:+*,#/'~@;=+_)(&^%$£!`¬|-]+"")|("""")|([?])|((\d{2}\/\d{2}\/\d{4})|(\d+))");
I get 94 items in test1, whereas I am trying to replicate the Regex101.com result, which splits the line into 42.
Can someone please point me in the right direction?
Also, is there a more efficient way than my approach?
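For anyone hitting the same count mismatch: the extra items most likely come from how Regex.Split works. The pattern above matches the tokens themselves, so Split treats them as separators and returns the text between them, and it also adds the text of every capturing group to the result. A minimal sketch of that behaviour on a made-up input (the class and method names are hypothetical, for illustration only):

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitDemo  // hypothetical name, for illustration only
{
    // Capturing group: the text matched by the group is returned alongside the pieces.
    public static string[] WithCapture(string s) => Regex.Split(s, @"(\d+)");

    // Non-capturing group: only the text between the matches is returned.
    public static string[] WithoutCapture(string s) => Regex.Split(s, @"(?:\d+)");

    public static void Main()
    {
        string sample = "ab12cd";
        Console.WriteLine(string.Join("|", WithCapture(sample)));    // ab|12|cd
        Console.WriteLine(string.Join("|", WithoutCapture(sample))); // ab|cd
    }
}
```

Regex101 lists matches rather than split pieces, so Regex.Matches is the closer equivalent in C#.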
Solved, by matching the tokens with Regex.Matches instead of splitting on them:
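// Needs: using System.Linq; using System.Text.RegularExpressions;
// A token is either a non-empty quoted string or a run of non-whitespace characters ("" falls into the second case).
var pattern = @"(""(?<value>[^""]+)""|(?<value>[^\s]+))";
var regex = new Regex(pattern, RegexOptions.Compiled);
// m.Value keeps the surrounding quotes; m.Groups["value"].Value drops them for non-empty quoted tokens.
string[] test1 = regex.Matches(line).Cast<Match>().Select(m => m.Value).ToArray();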
I didn't think I needed to overcomplicate this by using a CSV parser.
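In case it helps someone else, here is a slightly fuller sketch of the same idea as a self-contained program. It is a variation, not exactly the code above: it uses [^"]* so that empty quotes are handled by the quoted branch, and it reads the named group so the quotes are stripped. It assumes a field never contains an embedded double quote. LineTokenizer and Tokenize are names I made up, and the sample line is a shortened version of the real data.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class LineTokenizer  // hypothetical helper name
{
    // First branch: a quoted token, possibly empty. Second branch: a bare token.
    // Assumption: quoted fields never contain an embedded double quote.
    private static readonly Regex Token =
        new Regex(@"""(?<value>[^""]*)""|(?<value>\S+)", RegexOptions.Compiled);

    // Returns the fields of one line, with surrounding quotes removed.
    public static string[] Tokenize(string line) =>
        Token.Matches(line)
             .Cast<Match>()
             .Select(m => m.Groups["value"].Value)
             .ToArray();
}

class Program
{
    static void Main()
    {
        // Shortened sample based on the line in the question.
        string sample = "17401 \"HSG\" 0 \"condensate pipe\" \"\" ? 22/03/2017 \"08:16\"";
        string[] fields = LineTokenizer.Tokenize(sample);

        Console.WriteLine(fields.Length);  // 8
        foreach (var f in fields)
            Console.WriteLine("[" + f + "]");
    }
}
```

If the files ever start containing escaped quotes inside a field, this pattern will no longer be enough and a proper delimited-text parser becomes the safer choice.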