2

How can I split a string (from a textbox) by commas, excluding those inside double quotation marks (without getting rid of the quotation marks), as well as by other possible punctuation marks (e.g. ' . ' ' ; ' ' - ')?

E.g. If someone entered the following into the textbox:

apple, orange, "baboons, cows", rainbow, "unicorns, gummy bears"

How can I split the above string into the following (say, into a List)?

apple

orange

"baboons, cows"

rainbow

"unicorns, gummy bears"

Thank you for your help!

dbc
MooMooCoding
  • If processing CSV files, it's better to use an existing library, such as LinqToCSV (there are others as well), rather than rolling your own. – hatchet - done with SOverflow Aug 24 '14 at 12:13
  • Thank you @hatchet for your suggestion, but I'm actually trying to ask the user for search terms (to query a database). – MooMooCoding Aug 24 '14 at 12:17
  • Note: space is not a punctuation mark, but your expected results drop the space too. Don't forget about that in your code. And at least one of the current answers assumes that *every* comma will be followed by a space, and actually splits on `", "`. Beware that this does not work for `apple,orange`. Also, how should `apple,orange"banana,peach"almond,kiwi` be split, when `"` does not appear anywhere near a comma? –  Aug 24 '14 at 13:11
  • Or... since you say this is for search terms, why require the user to type commas at all? Will the user expect that searching for `a b c` searches for that exact phrase, and the user has to type `a, b, c` instead to search for those words? It depends on the user; you should probably double-check that this is indeed what your users expect, or change the logic. –  Aug 24 '14 at 13:14
  • Thank you very much for your suggestions and advice, @hvd! – MooMooCoding Aug 24 '14 at 13:28

6 Answers

4

You could try the regex below, which uses a positive lookahead:

string value = @"apple, orange, ""baboons, cows"", rainbow, ""unicorns, gummy bears""";
string[] lines = Regex.Split(value, @", (?=(?:""[^""]*?(?: [^""]*)*))|, (?=[^"",]+(?:,|$))");

foreach (string line in lines)
{
    Console.WriteLine(line);
}

Output:

apple
orange
"baboons, cows"
rainbow
"unicorns, gummy bears"

IDEONE
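For comparison, here is a minimal sketch of a different, commonly used lookahead: split on a comma only when an even number of double quotes follows it up to the end of the string. This is not the pattern from the answer above; the class and method names (`LookaheadSplit`, `SplitOutsideQuotes`) are hypothetical, and the sketch assumes the quotes in the input are balanced.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadSplit
{
    // Hypothetical helper, not part of the original answer.
    // Splits on a comma (plus any whitespace after it) only when the rest of
    // the string contains an even number of double quotes, i.e. when the
    // comma is outside a quoted section. Assumes balanced quotes.
    public static string[] SplitOutsideQuotes(string input)
    {
        return Regex.Split(input, @",\s*(?=(?:[^""]*""[^""]*"")*[^""]*$)");
    }
}

// Usage:
//   string[] parts = LookaheadSplit.SplitOutsideQuotes(
//       @"apple, orange, ""baboons, cows"", rainbow, ""unicorns, gummy bears""");
//   foreach (string part in parts) Console.WriteLine(part);
```

Because the pattern uses only non-capturing groups, `Regex.Split` returns just the pieces between separators, with the quotation marks left in place.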

Avinash Raj
  • If you want to split on any of `,.:-`, use a character class: `[,.:-] (?=(?:""[^""]*?(?: [^""]*)*))|[,.:-] (?=[^"",]+(?:,|$))` – Avinash Raj Aug 24 '14 at 13:34
  • No worries, @Avinash. I was trying on all of these nice suggestions to see which one is more suitable in my case. – MooMooCoding Aug 24 '14 at 14:19
1

Try this:

Regex str = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);

foreach (Match m in str.Matches(input))
{
    Console.WriteLine(m.Value.TrimStart(','));
}

You may also want to look at the FileHelpers library.

Rahul Tripathi
  • Thank you @R.T. but it only works if the quotation mark is immediately after a comma (i.e. not separated by a space). Sorry but I'm not familiar with regex! – MooMooCoding Aug 24 '14 at 12:34
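One way to address the limitation raised in the comment above is to allow optional whitespace between the comma and the item. The following is a rough sketch under that assumption, not the answerer's code; `QuotedCsvRegex` and `Split` are hypothetical names, and text that follows a closing quote before the next comma is ignored by this pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedCsvRegex
{
    // Hypothetical variant of the pattern above: \s* lets a space (or several)
    // sit between the comma and the item, so `, "a, b"` is matched as one
    // quoted item. Group 1 holds the item itself, quotes included.
    private static readonly Regex TermPattern = new Regex(
        @"(?:^|,)\s*(""(?:[^""]|"""")*""|[^,]*)",
        RegexOptions.Compiled);

    public static List<string> Split(string input)
    {
        var terms = new List<string>();
        foreach (Match m in TermPattern.Matches(input))
        {
            // Trim stray whitespace and skip empty items (e.g. from ",,").
            string term = m.Groups[1].Value.Trim();
            if (term.Length > 0) terms.Add(term);
        }
        return terms;
    }
}
```

Reading the item from `m.Groups[1]` avoids the `TrimStart(',')` step, since the leading comma and whitespace fall outside the capture group.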
1

Much like a CSV parser, instead of Regex, you can loop through each character, like so:

public List<string> ItemStringToList(string inputString)
{
    var itemList    = new List<string>();
    var currentItem = "";
    var quotesOpen  = false;

    for (int i = 0; i < inputString.Length; i++)
    {
        // A quote toggles the "inside quotes" state.
        // Note: the quote character itself is not added to the item.
        if (inputString[i] == '"')
        {
            quotesOpen = !quotesOpen;
            continue;
        }

        // A comma outside quotes ends the current item.
        if (inputString[i] == ',' && !quotesOpen)
        {
            itemList.Add(currentItem);
            currentItem = "";
            continue;
        }

        // Skip leading spaces of an item.
        if (currentItem == "" && inputString[i] == ' ') continue;
        currentItem += inputString[i];
    }

    // Add the last item, if any.
    if (currentItem != "") itemList.Add(currentItem);

    return itemList;
}

Example test usage:

var test1 = ItemStringToList("one, two, three");
var test2 = ItemStringToList("one, \"two\", three");
var test3 = ItemStringToList("one, \"two, three\"");
var test4 = ItemStringToList("one, \"two, three\", four, \"five six\", seven");
var test5 = ItemStringToList("one, \"two, three\", four, \"five six\", seven");
var test6 = ItemStringToList("one, \"two, three\", four, \"five six, seven\"");
var test7 = ItemStringToList("\"one, two, three\", four, \"five six, seven\"");

You could change it to use StringBuilder if you want faster character joining.
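As a rough illustration of the StringBuilder idea, here is a sketch of the same character loop written as an iterator. It is a variation, not the code above: it keeps the quotation marks in the output (as the question asks), trims both ends of each item, and skips empty items. The names `SearchTermSplitter` and `SplitTerms` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SearchTermSplitter
{
    // Hypothetical helper: splits on commas that are outside double quotes
    // and keeps the quotation marks as part of each item.
    public static IEnumerable<string> SplitTerms(string input)
    {
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in input)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;
                current.Append(c);          // keep the quotation mark
            }
            else if (c == ',' && !inQuotes)
            {
                // A comma outside quotes ends the current item.
                string term = current.ToString().Trim();
                if (term.Length > 0) yield return term;
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // Emit whatever is left after the last comma.
        string last = current.ToString().Trim();
        if (last.Length > 0) yield return last;
    }
}

// Usage:
//   foreach (string term in SearchTermSplitter.SplitTerms("one, \"two, three\", four"))
//       Console.WriteLine(term);
```

Because the method uses `yield return`, nothing is parsed until the result is enumerated; call `ToList()` on it if you need the items more than once.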

WhoIsRich
  • Nice one. BTW: I'd use an [iterator method](http://msdn.microsoft.com/en-us/library/dscyy5s0.aspx) and `yield` instead of `itemList.Add`. That would allow you to get rid of `itemList` (less code) and have deferred execution for free as an additional benefit. – Heinzi Aug 24 '14 at 13:09
0

Try this. You can split an array of strings in several ways; if you want to split on whitespace, just pass a space (' ') to Split.

using System;
using System.Collections.Generic;
using System.Linq;

namespace LINQExperiment1
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] sentence = new string[] { "apple", "orange", "baboons  cows", " rainbow", "unicorns  gummy bears" };

            Console.WriteLine("option 1:"); Console.WriteLine("-------");
            // option 1: Select returns one string[] per element
            // of sentence (five arrays here).
            IEnumerable<string[]> words1 =
                sentence.Select(w => w.Split(' '));
            // to get each word, we have to use two foreach loops
            foreach (string[] segment in words1)
                foreach (string word in segment)
                    Console.WriteLine(word);
            Console.WriteLine();
            Console.WriteLine("option 2:"); Console.WriteLine("-------");
            // option 2: SelectMany flattens the per-element arrays
            // into a single sequence of strings (here split on ',')
            IEnumerable<string> words2 =
                sentence.SelectMany(segment => segment.Split(','));
            // with SelectMany we have every string individually
            foreach (var word in words2)
                Console.WriteLine(word);
            // option 3: the SelectMany form written using the
            // query expression syntax (multiple froms), split on ' '
            IEnumerable<string> words3 = from segment in sentence
                                         from word in segment.Split(' ')
                                         select word;
        }
    }
}
0

This was trickier than I thought, a good practical problem I think.

Below is the solution I came up with. Two things I don't like about it are having to add the double quotation marks back, and the names of the variables :p

internal class Program
{
    private static void Main(string[] args)
    {

        string searchString =
            @"apple, orange, ""baboons, cows. dogs- hounds"", rainbow, ""unicorns, gummy bears"", abc, defghj";

        char delimeter = ',';
        char excludeSplittingWithin = '"';

        string[] splittedByExcludeSplittingWithin = searchString.Split(excludeSplittingWithin);

        List<string> splittedSearchString = new List<string>();

        for (int i = 0; i < splittedByExcludeSplittingWithin.Length; i++)
        {
            if (i == 0 || splittedByExcludeSplittingWithin[i].StartsWith(delimeter.ToString()))
            {
                string[] splitttedByDelimeter = splittedByExcludeSplittingWithin[i].Split(delimeter);
                for (int j = 0; j < splitttedByDelimeter.Length; j++)
                {
                    splittedSearchString.Add(splitttedByDelimeter[j].Trim());
                }
            }
            else
            {
                splittedSearchString.Add(excludeSplittingWithin + splittedByExcludeSplittingWithin[i] +
                                         excludeSplittingWithin);
            }
        }

        foreach (string s in splittedSearchString)
        {
            if (s.Trim() != string.Empty)
            {
                Console.WriteLine(s);
            }
        }
        Console.ReadKey();
    }
}
haku
  • Thank you for your help, @HakuKalay. It's nice and rather rewarding to see different people's approach to the same problem. The theory behind some of the solutions can also be used in tackling other kinds of problems! – MooMooCoding Aug 24 '14 at 16:36
0

Another Regex solution:

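// requires: using System.Linq; using System.Text.RegularExpressions;
private static IEnumerable<string> Parse(string input)
{
  // if used frequently, should be instantiated with Compiled option
  Regex regex = new Regex(@"(?<=^|,\s)(\""(?:[^\""]|\""\"")*\""|[^,\s]*)");

  // Cast<Match>() is needed where MatchCollection is only a non-generic IEnumerable
  return regex.Matches(input).Cast<Match>().Where(m => m.Success).Select(m => m.Value);
}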
Igor Pashchuk