
Although there are many posts on string splitting, I can't find one that addresses my problem. I need to split a string into an array, but some fields in the string are wrapped in quotes (typically the values that contain the separator).

String looks something like:

John:"2016/10/15":"15:20:14":"Manager"

If I run:

string[] items = line.Split(':');

it splits the string into 6 items, whereas it should produce only 4.

Any way that the Split function can handle this?

EDIT: The 6 items are:

John
"2016/10/15"
"15
20
14"
"Manager"

I'm expecting the time (15:20:14) to be one item.

Cameron Castillo
  • What is the result when you split it? What are the 6 items? – Preston Martin Oct 13 '16 at 19:52
  • @PrestonM I'd guess the "extra" items are the time elements (minutes, seconds) that are split by ":". – Michael Armes Oct 13 '16 at 19:52
  • Based on your criteria, it should split into 6 items - you can use :" – Jivan Oct 13 '16 at 19:52
  • I don't think `Split` would be enough. Maybe you should try regular expressions? – Paweł Hemperek Oct 13 '16 at 19:54
  • This sounds more like CSV parsing than string splitting. Maybe someone can provide a regex for that, but I would go with a custom method. – Ivan Stoev Oct 13 '16 at 19:54
  • I don't think you have a way to just split. Split and merge the columns required – techspider Oct 13 '16 at 19:54
  • Seems to me that it splits on every colon. What are you actually trying to get back out? – maniak1982 Oct 13 '16 at 19:54
  • How did the split function remove the quotes? The 6 items are wrong. – Thomas Weller Oct 13 '16 at 19:55
  • @maniak1982 + PrestonM: question edited with the answers. – Cameron Castillo Oct 13 '16 at 19:56
  • @ThomasWeller - OP removed it ;) – techspider Oct 13 '16 at 19:56
  • There is no way those are the 6 items unless you have additional code that removes the " characters. – dmeglio Oct 13 '16 at 19:56
  • @ThomasWeller: Yes, you are right. I was too quick with the typing and missed the quotation marks. Will update. – Cameron Castillo Oct 13 '16 at 19:57
  • @Jivan: the given string is just an example. You can't tell `:"` works in all cases. I expect that "Manager" could also survive without quotes. – Thomas Weller Oct 13 '16 at 19:57
  • There is a general question about parsing CSV files that includes a reference to a github library of functions to handle this kind of task. This might be an option to consider. http://stackoverflow.com/questions/2081418/parsing-csv-files-in-c-with-header – David W Oct 13 '16 at 19:58
  • @CameronCastillo - How did you store double-quotes inside a string? Where is the escape character? – techspider Oct 13 '16 at 19:59
  • Do you literally have a single string like that? Or are you processing some kind of file of many rows? If one record, simply loop through and find a :. If you encounter a " you ignore any : until another " is encountered. If you're dealing with a situation where you may be processing hundreds/thousands of these, I agree with others, use a CSV parser. – dmeglio Oct 13 '16 at 20:02
  • @techspider, yes, there is an escape character. For readability I left it out as IMHO it does not impact the answer. – Cameron Castillo Oct 13 '16 at 20:06

2 Answers


IMHO you need a parser with 2 states: inside quotes and outside quotes.

There exist libraries like Fast CSV Reader, which can be configured regarding the separator (:) and the quote character (") and even how the quote character can be escaped.
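
If a library is not an option, the two-state idea can be sketched by hand. This is only a minimal illustration, not the API of any library: `QuotedSplitter` and its `Split` method are hypothetical names, and the sketch assumes that quotes are never escaped inside a field, that quote characters should be dropped from the output, and that empty fields are kept.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedSplitter
{
    // Hypothetical helper: splits on `separator`, but ignores separators
    // that appear between a pair of `quote` characters.
    // Assumption: quotes are never escaped inside a field.
    public static List<string> Split(string line, char separator = ':', char quote = '"')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        var insideQuotes = false; // the parser's only state

        foreach (var c in line)
        {
            if (c == quote)
            {
                // Toggle the state; the quote itself is not copied to the output.
                insideQuotes = !insideQuotes;
            }
            else if (c == separator && !insideQuotes)
            {
                // A separator outside quotes ends the current field.
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                // Any other character, including a separator inside quotes.
                current.Append(c);
            }
        }

        // Add the last field, which has no trailing separator.
        fields.Add(current.ToString());
        return fields;
    }
}
```

Calling `QuotedSplitter.Split(line)` on the string from the question should give the four items John, 2016/10/15, 15:20:14 and Manager. Handling escaped quotes would need a third state, which is where a configurable CSV library starts to pay off.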

Thomas Weller

Because the first element isn't wrapped in quotes, a clean split is awkward. If you don't want to use a third-party library, the approach below works, but it makes a few assumptions, since I don't know what input is valid in your case (for example, what if the first element is just a colon?).

// Requires: using System; using System.Linq;
public void GetElements()
{
    var delimiter = ":";
    var myStr = "John:\"2016/10/15\":\"15:20:14\":\"Manager\"";

    // Split on quotes, then drop the pieces that are just the delimiter
    var elementArray = myStr.Split(new[] { '"' }, StringSplitOptions.RemoveEmptyEntries);
    elementArray = elementArray.Where(x => x != delimiter).ToArray();

    // Strip the trailing delimiter from the unquoted first element
    elementArray[0] = elementArray[0].Remove(elementArray[0].Length - 1);

    // Print each element on its own line
    foreach (var element in elementArray) Console.WriteLine(element);

    Console.ReadKey();
}

The assumptions with this are:

  1. The first "element" in the string will always be unwrapped
  2. Any additional elements on the string would follow the current delimited style such as John:"2016/10/15":"15:20:14":"Manager":"My new string"
  3. Empty elements do not need to be retained
Ryan Intravia