9

I'd like to convert string containing recursive array of strings to an array of depth one.

Example:

StringToArray("[a, b, [c, [d, e]], f, [g, h], i]") == ["a", "b", "[c, [d, e]]", "f", "[g, h]", "i"]

Seems quite simple. But, I come from functional background and I'm not that familiar with .NET Framework standard libraries, so every time (I started from scratch like 3 times) I end up just plain ugly code. My latest implementation is here. As you see, it's ugly as hell.

So, what's the C# way to do this?

dijxtra
  • 2,681
  • 4
  • 25
  • 37
  • 1
    +1 for a challenging problem. However, I think this is typically for codereview: codereview.stackexchange.com/faq#questions. – Gert Arnold Nov 01 '11 at 08:14

5 Answers5

6

@ojlovecd has a good answer, using Regular Expressions.
However, his answer is overly complicated, so here's my similar, simpler answer.

public string[] StringToArray(string input) {
    var pattern = new Regex(@"
        \[
            (?:
            \s*
                (?<results>(?:
                (?(open)  [^\[\]]+  |  [^\[\],]+  )
                |(?<open>\[)
                |(?<-open>\])
                )+)
                (?(open)(?!))
            ,?
            )*
        \]
    ", RegexOptions.IgnorePatternWhitespace);

    // Find the first match:
    var result = pattern.Match(input);
    if (result.Success) {
        // Extract the captured values:
        var captures = result.Groups["results"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
        return captures;
    }
    // Not a match
    return null;
}

Using this code, you will see that StringToArray("[a, b, [c, [d, e]], f, [g, h], i]") will return the following array: ["a", "b", "[c, [d, e]]", "f", "[g, h]", "i"].

For more information on the balanced groups that I used for matching balanced braces, take a look at Microsoft's documentation.

Update:
As per the comments, if you want to also balance quotes, here's a possible modification. (Note that in C# the " is escaped as "") I also added descriptions of the pattern to help clarify it:

    var pattern = new Regex(@"
        \[
            (?:
            \s*
                (?<results>(?:              # Capture everything into 'results'
                    (?(open)                # If 'open' Then
                        [^\[\]]+            #   Capture everything but brackets
                        |                   # Else (not open):
                        (?:                 #   Capture either:
                            [^\[\],'""]+    #       Unimportant characters
                            |               #   Or
                            ['""][^'""]*?['""] #    Anything between quotes
                        )  
                    )                       # End If
                    |(?<open>\[)            # Open bracket
                    |(?<-open>\])           # Close bracket
                )+)
                (?(open)(?!))               # Fail while there's an unbalanced 'open'
            ,?
            )*
        \]
    ", RegexOptions.IgnorePatternWhitespace);
Scott Rippey
  • 15,614
  • 5
  • 70
  • 85
  • This is a fantastic solution. :) – ojlovecd Nov 02 '11 at 00:37
  • Thanks, hope I didn't steal your thunder :) – Scott Rippey Nov 02 '11 at 00:45
  • Certainly not. There is only discussion and improvement. :) – ojlovecd Nov 02 '11 at 00:53
  • Your solution is beautiful. I finally managed to find time and study it, and it really is fantastic. Only one problem: now I'd like strings also to be preserved as atomic entities even if they contain lists (so: "[a, \"b, [c, d]\", e]" => ["a", "b, [c, d]", "e"]), but quotes do not have distinction between opening and closing quote, so balancing groups do not work. Do you have an elegant solution to this also? :-) – dijxtra Nov 23 '11 at 11:24
  • Balanced quotes -- way to throw a wrench into the engine! [I did this in JavaScript recently](https://gist.github.com/1349099), but that solution used Regex as a parsing engine, so it wasn't a pure Regex solution. It might be interesting to take a look. – Scott Rippey Nov 23 '11 at 17:34
  • I added to my answer my attempt at balanced quotes. – Scott Rippey Nov 23 '11 at 17:50
2

with Regex, it can solve your problem:

static string[] StringToArray(string str)
{
    Regex reg = new Regex(@"^\[(.*)\]$");
    Match match = reg.Match(str);
    if (!match.Success)
        return null;
    str = match.Groups[1].Value;
    List<string> list = new List<string>();
    reg = new Regex(@"\[[^\[\]]*(((?'Open'\[)[^\[\]]*)+((?'-Open'\])[^\[\]]*)+)*(?(Open)(?!))\]");
    Dictionary<string, string> dic = new Dictionary<string, string>();
    int index = 0;
    str = reg.Replace(str, m =>
    {
        string temp = "ojlovecd" + (index++).ToString();
        dic.Add(temp, m.Value);
        return temp;
    });
    string[] result = str.Split(',');
    for (int i = 0; i < result.Length; i++)
    {
        string s = result[i].Trim();
        if (dic.ContainsKey(s))
            result[i] = dic[s].Trim();
        else
            result[i] = s;
    }
    return result;
}
ojlovecd
  • 4,812
  • 1
  • 20
  • 22
  • I also think Regex is the way to , but this won't work, because you need to capture "balanced" braces. – Scott Rippey Nov 01 '11 at 04:52
  • @ScottRippey Hi, Scott, I modified my code, have a try please. – ojlovecd Nov 01 '11 at 05:43
  • That looks good. Needs some clean up, but I'll assume it works :) For anyone else interested in these "balancing groups", especially for balanced brace matching, you should take a look at [Microsoft's documentation on "Balancing Group Definitions".](http://msdn.microsoft.com/en-us/library/bs2twtah.aspx#balancing_group_definition) – Scott Rippey Nov 01 '11 at 21:40
0

Honestly I would just write this method in an F# assembly as its probably much easier. If you look at the JavaScriptSerializer implementation in C# (with a decompiler like dotPeek or reflector) you can see how messy the array parsing code is for a similar array in JSON. Granted this has to handle a much more varied array of tokens, but you get the idea.

Here is their DeserializeList implementation, uglier than it is normally as its dotPeek's decompiled version, not the original, but you get the idea. The DeserializeInternal would recurse down to the child list.

private IList DeserializeList(int depth)
{
  IList list = (IList) new ArrayList();
  char? nullable1 = this._s.MoveNext();
  if (((int) nullable1.GetValueOrDefault() != 91 ? 1 : (!nullable1.HasValue ? 1 : 0)) != 0)
    throw new ArgumentException(this._s.GetDebugString(AtlasWeb.JSON_InvalidArrayStart));
  bool flag = false;
  char? nextNonEmptyChar;
  char? nullable2;
  do
  {
    char? nullable3 = nextNonEmptyChar = this._s.GetNextNonEmptyChar();
    if ((nullable3.HasValue ? new int?((int) nullable3.GetValueOrDefault()) : new int?()).HasValue)
    {
      char? nullable4 = nextNonEmptyChar;
      if (((int) nullable4.GetValueOrDefault() != 93 ? 1 : (!nullable4.HasValue ? 1 : 0)) != 0)
      {
        this._s.MovePrev();
        object obj = this.DeserializeInternal(depth);
        list.Add(obj);
        flag = false;
        nextNonEmptyChar = this._s.GetNextNonEmptyChar();
        char? nullable5 = nextNonEmptyChar;
        if (((int) nullable5.GetValueOrDefault() != 93 ? 0 : (nullable5.HasValue ? 1 : 0)) == 0)
        {
          flag = true;
          nullable2 = nextNonEmptyChar;
        }
        else
          goto label_8;
      }
      else
        goto label_8;
    }
    else
      goto label_8;
  }
  while (((int) nullable2.GetValueOrDefault() != 44 ? 1 : (!nullable2.HasValue ? 1 : 0)) == 0);
  throw new ArgumentException(this._s.GetDebugString(AtlasWeb.JSON_InvalidArrayExpectComma));
 label_8:
  if (flag)
    throw new ArgumentException(this._s.GetDebugString(AtlasWeb.JSON_InvalidArrayExtraComma));
  char? nullable6 = nextNonEmptyChar;
  if (((int) nullable6.GetValueOrDefault() != 93 ? 1 : (!nullable6.HasValue ? 1 : 0)) != 0)
    throw new ArgumentException(this._s.GetDebugString(AtlasWeb.JSON_InvalidArrayEnd));
  else
    return list;
}

Recursive parsing is just not managed as well though in C# as it is in F#.

Paul Tyng
  • 7,924
  • 1
  • 33
  • 57
0

There is no real "standard" way of doing this. Note that the implementation can get pretty messy if you want to consider all possibilities. I would recommend something recursive like:

    private static IEnumerable<object> StringToArray2(string input)
    {
        var characters = input.GetEnumerator();
        return InternalStringToArray2(characters);
    }

    private static IEnumerable<object> InternalStringToArray2(IEnumerator<char> characters)
    {
        StringBuilder valueBuilder = new StringBuilder();

        while (characters.MoveNext())
        {
            char current = characters.Current;

            switch (current)
            {
                case '[':
                    yield return InternalStringToArray2(characters);
                    break;
                case ']':
                    yield return valueBuilder.ToString();
                    valueBuilder.Clear();
                    yield break;
                case ',':
                    yield return valueBuilder.ToString();
                    valueBuilder.Clear();
                    break;
                default:
                    valueBuilder.Append(current);
                    break;
            }

Although your not restricted to recursiveness and can always fall back to a single method like

    private static IEnumerable<object> StringToArray1(string input)
    {
        Stack<List<object>> levelEntries = new Stack<List<object>>();
        List<object> current = null;
        StringBuilder currentLineBuilder = new StringBuilder();

        foreach (char nextChar in input)
        {
            switch (nextChar)
            {
                case '[':
                    levelEntries.Push(current);
                    current = new List<object>();
                    break;
                case ']':
                    current.Add(currentLineBuilder.ToString());
                    currentLineBuilder.Clear();
                    var last = current;
                    if (levelEntries.Peek() != null)
                    {
                        current = levelEntries.Pop();
                        current.Add(last);
                    }
                    break;
                case ',':
                    current.Add(currentLineBuilder.ToString());
                    currentLineBuilder.Clear();
                    break;
                default:
                    currentLineBuilder.Append(nextChar);
                    break;
            }
        }

        return current;
    }

Whatever smells good to you

Polity
  • 14,734
  • 2
  • 40
  • 40
0
using System;
using System.Text;
using System.Text.RegularExpressions;
using Microsoft.VisualBasic.FileIO; //Microsoft.VisualBasic.dll
using System.IO;

public class Sample {
    static void Main(){
        string data = "[a, b, [c, [d, e]], f, [g, h], i]";
        string[] fields = StringToArray(data);
        //check print
        foreach(var item in fields){
            Console.WriteLine("\"{0}\"",item);
        }
    }
    static string[] StringToArray(string data){
        string[] fields = null;
        Regex innerPat = new Regex(@"\[\s*(.+)\s*\]");
        string innerStr = innerPat.Matches(data)[0].Groups[1].Value;
        StringBuilder wk = new StringBuilder();
        var balance = 0;
        for(var i = 0;i<innerStr.Length;++i){
            char ch = innerStr[i];
            switch(ch){
            case '[':
                if(balance == 0){
                    wk.Append('"');
                }
                wk.Append(ch);
                ++balance;
                continue;
            case ']':
                wk.Append(ch);
                --balance;
                if(balance == 0){
                    wk.Append('"');
                }
                continue;
            default:
                wk.Append(ch);
                break;
            }
        }
        var reader = new StringReader(wk.ToString());
        using(var csvReader = new TextFieldParser(reader)){
            csvReader.SetDelimiters(new string[] {","});
            csvReader.HasFieldsEnclosedInQuotes = true;
            fields = csvReader.ReadFields();
        }
        return fields;
    }
}
BLUEPIXY
  • 39,699
  • 7
  • 33
  • 70