I would like to split the string below using a regular expression:

Country:Subdivision, Level1:{Level2a:{Level3a, Level3b}, Level2b}

into the following form:

Country
   Subdivision
Level1
   Level2a
      Level3a
      Level3b
   Level2b

I know a recursive function will be needed to split the string into the above form.

I'm using .NET and want to parse the string into instances of this class:

public class ListHierarchy
{
    public string Name { get; set; }
    public ListHierarchy ParentListHierarchy { get; set; }
}

Conceptually, the expected output is:

var list1 = new ListHierarchy() { Name = "Country" };
var list2 = new ListHierarchy() { Name = "Subdivision", ParentListHierarchy = list1 };
var list3 = new ListHierarchy() { Name = "Level1" };
var list4 = new ListHierarchy() { Name = "Level2a", ParentListHierarchy = list3 };
var list5 = new ListHierarchy() { Name = "Level2b", ParentListHierarchy = list3 };
var list6 = new ListHierarchy() { Name = "Level3a", ParentListHierarchy = list4 };
var list7 = new ListHierarchy() { Name = "Level3b", ParentListHierarchy = list4 };

I already have a solution, but I still need to fine-tune the regex:

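using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Group 1 of each match becomes the key, group 2 the value.
public static Dictionary<string, string> SplitToDictionary(string input, string regexString)
{
    Regex regex = new Regex(regexString);
    return regex.Matches(input).Cast<Match>().ToDictionary(x => x.Groups[1].Value.Trim(), x => x.Groups[2].Value.Trim());
}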

string input = "Country:Subdivision, Level1:{Level2a:{Level3a:Level4a, Level3b}, Level2b}";

var listHierarchy = new List<ListHierarchy>();
Dictionary<string, string> listParent = SplitToDictionary(input, @"([\w\s]+):(([\w\s]+)|([\w\s\,\{\}\:]+))");

However, I am getting

{Level2a:{Level3a, Level3b}, Level2b}

rather than

Level2a:{Level3a, Level3b}, Level2b 
Cœur
agent99
  • Trust me, you don't want to use a regex for this. You want a JSON parser. I'm pretty sure the language you're using (which you didn't specify) already has one. – Tim Pietzcker Feb 10 '12 at 07:56
  • Actually, since this isn't valid JSON, you don't want a JSON parser, but you'll need a [recursive descent parser](http://en.wikipedia.org/wiki/Recursive_descent_parser) nonetheless. – Tim Pietzcker Feb 10 '12 at 08:06
  • I am creating a string to store my dynamic hierarchy list, and I chose a JSON-like format for it. It is not JSON, just a JSON-like string. – agent99 Feb 10 '12 at 08:09
  • If you really want to use regular expressions to parse nested braces check out this one with some options - http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns – Alexei Levenkov Feb 10 '12 at 09:14

2 Answers

0

I love regular expressions, but for this problem they are just not the right tool.

Irony is an awesome and very easy to use library that will let you write a parser for your json-like thing.

It's free, open source, and the examples include a json parser that you can adapt to your needs.
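For a format this small, a hand-written recursive descent parser is also an option. The following is a minimal sketch and does not use Irony. It assumes the grammar implied by the question: a comma-separated list of items, where each item is a name optionally followed by `:` and either a single item or a braced list. `HierarchyParser` and its members are hypothetical names chosen for illustration; only `ListHierarchy` comes from the question.

```csharp
using System;
using System.Collections.Generic;

public class ListHierarchy
{
    public string Name { get; set; }
    public ListHierarchy ParentListHierarchy { get; set; }
}

// Hypothetical helper for illustration; not part of Irony or any other library.
public class HierarchyParser
{
    private readonly string _text;
    private int _pos;
    private readonly List<ListHierarchy> _nodes = new List<ListHierarchy>();

    private HierarchyParser(string text) { _text = text; }

    // Returns every node in document order, each linked to its parent.
    public static List<ListHierarchy> Parse(string text)
    {
        var parser = new HierarchyParser(text);
        parser.ParseList(null);
        parser.SkipWhitespace();
        if (parser._pos != text.Length)
            throw new FormatException("Unexpected character at position " + parser._pos);
        return parser._nodes;
    }

    // list := item (',' item)*
    private void ParseList(ListHierarchy parent)
    {
        ParseItem(parent);
        while (TryConsume(','))
            ParseItem(parent);
    }

    // item := name (':' ('{' list '}' | item))?
    private void ParseItem(ListHierarchy parent)
    {
        var node = new ListHierarchy { Name = ReadName(), ParentListHierarchy = parent };
        _nodes.Add(node);
        if (!TryConsume(':')) return;
        if (TryConsume('{'))
        {
            ParseList(node);
            if (!TryConsume('}'))
                throw new FormatException("Expected '}' at position " + _pos);
        }
        else
        {
            ParseItem(node); // unbraced child, e.g. Country:Subdivision
        }
    }

    // Reads characters up to the next structural character and trims the result.
    private string ReadName()
    {
        SkipWhitespace();
        int start = _pos;
        while (_pos < _text.Length && ",:{}".IndexOf(_text[_pos]) < 0) _pos++;
        string name = _text.Substring(start, _pos - start).Trim();
        if (name.Length == 0)
            throw new FormatException("Expected a name at position " + start);
        return name;
    }

    private bool TryConsume(char expected)
    {
        SkipWhitespace();
        if (_pos < _text.Length && _text[_pos] == expected) { _pos++; return true; }
        return false;
    }

    private void SkipWhitespace()
    {
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++;
    }
}
```

Calling `HierarchyParser.Parse(input)` on the string from the question should yield seven `ListHierarchy` objects in the order of the indented listing, with `ParentListHierarchy` set to `null` for `Country` and `Level1`. Because each recursive call receives its parent node, the nesting depth never has to be recovered from the text afterwards.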

Paolo Tedesco
-2

You can use this regex:

([^\s,:{}])+

This would get you Country, Subdivision, Level1, Level2a, Level3a, Level3b and Level2b. You would then have to put them into an array and output them in whatever style you need.

EDIT

This actively destroys the hierarchy and is therefore not recommended for this question. It only returns strings that can be stored in an array.
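To make the loss of hierarchy concrete, here is a minimal sketch in C# (the question's language). It uses the pattern `[^\s,:{}]+`, with the quantifier inside any group, and reads each whole match through `Match.Value`. `FlatTokens` and `Extract` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class FlatTokens
{
    // Returns every run of characters that is not whitespace, ',', ':', '{' or '}'.
    public static string[] Extract(string input)
    {
        return Regex.Matches(input, @"[^\s,:{}]+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

For the string in the question this returns a flat array of the seven names. Nothing in that array records which name is a child of which, so the nesting cannot be rebuilt from it.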

XepterX
  • -1: Aside from the fact that it's not going to work, how should he be able to guess where to indent/outdent? – Tim Pietzcker Feb 10 '12 at 07:57
  • He was asking for a regex to get the strings, so that is what I suggested. I did not say how to do the indenting/outdenting because I don't know what language he is going to use. Did you even read the part where I said the styling is up to the OP? So why the -1? – XepterX Feb 10 '12 at 08:00
  • By using a regex, you're actively *destroying* the information he needs for indenting/outdenting. – Tim Pietzcker Feb 10 '12 at 08:01
  • The indentation means the item is a child of its parent. – agent99 Feb 10 '12 at 08:02
  • He could use the match and split functions together (if JavaScript), or something like that, to turn it into an array and then style it accordingly. So how am I actively destroying the information? Unless I am missing something here. – XepterX Feb 10 '12 at 08:10
  • Your regex isn't even doing what you say it does. If it was `([^\s,:{}]+)` you'd get a flattened array of all non-whitespace, non-punctuation words in the original string, with all the information about their hierarchical positions *removed*. – Tim Pietzcker Feb 10 '12 at 08:16
  • If it is about the hierarchy position, then yes, you are right: this would destroy that information. I only said this regex would get you the strings; I did not say the result would be hierarchical like JSON. You could at least have asked before you gave a -1. – XepterX Feb 10 '12 at 08:21
  • My answer is wrong, so I guess the -2 is okay. =) I updated it to reflect that. – XepterX Feb 10 '12 at 08:25