
Here is an example of a string that I am working with:

{Hi|Hello|Holla} {James{ey|o|ing}|Bob{bie|bey}}

I need a regular expression to extract the values between the {}s. For example:

Hi|Hello|Holla
James{ey|o|ing}
Bob{bie|bey}

This string format is called Spintax. My program selects a random value from each {} block, and the {} blocks can be nested quite deeply.

The regular expression needs to extract the value between the outer {} while ignoring any nested {} blocks. Then it needs to split that value on the pipe (|), again ignoring any nested {} blocks, so that pipes inside nested blocks are left untouched.

Does that make sense?
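For reference, the two steps described above (take the contents of each outer {} block, then split only on pipes that are not inside a nested block) can be sketched without a regex by tracking brace depth. This is a minimal sketch; the class and method names (`SpintaxSplitter`, `TopLevelBlocks`, `SplitTopLevel`) are hypothetical, and it assumes the braces in the input are balanced.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; assumes balanced braces in the input.
public static class SpintaxSplitter
{
    // Returns the contents of each top-level {...} block, leaving nested blocks intact.
    public static List<string> TopLevelBlocks(string s)
    {
        var blocks = new List<string>();
        int depth = 0, start = -1;
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == '{')
            {
                if (depth == 0) start = i + 1; // content begins after the outer '{'
                depth++;
            }
            else if (s[i] == '}')
            {
                depth--;
                if (depth == 0) blocks.Add(s.Substring(start, i - start));
            }
        }
        return blocks;
    }

    // Splits on '|' only when the pipe is not inside a nested {...} block.
    public static List<string> SplitTopLevel(string content)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        int depth = 0;
        foreach (char c in content)
        {
            if (c == '{') depth++;
            else if (c == '}') depth--;

            if (c == '|' && depth == 0)
            {
                parts.Add(current.ToString()); // a top-level separator ends the current part
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        parts.Add(current.ToString());
        return parts;
    }
}
```

Applied to the example string, `TopLevelBlocks` yields `Hi|Hello|Holla` and `James{ey|o|ing}|Bob{bie|bey}`, and `SplitTopLevel` on the second block yields `James{ey|o|ing}` and `Bob{bie|bey}`, which is the output listed above.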

I implemented a partial solution using String methods, but when splitting by pipes it also splits the pipes inside the nested {}. That is to be expected, but I can't figure out a way to ignore the nested {}:

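// Requires: using System;
// One shared Random: on .NET Framework, instances created in quick
// succession can share a seed and return the same "random" picks.
private static readonly Random rand = new Random();

public String spintaxParse(String s)
{
    if (s.Contains("{"))
    {
        // Start at the first '{' and skip over any run of consecutive '{'.
        int firstOccurrenceOfOpenBrace = s.IndexOf('{');

        while (firstOccurrenceOfOpenBrace + 1 < s.Length
               && s[firstOccurrenceOfOpenBrace + 1] == '{')
            firstOccurrenceOfOpenBrace++;

        // Offset of the first '}' after that point.
        int firstOccurrenceOfClosingBrace = s.Substring(firstOccurrenceOfOpenBrace).IndexOf('}');

        String spintaxBlock = s.Substring(firstOccurrenceOfOpenBrace, firstOccurrenceOfClosingBrace + 1);

        // Problem: if the block still contains a nested '{' (e.g. "{James{ey|o|ing}"),
        // this Split also breaks on the nested block's pipes.
        String[] items = spintaxBlock.Substring(1, spintaxBlock.Length - 2).Split('|');

        s = s.Replace(spintaxBlock, items[rand.Next(items.Length)]);

        // Recurse until no '{' remains.
        return spintaxParse(s);
    }
    else
    {
        return s;
    }
}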
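One way to sidestep the nested-pipe problem in the method above, sketched here as an assumption rather than as the fix from the linked question, is to always resolve an innermost block first: a regex like `\{([^{}]*)\}` only matches blocks that contain no other braces, so splitting their contents on `|` is always safe. Once the inner blocks are replaced, the enclosing blocks become brace-free and are picked up on the next pass. The class name `SpintaxRegex` is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class: resolves spintax innermost-first with a regex.
public static class SpintaxRegex
{
    // Matches a {...} block containing no other braces, i.e. an innermost block.
    private static readonly Regex Innermost = new Regex(@"\{([^{}]*)\}");

    public static string Spin(string s, Random rand)
    {
        // Each pass replaces every innermost block with one of its options;
        // the loop ends because every pass removes at least one brace pair.
        while (Innermost.IsMatch(s))
        {
            s = Innermost.Replace(s, m =>
            {
                string[] items = m.Groups[1].Value.Split('|');
                return items[rand.Next(items.Length)];
            });
        }
        return s;
    }
}
```

For the example string, every result is one of `Hi`, `Hello` or `Holla`, followed by a space and then one of `Jamesey`, `Jameso`, `Jamesing`, `Bobbie` or `Bobbey`. Because `Regex.Replace` rewrites each match in place, two identical blocks can get different picks, unlike `String.Replace` in the original method.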
competent_tech
Sian Jakey Ellis
  • You may want to add the [c#] tag to this question, since it seems to be based on [your previous question](http://stackoverflow.com/questions/8004465/spintax-c-sharp-how-can-i-handle-this). It also makes it clear which programming language you want a regex for, as different platforms have different regex implementations. – BoltClock Nov 04 '11 at 05:37
  • Your question is inconsistent. You want to split all level 1 and level 2 patterns but not level 3? – Polity Nov 04 '11 at 06:15
  • Agree with Polity. According to your description, the example string here should be split into
    Hi
    Hello
    Holla
    James{ey|o|ing}
    Bob{bie|bey}
    is a separator
    – ojlovecd Nov 04 '11 at 06:38
  • Check the code. It's a recursive function that finds the first set of {} and replaces it with a random value from within the {} block. The resulting string is then passed back to the function, which finds the next set, and so on until all are replaced. I can extract the values within a {} with my solution above, but when I split the values by the pipe delimiter, if the string contains a nested {} then the pipes within that block get split too, which is what I don't want. I want to split levels 1..n – Sian Jakey Ellis Nov 04 '11 at 06:38
  • Google Spintax for a definition of what it is, and you'll understand what I mean. – Sian Jakey Ellis Nov 04 '11 at 06:39
  • I solved the problem in my other question ... http://stackoverflow.com/questions/8004465/spintax-c-sharp-how-can-i-handle-this – Sian Jakey Ellis Nov 04 '11 at 08:14

2 Answers


Since you are dealing with multi-level nested syntax, I think you might want to create a simple parser using a parser generator tool such as ANTLR.

The ANTLR grammar should be something like this:

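grammar Spintax;

// Top level: plain text mixed with {...} blocks.
spintax     : (TEXT | block)* EOF ;
// A block is one or more '|'-separated alternatives (an alternative may be empty).
block       : '{' alternative ('|' alternative)* '}' ;
// An alternative can itself contain text and nested blocks.
alternative : (TEXT | block)* ;

// Any run of characters other than braces and pipes.
TEXT        : ~[{}|]+ ;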
hygoh2k
  • I'd like to possibly use a regular expression to say "If there is a {...} block within this string of pipe delimited values do not split the values within that nested block". – Sian Jakey Ellis Nov 04 '11 at 06:42

It could be easier to parse the string by hand, or with a parser generator.
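Parsing by hand could look like the recursive-descent sketch below: a block is `{`, then alternatives separated by `|`, then `}`, and each alternative may contain further blocks, so nested pipes are consumed by the inner call and never seen by the outer one. The class name `SpintaxParser` and its methods are hypothetical; it assumes balanced braces and throws on a missing `}`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical hand-written recursive-descent spinner.
public class SpintaxParser
{
    private readonly string text;
    private readonly Random rand;
    private int pos; // current read position in text

    private SpintaxParser(string text, Random rand)
    {
        this.text = text;
        this.rand = rand;
    }

    public static string Spin(string text, Random rand)
    {
        return new SpintaxParser(text, rand).ParseSequence(topLevel: true);
    }

    // sequence := (plain char | block)*
    // Inside a block it stops at '|' or '}', which end the current alternative.
    private string ParseSequence(bool topLevel)
    {
        var sb = new StringBuilder();
        while (pos < text.Length)
        {
            char c = text[pos];
            if (c == '{')
                sb.Append(ParseBlock());
            else if (!topLevel && (c == '|' || c == '}'))
                break;
            else
            {
                sb.Append(c);
                pos++;
            }
        }
        return sb.ToString();
    }

    // block := '{' sequence ('|' sequence)* '}'  -> returns one random alternative
    private string ParseBlock()
    {
        pos++; // skip '{'
        var options = new List<string> { ParseSequence(topLevel: false) };
        while (pos < text.Length && text[pos] == '|')
        {
            pos++; // skip the top-level '|' of this block
            options.Add(ParseSequence(topLevel: false));
        }
        if (pos >= text.Length)
            throw new FormatException("Missing '}' in spintax.");
        pos++; // skip '}'
        return options[rand.Next(options.Count)];
    }
}
```

A design note: this version expands every alternative before picking one, which keeps the code short. A variant that picks first and only expands the chosen branch would need to skip over the unchosen alternatives while still tracking depth.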

For regular expressions that match balanced braces, check out this answer, Regular expression for String.Format-like utility, and the related MSDN article on balancing group definitions: http://msdn.microsoft.com/en-us/library/bs2twtah.aspx#balancing_group_definition .
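As an illustration of the balancing-group approach (a sketch, not code taken from the linked answer), the .NET-specific pattern below matches each top-level {} block whole. `(?<depth>)` pushes a capture on every nested `{`, `(?<-depth>)` pops one on every nested `}`, and `(?(depth)(?!))` fails the match if any nested `{` is left open. The class name `BalancedBlocks` and the group names `content` and `depth` are arbitrary choices. This only extracts the block contents; splitting them on top-level pipes is a separate step.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper using .NET balancing groups (not portable to other regex engines).
public static class BalancedBlocks
{
    private static readonly Regex TopLevelBlock = new Regex(
        @"\{(?<content>(?>[^{}]+|\{(?<depth>)|\}(?<-depth>))*)(?(depth)(?!))\}");

    // Returns the text between the outer braces of each top-level block.
    public static List<string> Extract(string s)
    {
        var result = new List<string>();
        foreach (Match m in TopLevelBlock.Matches(s))
            result.Add(m.Groups["content"].Value);
        return result;
    }
}
```

On the question's example string, `Extract` returns `Hi|Hello|Holla` and `James{ey|o|ing}|Bob{bie|bey}`.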

Alexei Levenkov