I have a statement string like this:

    *
    | { table_name | view_name | table_alias }.*
    | {
        [ { table_name | view_name | table_alias }. ]
        { column_name | $IDENTITY | $ROWGUID }
        | udt_column_name [ { . | :: } { { property_name | field_name } | method_name ( argument [ ,...n] ) } ]
        | expression
        [ [ AS ] column_alias ]
      }
    | column_alias = expression 

I need only the outermost items, so I want to split the content on the | character while ignoring any | that appears inside brackets.
The split should produce 4 items, like this:

#1 *
#2 { table_name | view_name | table_alias }.*
#3 { [ { table_name | view_name | table_alias }. ] { column_name | $IDENTITY | $ROWGUID } | udt_column_name [ { . | :: } { { property_name | field_name } | method_name ( argument [ ,...n] ) } ] | expression [ [ AS ] column_alias ] }

#4 column_alias = expression

I tried patterns like (?m)\s*^\|\s* and ^(({\|\s*})({\{})?)({.+})$, but they give me just ONE item, not FOUR.
Thanks to @Wiktor Stribiżew and @Rui Jarimba for their help.

My next idea was (?<!\{[^\}]*)\|(?![^\{]*\}), and it gives me this:

#1 *
#2 { table_name | view_name | table_alias }.*
#3

 {
                [ { table_name | view_name | table_alias }. ]
                { column_name | $IDENTITY | $ROWGUID }

#4

udt_column_name [ { . | :: } { { property_name | field_name } | method_name ( argument [ ,...n] ) } ]
                    | expression
                    [ [ AS ] column_alias ]
                  }

#5 column_alias = expression

Now I need to change (?<!\{[^\}]*)\|(?![^\{]*\}) so that the extra item #4 goes away.

OK, I found a pattern. It may not be perfect, but it works for my input:

Regex.Split(s, @"(?<!\{(?>[^\{\}]+|\{(?<D>)|\}(?<-D>))*(?(D)(?!)))\|(?!(?>[^\{\}]+|\{(?<D>)|\}(?<-D>))*(?(D)(?!))\})")

Finally, thanks again to everyone who helped me.

Alex Chen
  • Try `Regex.Split(s, @"(?m)\s*^\|\s*")` if all the `|` you need to split with are at the start of lines. If there may be `|` at the start of lines that should not be split with, do not use this. – Wiktor Stribiżew Aug 24 '18 at 10:20
  • Do you really need to use regular expressions? – Rui Jarimba Aug 24 '18 at 10:35
  • @WiktorStribiżew Thank you for your help, and sorry about my result sample. I want to split on `|`, but when I split with `\s*\|\s*` I get too many results. I need to skip any `|` inside `{}` or `[]`; I mean the result should have just 4 items: `*`, `{ table_name | view_name | table_alias }.*`, `{ [ { table_name | view_name | table_alias }. ] { column_name | $IDENTITY | $ROWGUID } | udt_column_name [ { . | :: } { { property_name | field_name } | method_name ( argument [ ,...n] ) } ] | expression [ [ AS ] column_alias ] }`, `column_alias = expression`. Thanks again. – Alex Chen Aug 24 '18 at 14:59
  • I did not suggest `\s*\|\s*`, I suggested `@"(?m)\s*^\|\s*"`. Let me know if it helps. Else, you need a parser. – Wiktor Stribiżew Aug 24 '18 at 15:01
  • @RuiJarimba Thank you for your help. I really want to use regular expressions. – Alex Chen Aug 24 '18 at 15:01
  • @WiktorStribiżew Yes, you are right, `\s*\|\s*` does not work. But `@"(?m)\s*^\|\s*"` gives me just one item, not four. Thanks again. – Alex Chen Aug 24 '18 at 15:05
  • @WiktorStribiżew Yes, I am trying to write a parser myself. – Alex Chen Aug 24 '18 at 15:15

1 Answer

Here it goes:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // Every top-level '|' must sit at column 0 of its line,
        // because the pattern anchors on the start of the line.
        string text = @"*
| { table_name | view_name | table_alias }.*
| {
    [ { table_name | view_name | table_alias }. ]
    { column_name | $IDENTITY | $ROWGUID }
    | udt_column_name [ { . | :: } { { property_name | field_name } | method_name ( argument [ ,...n] ) } ]
    | expression
    [ [ AS ] column_alias ]
  }
| column_alias = expression";

        string pattern = BuildPattern();
        RegexOptions options = RegexOptions.Compiled | RegexOptions.Multiline;

        // solution 1: using a MatchEvaluator(Match) delegate
        string normalizedText = Regex.Replace(text, pattern, GetNormalizedLine, options);

        // solution 2: using replacement groups
        string normalizedText2 = Regex.Replace(text, pattern, "$3$4", options);

        // both solutions should produce the same string
        bool areEqual = normalizedText2.Equals(normalizedText);

        Console.WriteLine(normalizedText);
        Console.Read();
    }

    private static string BuildPattern()
    {
        // '|' is a special character and needs to be escaped.
        // Assuming there might be some whitespace after the pipe
        string pipe = @"\|\s*";

        // '{' is a special character and needs to be escaped.
        string bracket = @"\{";

        // remaining text in the line
        string otherText = @".+";

        // using parentheses () to group the results:
        // group 2 = pipe, group 3 = optional '{', group 4 = rest of the line
        string pattern = $"^(({pipe})({bracket})?)({otherText})$";

        return pattern;
    }

    private static string GetNormalizedLine(Match match)
    {
        GroupCollection groups = match.Groups;

        // keep the optional '{' and the rest of the line, drop the leading pipe
        return $"{groups[3].Value}{groups[4].Value}";
    }
}

Output is the following string:

*
{ table_name | view_name | table_alias }.*
{
    [ { table_name | view_name | table_alias }. ]
    { column_name | $IDENTITY | $ROWGUID }
    | udt_column_name [ { . | :: } { { property_name | field_name } | method_name ( argument [ ,...n] ) } ]
    | expression
    [ [ AS ] column_alias ]
  }
column_alias = expression

EDIT:

I'm not using Regex.Split(), which the OP mentioned, because I don't think it's needed just to remove the | character. Getting an array with all the lines (excluding empty ones) is simple:

string[] lines = normalizedText.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);

Some notes:

  • I'm assuming that the | character to be removed is always at the beginning of the line, i.e. there is no whitespace before that character
  • I'm assuming that there might be some whitespace between characters | and {
  • I'm using parentheses for grouping the matches (see Regular Expression Groups in C#)
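
The notes above rely on each top-level | starting its own line. If that assumption does not hold, or if the goal is the four single-line items described in the question, a small depth counter may be easier to reason about than a regex. The following is only a sketch and not the method used above: `TopLevelSplitter`, `SplitTopLevel` and `Normalize` are hypothetical names, and it assumes that `{`, `[` and `(` are always balanced in the input.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class TopLevelSplitter
{
    // Hypothetical helper: splits on '|' only while no '{', '[' or '(' is open.
    // Assumes the brackets in the input are balanced.
    public static List<string> SplitTopLevel(string input)
    {
        var items = new List<string>();
        var current = new StringBuilder();
        int depth = 0;

        foreach (char c in input)
        {
            if (c == '{' || c == '[' || c == '(')
            {
                depth++;
            }
            else if ((c == '}' || c == ']' || c == ')') && depth > 0)
            {
                depth--;
            }

            if (c == '|' && depth == 0)
            {
                // a top-level pipe ends the current item
                items.Add(Normalize(current.ToString()));
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // whatever follows the last top-level pipe is the final item
        items.Add(Normalize(current.ToString()));
        return items;
    }

    // Collapses runs of whitespace (including line breaks) into single spaces.
    private static string Normalize(string s)
    {
        return Regex.Replace(s, @"\s+", " ").Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        // shortened sample in the same shape as the statement from the question
        string text = "*\n| { a | b }.*\n| {\n    [ { a | b }. ]\n    | expression\n  }\n| column_alias = expression";

        foreach (string item in TopLevelSplitter.SplitTopLevel(text))
        {
            Console.WriteLine(item);
        }
    }
}
```

Because every | inside a bracket pair is seen at a depth greater than zero, only the outermost pipes end an item. The whitespace collapsing in `Normalize` is optional and is there only to produce one-line items like those listed in the question.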
Rui Jarimba
  • Thank you for your help, and sorry about my result sample. I want to split on `|`, but not on any `|` inside `{}` or `[]`. I mean the result should have just 4 items: `*` `{ table_name | view_name | table_alias }.*` `{ [ { table_name | view_name | table_alias }. ] { column_name | $IDENTITY | $ROWGUID } | udt_column_name [ { . | :: } { { property_name | field_name } | method_name ( argument [ ,...n] ) } ] | expression [ [ AS ] column_alias ] }` `column_alias = expression ` Thanks again – Alex Chen Aug 24 '18 at 16:49
  • @AlexChen please edit your question and explain what the desired output is – Rui Jarimba Aug 24 '18 at 17:08