7

I have the following string:

a,b,c,d.e(f,g,h,i(j,k)),l,m,n

Could someone tell me how to build a regex that returns only the "first level" of parentheses, something like this:

[0] = a,b,c
[1] = d.e(f,g,h,i(j,k))
[2] = l,m,n

The goal is to keep each section with nested parentheses together under a single index, so that I can manipulate it later.

Thank you.

EDIT

Trying to improve the example...

Imagine I have this string

username,TB_PEOPLE.fields(FirstName,LastName,TB_PHONE.fields(num_phone1, num_phone2)),password

My goal is to turn this string into a dynamic query. Fields that do not begin with "TB_" are fields of the main table; otherwise, the fields listed within the parentheses belong to a related table. I am having difficulty retrieving all the "first level" fields. Once I can separate them from the related tables, I can recover the remaining fields recursively.

In the end, I would have something like:

[0] = username,password
[1] = TB_PEOPLE.fields(FirstName,LastName,TB_PHONE.fields(num_phone1, num_phone2))

I hope I have explained a little better, sorry.

Verner

3 Answers

12

You can use this:

(?>\w+\.)?\w+\((?>\((?<DEPTH>)|\)(?<-DEPTH>)|[^()]+)*\)(?(DEPTH)(?!))|\w+

With your example you obtain:

0 => username
1 => TB_PEOPLE.fields(FirstName,LastName,TB_PHONE.fields(num_phone1, num_phone2))
2 => password

Explanation:

(?>\w+\.)? \w+ \(    # the opening parenthesis (with the function name)
(?>                  # open an atomic group
    \(  (?<DEPTH>)   # when an opening parenthesis is encountered,
                     #  then push a capture onto the stack named DEPTH
  |                  # OR
    \) (?<-DEPTH>)   # when a closing parenthesis is encountered,
                     #  then pop a capture from the stack named DEPTH
  |                  # OR
    [^()]+           # content that is not a parenthesis
)*                   # close the atomic group, repeat zero or more times
\)                   # the closing parenthesis
(?(DEPTH)(?!))       # conditional: if the stack named DEPTH is not empty
                     #  then fail (i.e. the parentheses are not balanced)

You can try it with this code:

string input = "username,TB_PEOPLE.fields(FirstName,LastName,TB_PHONE.fields(num_phone1, num_phone2)),password";
string pattern = @"(?>\w+\.)?\w+\((?>\((?<DEPTH>)|\)(?<-DEPTH>)|[^()]+)*\)(?(DEPTH)(?!))|\w+";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
    Console.WriteLine(match.Groups[0].Value);
}
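
To get from these matches to the grouping asked for in the question (plain fields on one side, related-table entries on the other), the matches can be sorted by whether they contain a parenthesis. This is only a sketch: the class name FirstLevelFields and the method Partition are made up for illustration, and it assumes the pattern above is used unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FirstLevelFields
{
    // Pattern copied unchanged from the answer above (.NET balancing groups).
    private const string Pattern =
        @"(?>\w+\.)?\w+\((?>\((?<DEPTH>)|\)(?<-DEPTH>)|[^()]+)*\)(?(DEPTH)(?!))|\w+";

    // Hypothetical helper: separates top-level matches into plain fields
    // and parenthesized (related-table) entries.
    public static (List<string> Plain, List<string> Related) Partition(string input)
    {
        var plain = new List<string>();
        var related = new List<string>();
        foreach (Match match in Regex.Matches(input, Pattern))
        {
            // Only the first alternative of the pattern can produce a '('.
            if (match.Value.Contains("("))
                related.Add(match.Value);
            else
                plain.Add(match.Value);
        }
        return (plain, related);
    }
}
```

Each entry in Related can then be stripped of its outer "name(...)" wrapper and passed to Partition again to walk the nested tables recursively.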
Casimir et Hippolyte
  • Hi! I'm trying to apply the regex exactly as you put it, but the result I'm getting is this: [0] => "" [1] => "," [2] => ",", [3] => "" Could you tell me what I'm forgetting to do? Thank you. – Verner Oct 28 '13 at 11:31
  • You might be better off using a nested quantified group inside your atomic one to prevent backtracking and speed up recognition of a failed match, i.e. `\( (?> (?: \( (?<DEPTH>) | \) (?<-DEPTH>) | [^()]+ )* ) \)`. However, if you don't care about failure performance, it isn't necessary. (The spaces are there for readability only.) – Adrian Jan 22 '18 at 23:50
0

If I understood your example correctly, you are looking for something like this:

(?<head>[a-zA-Z._]+\,)*(?<body>[a-zA-Z._]+[(].*[)])(?<tail>.*)

For given string:

username,TB_PEOPLE.fields(FirstName,LastName,TB_PHONE.fields(num_phone1, num_phone2)),password

This expression will match

  • username, for group head
  • TB_PEOPLE.fields(FirstName,LastName,TB_PHONE.fields(num_phone1, num_phone2)) for group body
  • ,password for group tail
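
Reading the three named groups from C# could look like the sketch below. The class name HeadBodyTail and the method Split are illustrative only. Note that head sits inside a repeated group, so Groups["head"].Value holds only the last repetition; if several plain fields precede the parenthesized one, Groups["head"].Captures holds each of them.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeadBodyTail
{
    // Pattern copied unchanged from the answer above.
    private const string Pattern =
        @"(?<head>[a-zA-Z._]+\,)*(?<body>[a-zA-Z._]+[(].*[)])(?<tail>.*)";

    // Hypothetical helper: returns the text captured by each named group.
    public static (string Head, string Body, string Tail) Split(string input)
    {
        Match m = Regex.Match(input, Pattern);
        if (!m.Success)
            throw new FormatException("No parenthesized section found.");

        return (m.Groups["head"].Value,
                m.Groups["body"].Value,
                m.Groups["tail"].Value);
    }
}
```

Because the `.*` inside body is greedy, this pattern treats everything from the first parenthesized field to the last closing parenthesis as a single body, so it suits inputs with one related table at the top level.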
Denis Itskovich
0

I suggest a new strategy, R2 - do it algorithmically. While you can build a Regex that will eventually come close to what you're asking, it'll be grossly unmaintainable, and hard to extend when you find new edge cases. I don't speak C#, but this pseudo code should get you on the right track:

function parenthetical_depth(some_string):
    open = count '(' in some_string
    close = count ')' in some_string
    return open - close

function smart_split(some_string):
    bits = split some_string on ','
    new_bits = empty list
    bit = empty string
    while bits has next:
        bit = fetch next from bits
        while parenthetical_depth(bit) != 0:
            bit = bit + ',' + fetch next from bits
        place bit into new_bits
    return new_bits

This is the easiest version to understand. As written, the algorithm is O(n^2); the inner loop can be optimized to make it O(n) (apart from string copying, which is the most expensive part of this):

depth = parenthetical_depth(bit)
while depth != 0:
    nbit = fetch next from bits
    depth = depth + parenthetical_depth(nbit)
    bit = bit + ',' + nbit

The string copying can be made more efficient with clever use of buffers and buffer size, at the cost of space efficiency, but I don't think C# gives you that level of control natively.
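
For readers who want this in C#, here is one possible translation of the pseudocode above, using the running-depth optimization. It is a sketch rather than a reference implementation: the names TopLevelSplitter, ParentheticalDepth and SmartSplit are chosen to mirror the pseudocode, and the exception on unbalanced input is an added assumption that the pseudocode does not cover. StringBuilder, the standard .NET buffer type, is used to avoid repeated string concatenation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TopLevelSplitter
{
    // Net parenthesis depth of a fragment: count of '(' minus count of ')'.
    private static int ParentheticalDepth(string s)
    {
        int depth = 0;
        foreach (char c in s)
        {
            if (c == '(') depth++;
            else if (c == ')') depth--;
        }
        return depth;
    }

    // Splits on commas, then re-joins fragments until the parentheses balance.
    public static List<string> SmartSplit(string input)
    {
        var result = new List<string>();
        var current = new StringBuilder();
        int depth = 0;

        foreach (string bit in input.Split(','))
        {
            // A non-empty buffer means we are inside an open group,
            // so restore the comma that Split removed.
            if (current.Length > 0) current.Append(',');
            current.Append(bit);

            depth += ParentheticalDepth(bit);
            if (depth == 0)
            {
                result.Add(current.ToString());
                current.Clear();
            }
        }

        // Assumption: unbalanced input is treated as an error.
        if (depth != 0)
            throw new FormatException("Unbalanced parentheses in input.");

        return result;
    }
}
```

Applied to the string from the question, SmartSplit should yield three entries: username, the whole TB_PEOPLE.fields(...) section, and password.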

FrankieTheKneeMan