2

We are trying to use URLs for complex querying and filtering.
I managed to get some of the simpler parts working using expression trees and a mix of regex and string manipulation, but then we looked at a more complex string example:

 var filterstring="(|(^(categoryid:eq:1,2,3,4)(categoryname:eq:condiments))(description:lk:”*and*”))";

I'd like to be able to parse this out into parts, but also allow it to be recursive. I'd like to get the output looking like:

   item[0] (^(categoryid:eq:1,2,3,4)(categoryname:eq:condiments)
   item[1] description:lk:”*and*”

From there I could strip down the item[0] part to get:

categoryid:eq:1,2,3,4
categoryname:eq:condiments

At the minute I'm using regex and strings to find the | and ^ to know whether it's an AND or an OR. The regex matches brackets and works well for a single item; it's when we nest the values that I'm struggling.

The regex looks like:

@"\((.*?)\)"

I need some way of using regex to match the nested brackets; any help would be appreciated.

Andy Allison
  • 4
    I think the question is too complicated; it is not really easy to understand what the problem is. E.g. it might be the regex, it might be ServiceStack, the URL or OData or something else. Try to explain it to a rubber duck. http://www.codinghorror.com/blog/2012/03/rubber-duck-problem-solving.html – Casperah Jul 11 '13 at 09:39
  • 4
    I think it all comes down to matching nested brackets. I know that this is possible in [PHP, perl](http://stackoverflow.com/a/14952740) and .NET. Otherwise, you might just write a small parser, it's not that complex. – HamZa Jul 11 '13 at 09:43
  • I think HamZa is correct: it's probably as simple as matching the brackets, but it's the nesting that's causing me the problem. – Andy Allison Jul 11 '13 at 09:46
  • @Casperah fair point. Looking back, even though I had put some effort into making this a somewhere-near-decent question, I failed. I've now realised why I usually don't ask on here. Sorry guys, will get rid of the question unless somebody answers quickly. – Andy Allison Jul 11 '13 at 09:48
  • 2
    @Andyroo It's a good question. There may be some improvements since the problem is just the brackets and there is superfluous information that has nothing to do with the core of the problem. That said, I'm pretty sure there is a duplicate on SO on how to match/parse nested brackets in C#, searching for it... – HamZa Jul 11 '13 at 09:51
  • 1
    @Andyroo take a look at this [answer](http://stackoverflow.com/a/13279627/). It seems promising. – HamZa Jul 11 '13 at 10:13
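
The nested-bracket matching that HamZa mentions for .NET can be sketched with a balancing group. This is a minimal sketch rather than code from the thread: the class name `NestedGroups`, the method `TopLevelGroups` and the group names `body` and `depth` are made up for illustration, and it assumes the brackets in the filter are well formed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NestedGroups
{
    // Matches one (...) group whose inner brackets are balanced.
    // Every inner "(" pushes onto "depth" and every inner ")" pops it;
    // the conditional rejects the match if anything is left unclosed.
    static readonly Regex Balanced = new Regex(
        @"\((?<body>(?>[^()]+|\((?<depth>)|\)(?<-depth>))*(?(depth)(?!)))\)");

    // Returns the contents of each top-level (...) group found in the input.
    public static List<string> TopLevelGroups(string input)
    {
        return Balanced.Matches(input)
                       .Cast<Match>()
                       .Select(m => m.Groups["body"].Value)
                       .ToList();
    }

    public static void Main()
    {
        var filter = "(|(^(categoryid:eq:1,2,3,4)(categoryname:eq:condiments))(description:lk:”*and*”))";

        // The whole filter is a single group; its body starts with the operator.
        string outer = TopLevelGroups(filter)[0];
        char op = outer[0]; // '|'

        // Splitting the body again yields the two nested items.
        List<string> items = TopLevelGroups(outer);
        Console.WriteLine(op);
        Console.WriteLine(items[0]); // ^(categoryid:eq:1,2,3,4)(categoryname:eq:condiments)
        Console.WriteLine(items[1]); // description:lk:”*and*”
    }
}
```

Calling `TopLevelGroups` again on `items[0]` gives the two inner conditions, so the same method can be applied recursively until an item no longer starts with `|` or `^`.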

3 Answers

2

You could transform the string into valid XML (just some simple replace, no validation):

var output = filterstring
    .Replace("(","<node>")
    .Replace(")","</node>")
    .Replace("|","<andNode/>")
    .Replace("^","<orNode/>");

Then, you could parse the XML nodes by using, for example, System.Xml.Linq.

XDocument doc = XDocument.Parse(output);

Based on your comment, here's how you can rearrange the XML in order to get the wrapping you need:

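// Snapshot the descendants first so the tree is not modified while it is being enumerated.
foreach (var item in doc.Root.Descendants().ToList())
{
    if (item.Name == "orNode" || item.Name == "andNode")
    {
        // Move every sibling that follows the operator node inside it.
        item.ElementsAfterSelf()
            .ToList()
            .ForEach(x =>
            {
                x.Remove();
                item.Add(x);
            });
    }
}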

Here's the resulting XML content:

<node>
  <andNode>
    <node>
      <orNode>
        <node>categoryid:eq:1,2,3,4</node>
        <node>categoryname:eq:condiments</node>
      </orNode>
    </node>
    <node>description:lk:”*and*”</node>
  </andNode>
</node>
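
Once the tree has this shape it can be walked recursively. The following is a rough sketch of one way to do that, assuming the `andNode`/`orNode` element names produced above; the `FilterXml` class and `Describe` method are hypothetical names, not part of System.Xml.Linq.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class FilterXml
{
    // Renders a rearranged <node> tree as a readable boolean expression.
    public static string Describe(XElement element)
    {
        if (element.Name == "andNode" || element.Name == "orNode")
        {
            string op = element.Name == "andNode" ? " AND " : " OR ";
            return "(" + string.Join(op, element.Elements().Select(e => Describe(e))) + ")";
        }

        // A <node> either wraps a single operator element or holds a leaf condition.
        XElement child = element.Elements().FirstOrDefault();
        return child != null ? Describe(child) : element.Value;
    }

    public static void Main()
    {
        XElement root = XElement.Parse(@"
<node>
  <andNode>
    <node>
      <orNode>
        <node>categoryid:eq:1,2,3,4</node>
        <node>categoryname:eq:condiments</node>
      </orNode>
    </node>
    <node>description:lk:”*and*”</node>
  </andNode>
</node>");

        // Prints the nested groups with their operators kept in place.
        Console.WriteLine(Describe(root));
    }
}
```

The same recursion could build an expression tree instead of a string; the string output is only there to make the grouping visible.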
Alex Filipovici
  • Thanks Alex that was a good suggestion but I'm still missing the matching of the nested brackets. Might be able to modify it a bit though. – Andy Allison Jul 11 '13 at 10:29
  • 1
    Could you elaborate on _I'm still missing the matching of the nested brackets_? – Alex Filipovici Jul 11 '13 at 10:39
  • Due to the way the () is nested I get instead of the wrapping I would need. – Andy Allison Jul 11 '13 at 11:04
  • 1
    Well, after we have the XML, it's only manipulation. You can always check the first node in the child collection to see if: 1. It's an operator node (`orNode` or `andNode`), in which case apply the operation to its siblings, or get the siblings and make them the operator node's children. 2. It's a non-operator node, in which case continue parsing. – Alex Filipovici Jul 11 '13 at 11:09
1

I understand that you want the values specified in the filterstring.

My solution would be something like this:

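// requires: using System.Collections.Specialized; using System.Text.RegularExpressions;
NameValueCollection values = new NameValueCollection();
// Regex.Matches takes the input string first, then the pattern.
foreach(Match pair in Regex.Matches(filterstring, @"\((?<name>\w+):(?<operation>\w+):(?<value>[^)]*)\)"))
{
     // Only "eq" conditions are collected; other operations such as "lk" are skipped.
     if (pair.Groups["operation"].Value == "eq")
         values.Add(pair.Groups["name"].Value, pair.Groups["value"].Value);
}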

The regex matches a (name:operation:value) group; it doesn't care about all the other stuff.

After this code has run you can get the values like this:

values["categoryid"]     // "1,2,3,4"
values["categoryname"]   // "condiments"
values["description"]    // null, because the lk operation is skipped by the eq check

I hope this will help you in your quest.

Casperah
  • Thanks, that is almost perfect. I just need to tweak it so I can split the groupings up. For example, I would need the categoryid and categoryname in one grouping, as they are wrapped, and the description in another grouping. The | and ^ distinguish whether it would be AND or OR, so they would need retaining somehow too. – Andy Allison Jul 11 '13 at 11:20
0

I think you should just make a proper parser for that — it would actually end up simpler, more extensible and save you time and headaches in the future. You can use any existing parser generator such as Irony or ANTLR.
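
For a grammar this small, a hand-written recursive-descent parser is an alternative to a generator such as Irony or ANTLR. The sketch below uses that swapped-in technique and is not output from either tool; `FilterNode` and `FilterParser` are hypothetical names, and the grammar in the comment is an assumption inferred from the example filter in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type: Operator is '|' or '^' for a group, '\0' for a leaf condition.
public class FilterNode
{
    public char Operator;
    public string Condition;
    public List<FilterNode> Children = new List<FilterNode>();
}

public static class FilterParser
{
    public static FilterNode Parse(string text)
    {
        int pos = 0;
        FilterNode node = ParseGroup(text, ref pos);
        if (pos != text.Length)
            throw new FormatException("Unexpected text at position " + pos);
        return node;
    }

    // Assumed grammar: group := '(' ( ('|' | '^') group* | condition ) ')'
    static FilterNode ParseGroup(string text, ref int pos)
    {
        Expect(text, ref pos, '(');
        var node = new FilterNode();
        if (pos < text.Length && (text[pos] == '|' || text[pos] == '^'))
        {
            // Operator group: keep the operator, then parse each nested group.
            node.Operator = text[pos++];
            while (pos < text.Length && text[pos] == '(')
                node.Children.Add(ParseGroup(text, ref pos));
        }
        else
        {
            // Leaf condition: everything up to the next bracket.
            int end = text.IndexOfAny(new[] { '(', ')' }, pos);
            if (end < 0)
                throw new FormatException("Unterminated condition at position " + pos);
            node.Condition = text.Substring(pos, end - pos);
            pos = end;
        }
        Expect(text, ref pos, ')');
        return node;
    }

    static void Expect(string text, ref int pos, char expected)
    {
        if (pos >= text.Length || text[pos] != expected)
            throw new FormatException("Expected '" + expected + "' at position " + pos);
        pos++;
    }

    public static void Main()
    {
        FilterNode tree = Parse("(|(^(categoryid:eq:1,2,3,4)(categoryname:eq:condiments))(description:lk:”*and*”))");
        Console.WriteLine(tree.Operator);                          // |
        Console.WriteLine(tree.Children[0].Operator);              // ^
        Console.WriteLine(tree.Children[0].Children[0].Condition); // categoryid:eq:1,2,3,4
        Console.WriteLine(tree.Children[1].Condition);             // description:lk:”*and*”
    }
}
```

Because each group keeps its operator and its children, the AND/OR grouping asked about in the question is preserved at every nesting level, and unbalanced brackets surface as a `FormatException` rather than a silent mismatch.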

Andrey Shchekin