3

I am building a Sprache parser to parse expressions similar to SQL search conditions. e.g Property = 123 or Property > AnotherProperty

So far both of those examples work, however I am struggling to figure out what I need to do to allow ANDing/ORing conditions and parenthesis.

Basically I have this so far:

private static readonly Parser<string> Operators =
    Parse.String("+").Or(Parse.String("-")).Or(Parse.String("="))
        .Or(Parse.String("<")).Or(Parse.String(">"))
        .Or(Parse.String("<=")).Or(Parse.String(">=")).Or(Parse.String("<>"))
        .Text();

private static readonly Parser<IdentifierExpression> Identifier = 
    from first in Parse.Letter.Once()
    from rest in Parse.LetterOrDigit.Many()
    select new IdentifierExpression(first.Concat(rest).ToArray());

public static readonly Parser<Expression> Integer =
    Parse.Number.Select(n => new IntegerExpression {Value = int.Parse(n)});

public static readonly Parser<SearchCondition> SearchCondition = 
    from left in Identifier.Or(Number)
    from op in Operators.Token()
    from right in Identifier.Or(Number)
    select new SearchCondition { Left = left, Right = right, Operator = op};

This works for the simple cases above, but now I need a pointer on how to implement conditions like:

  • PropertyX = PropertyY OR PropertyX = PropertyZ
  • PropertyA > PropertyB AND (OtherAnotherProperty = 72 OR OtherAnotherProperty = 150)

Can anyone give me an idea on how to structure the parsers for this sort of thing?

Simon
  • 5,373
  • 1
  • 34
  • 46

1 Answers1

8

What you have so far is a basic comparison expression parser. It looks like you want to wrap that in a parser that handles logical expressions (and, or, etc.) with sub-expression support.

The code I posted at first was ripped from poorly-tested code I was still working on, which didn't handle statements with multiple terms. My understanding of the ChainOperator method was clearly incomplete.

Parse.ChainOperator is the method that lets you specify your operators and have them appear 0-to-many times in the expression. I was making assumptions about how it worked that turned out to be just wrong.

I've rewritten the code and added a few bits to make it easier to use:

// Helpers to make access simpler
public static class Condition
{
    // For testing, will fail all variable references
    public static Expression<Func<object, bool>> Parse(string text)
        => ConditionParser<object>.ParseCondition(text);

    public static Expression<Func<T, bool>> Parse<T>(string text)
        => ConditionParser<T>.ParseCondition(text);

    public static Expression<Func<T, bool>> Parse<T>(string text, T instance)
        => ConditionParser<T>.ParseCondition(text);
}

public static class ConditionParser<T>
{
    static ParameterExpression Parm = Expression.Parameter(typeof(T), "_");

    public static Expression<Func<T, bool>> ParseCondition(string text)
        => Lambda.Parse(text);

    static Parser<Expression<Func<T, bool>>> Lambda =>
        OrTerm.End().Select(body => Expression.Lambda<Func<T, bool>>(body, Parm));

    // lowest priority first
    static Parser<Expression> OrTerm =>
        Parse.ChainOperator(OpOr, AndTerm, Expression.MakeBinary);

    static Parser<ExpressionType> OpOr = MakeOperator("or", ExpressionType.OrElse);

    static Parser<Expression> AndTerm =>
        Parse.ChainOperator(OpAnd, NegateTerm, Expression.MakeBinary);

    static Parser<ExpressionType> OpAnd = MakeOperator("and", ExpressionType.AndAlso);

    static Parser<Expression> NegateTerm =>
        NegatedFactor
        .Or(Factor);

    static Parser<Expression> NegatedFactor =>
        from negate in Parse.IgnoreCase("not").Token()
        from expr in Factor
        select Expression.Not(expr);

    static Parser<Expression> Factor =>
        SubExpression
        .Or(BooleanLiteral)
        .Or(BooleanVariable);

    static Parser<Expression> SubExpression =>
        from lparen in Parse.Char('(').Token()
        from expr in OrTerm
        from rparen in Parse.Char(')').Token()
        select expr;

    static Parser<Expression> BooleanValue =>
        BooleanLiteral
        .Or(BooleanVariable);

    static Parser<Expression> BooleanLiteral =>
        Parse.IgnoreCase("true").Or(Parse.IgnoreCase("false"))
        .Text().Token()
        .Select(value => Expression.Constant(bool.Parse(value)));

    static Parser<Expression> BooleanVariable =>
        Parse.Regex(@"[A-Za-z_][A-Za-z_\d]*").Token()
        .Select(name => VariableAccess<bool>(name));

    static Expression VariableAccess<TTarget>(string name)
    {
        MemberInfo mi = typeof(T).GetMember(name, MemberTypes.Field | MemberTypes.Property, BindingFlags.Instance | BindingFlags.Public).FirstOrDefault();
        var targetType = typeof(TTarget);
        var type = 
            (mi is FieldInfo fi) ? fi.FieldType : 
            (mi is PropertyInfo pi) ? pi.PropertyType : 
            throw new ParseException($"Variable '{name}' not found.");

        if (type != targetType)
            throw new ParseException($"Variable '{name}' is type '{type.Name}', expected '{targetType.Name}'");

        return Expression.MakeMemberAccess(Parm, mi);
    }

    // Helper: define an operator parser
    static Parser<ExpressionType> MakeOperator(string token, ExpressionType type)
        => Parse.IgnoreCase(token).Token().Return(type);
}

And some examples:

static class Program
{
    static void Main()
    {
        // Parser with no input
        var condition1 = Condition.Parse("true and false or true");
        Console.WriteLine(condition1.ToString());
        var fn1 = condition1.Compile();
        Console.WriteLine("\t={0}", fn1(null));

        // Parser with record input
        var record1 = new { a = true, b = false };
        var record2 = new { a = false, b = true };
        var condition2 = Condition.Parse("a and b or not a", record);
        Console.WriteLine(condition2.ToString());
        var fn2 = condition2.Compile();
        Console.WriteLine("\t{0} => {1}", record1.ToString(), fn2(record1));
        Console.WriteLine("\t{0} => {1}", record2.ToString(), fn2(record2));
    }
}

You will still need to add your own parsers for handling comparison expressions and so on. Plug those into the BooleanValue parser after the existing terms:

static Parser<Expression> BooleanValue => 
    BooleanLiteral
    .Or(BooleanVariable)
    .Or(SearchCondition);

I'm doing something similar with a more C#-style filter specification with type checking during the parse phase and separate parsers for strings vs numbers.

Corey
  • 15,524
  • 2
  • 35
  • 68
  • Thanks @Corey... unless I am missing something this will not parse expressions like `true and false or true` - it will only handle two expressions (obviously two is fine for true/false but I would want to handle any number of conditions (e.g `Prop1 = A AND Prop2 = B AND Prop3 = C`) – Simon Aug 08 '18 at 18:39
  • Well damn. I thought I tested that. Will fix and update the answer. – Corey Aug 08 '18 at 23:02
  • Done. Incidentally this is based on the structure used in SimpleCalc. I just changed some things to handle boolean conditions instead of arithmetic. At some point I'll post the stuff I'm working on to GitHub. – Corey Aug 09 '18 at 06:08
  • Well, thank you Sir. That is very good. I also found this https://github.com/anpv/SpracheTest/ which has a similar approach to you - now I just need to figure out how it all works :) Thanks very much – Simon Aug 09 '18 at 18:30
  • Fairly sure the recursive magic happens in the `Parse.ChainOperator` code. I haven't read the Sprache code enough to figure out how it works however. – Corey Aug 09 '18 at 23:40
  • _is the method that lets you specify your operators and have them appear 0-to-many times in the expression._ - well, its more like _appear at at least once or many times_ - parser `Parse.ChainOperator(Parse.Char('+'), Parse.Digit().Many(), (op, a, b) => ...)).Parse("1");` (note that + appears zero times ;) ) will not work – Jan 'splite' K. Mar 08 '21 at 10:51