1

I am trying to parse a Boolean equation that is currently in the form of an NSString.

I want to parse it into a tree so that I can manipulate and simplify the expression into its simplest form.

Like Wolfram Alpha is able to do.

http://www.wolframalpha.com/input/?i=%28A+and+%28A+or+B%29+or+%28B+and+A%29+or+not%28A%29%29+and+%28B+or+not+%28A%29%29

Simplifies the input:

(A and (A or B) or (B and A) or not(A)) and (B or not (A))

to

Not(a) or B

My problem is parsing the equation into a tree object where each tree node has 3 properties:

1.TreeNode *parent

2.NSMutableArray *children

3.NSString *data

thanks

Nag
  • 1,438
  • 1
  • 11
  • 39
Tawfiqh
  • 143
  • 1
  • 10

3 Answers3

3

To parse a string into a tree (AST), you need two components: a lexer, which splits the string into individual "tokens" - braces, operators, identifiers in your case, and a parser, that consumes tokens one by one from the lexer and builds the tree. For the lexer you're probably going to use NSScanner, the parser for your grammar is easy to write by hand (see for example http://en.wikipedia.org/wiki/Recursive_descent_parser), or you can use a tool like yacc or Lemon.

georg
  • 211,518
  • 52
  • 313
  • 390
  • +1 if you can guarantee the string is well-formed, writing an RD parser could be largely trivial. – Dave DeLong Jan 31 '12 at 15:58
  • Thanks for that, i've looked at the wikipedia entry but still quite hazy on how exactly to write the parser, any more tips? – Tawfiqh Feb 01 '12 at 19:13
  • @Tawfiqh: use a parser generator that creates the parser for you. I'd suggest [lemon](http://www.hwaci.com/sw/lemon/lemon.html) because of its simplicity. – georg Feb 01 '12 at 20:35
  • I looked at Lemon but it looked very complex, it was easier for me to just write my own parser which splits the equation first by OR then by AND and recursively deals with brackets. – Tawfiqh Feb 08 '12 at 17:04
1

My math parser (DDMathParser) should be able to handle this with a little modification:

#import "DDMathParser.h"

NSString *source = @"(A and (A or B) or (B and A) or not(A)) and (B or not (A))";
source = [source stringByReplacingOccurrencesOfString:@" and " withString:@" && "];
source = [source stringByReplacingOccurrencesOfString:@" or " withString:@" || "];

NSError *error = nil;
DDExpression *e = [DDExpression expressionFromString:source error:&error];
if (e) {
  // it successfully parsed
}

As for simplifying the expression... DDMathParser does rudimentary expression rewriting, which is fully explained in this wiki page on the DDMathParser repository. I'm not sure if there are any rewrite rules for logical expressions (applying DeMorgan's law, etc), but those wouldn't be hard to add.

Regarding your requirements:

  1. Every DDExpression node has a parentExpression readonly property
  2. You can access the sub-expressions of a DDExpression node via the arguments property
  3. Due to a decision in how DDMathParser parses strings, A and B will actually be parsed as A() and B() (i.e., functions that take no parameters). If you want them to be parsed as "variable" expressions, they'd need a $ in front: $A, etc. This just means that you can access the name of the thing by using the function property, as opposed to the variable property.
Community
  • 1
  • 1
Dave DeLong
  • 242,470
  • 58
  • 448
  • 498
0

Ok so thanks for the help this is the final Objective- C code i wrote to parse a Boolean expression into a tree:

it takes an expression such as

A AND B OR C AND NOT(B)

in the form:

A.B + C./b

It works with brackets parsing with precedence:

-(TreeNode *)ParseStringIntoTree:(NSString *)InputString{ //input string to parse          
//returns root-node of tree
TreeNode *first=[[TreeNode alloc] init];
TreeNode *current=first;
NSString *workingString = [NSString stringWithString:InputString];

if (([workingString characterAtIndex:0]=='(') && ([workingString characterAtIndex:workingString.length-1]==')')) {
    NSRange boop={1,workingString.length-2};
    workingString=[workingString substringWithRange:boop];
}
int brackCount=0; 
bool plussesLeft=FALSE;
for (int pos=0; pos<workingString.length; pos++) {
    char currentC=[workingString characterAtIndex:pos];
    //1
    if (currentC=='(') {
        brackCount++;
    }
    //2
    if (currentC==')') {
        brackCount--;
    } 
    if (currentC=='+' && brackCount==0){
        plussesLeft=TRUE;
    }

}
//############ PARSE plus signs with  BRACKETS
brackCount=0;
int prevPlusPos=-1;
if (plussesLeft) {
    for (int pos=0; pos<workingString.length; pos++) {
        char currentC=[workingString characterAtIndex:pos];

        //1
        if (currentC=='(') {
            brackCount++;
        }
        //2
        if (currentC==')') {
            brackCount--;
        }    

        //3
        if (currentC=='+'&&brackCount==0) {
            NSRange boop={prevPlusPos+1, pos-prevPlusPos-1};
            NSString *toParse=[workingString substringWithRange:boop];
            TreeNode *child;

            if(toParse.length>1){child=[self ParseStringIntoTree:toParse];}

            else{child=[TreeNode newTreeNodeWithValue:toParse];}
            [current addChild:child];
            [current setValue:@"+"];
            prevPlusPos=pos;
        }

        //4
        if (pos==workingString.length-1 &&brackCount==0 && prevPlusPos!=-1) {
            NSRange boop={prevPlusPos+1, pos-prevPlusPos};

            NSString *toParse=[workingString substringWithRange:boop];

            TreeNode *child;
            if(toParse.length>1){child=[self ParseStringIntoTree:toParse];}
            else{child=[TreeNode newTreeNodeWithValue:toParse];};

            [current addChild:child];
            [current setValue:@"+"];
        }
    }
}
//############ finish PARSE plus signs with  BRACKETS

BOOL dotsLeft=FALSE;
for (int pos=0; pos<workingString.length; pos++) {
    char currentC=[workingString characterAtIndex:pos];
    //1
    if (currentC=='(') {
        brackCount++;
    }
    //2
    if (currentC==')') {
        brackCount--;
    } 
    if (currentC=='.' && brackCount==0){
        dotsLeft=TRUE;
    }

}
int prevDotPos=-1;
if (!plussesLeft && dotsLeft) {
    for (int pos=0; pos<workingString.length; pos++) {
        char currentC=[workingString characterAtIndex:pos];

        //1
        if (currentC=='(') {
            brackCount++;
        }
        //2
        if (currentC==')') {
            brackCount--;
        }    

        //3
        if (currentC=='.' && brackCount==0 && prevPlusPos==-1) {
            NSRange boop={prevDotPos+1, pos-prevDotPos-1};
            NSString *toParse=[workingString substringWithRange:boop];

            TreeNode *child;

            if(toParse.length>1){child=[self ParseStringIntoTree:toParse];}

            else{child=[TreeNode newTreeNodeWithValue:toParse];}            
            [current addChild:child];
            [current setValue:@"."];
            prevDotPos=pos;
        }

        //4
        if (pos==workingString.length-1 &&brackCount==0 && prevDotPos!=-1) {
            NSRange boop={prevDotPos+1, pos-prevDotPos};

            NSString *toParse=[workingString substringWithRange:boop];

            TreeNode *child;
            if(toParse.length>1){child=[self ParseStringIntoTree:toParse];}
            else{child=[TreeNode newTreeNodeWithValue:toParse];};

            [current addChild:child];
            [current setValue:@"."];        
        }
    }



    //left with current being the 

}

if (!plussesLeft && !dotsLeft) {
    if ([workingString characterAtIndex:0]=='/') {
        TreeNode *child=[self ParseStringIntoTree:[workingString substringFromIndex:1]];
        [current addChild:child];
        [current setValue:@"/"];
    }
    if (workingString.length==1) {
        [current setValue:workingString];
    }
}

return first;

}

Where the treeNode object has properties and methods:

@interface TreeNode : NSObject{
    NSMutableArray *children;
    TreeNode *parent;
    NSString *value;
}

+(TreeNode *)newTreeNodeWithValue:(NSString *)Value;
-(void)addChild:(TreeNode *)child;

The methods do as implied by there names. Hope this can help anyone else in future looking for a parser either specifically for Boolean algebra or maybe as a basis for a different parser.

Tawfiqh
  • 143
  • 1
  • 10