
I would like to take an and/or logic query string of unknown length:

$logic = 'elephants and tigers or dolphins and apes or monkeys and humans and gorillas and and 133322 or 2';

and parse it into an array that, I assume, would look something like this:

$parsed_to_or = array(
  array('elephants', 'tigers'),
  array('dolphins', 'apes'),
  array('monkeys', 'humans', 'gorillas', '133322'),
  array('2')
 );

This is what I have so far:

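$logic_e = preg_split('/\s+/', $logic);   // split the query into whitespace-separated tokens
$or_segments = array();                   // finished OR groups
$and_group = array();                     // terms of the group being built
foreach ($logic_e as $fragment) {
    if (preg_match('/^(and|&&)$/i', $fragment)) {
        // AND is implied within a group, so the operator token is skipped
        continue;
    } elseif (preg_match('/^(or|\\|\\|)$/i', $fragment)) {
        // OR closes the current group; empty groups (e.g. from "or or") are skipped
        if (count($and_group) > 0) {
            $or_segments[] = $and_group;
            $and_group = array();
        }
        continue;
    } else {
        // any other token is a search term
        $and_group[] = $fragment;
    }
}
// flush the last group
if (count($and_group) > 0) {
    $or_segments[] = $and_group;
}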

Are there better ways to tackle this?

user2362840
  • Without brackets, a boolean *language* would not make much sense. – hek2mgl May 08 '13 at 15:00
  • Wouldn't it be easier to `explode()` it on 'or', and then explode each fragment on 'and'? Admittedly, that would give you an empty array towards the end of your sample string, where you have "and and", but you can check for that. – andrewsi May 08 '13 at 15:03
  • Is the "and and" towards the end a typo? – STT LCU May 08 '13 at 15:05
  • Repeated operators such as "and and" and "or or" should be handled, since this is coming from user input. – user2362840 May 08 '13 at 15:37
  • You can build a recursive descent parser to do this very easily. See http://stackoverflow.com/questions/2245962/is-there-an-alternative-for-flex-bison-that-is-usable-on-8-bit-embedded-systems/2336769#2336769 – Ira Baxter May 08 '13 at 17:54
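Following Ira Baxter's suggestion, here is a minimal recursive descent sketch. It assumes a grammar in which OR binds more loosely than AND, parentheses group sub-expressions, and repeated operators such as "and and" collapse into one. The function names (tokenize, skipOps, parseOr, parseAnd, parseFactor, parseQuery) and the ['or' => ...] / ['and' => ...] tree shape are hypothetical choices for illustration, not something taken from the question.

```php
<?php
// Hypothetical recursive descent parser for the assumed grammar:
//   or_expr  := and_expr (OR and_expr)*
//   and_expr := factor (AND factor)*
//   factor   := '(' or_expr ')' | WORD

// Split the query into '(', ')', 'AND', 'OR' and word tokens.
function tokenize(string $s): array
{
    preg_match_all('/\(|\)|&&|\|\||[^\s()&|]+/', $s, $m);
    $tokens = [];
    foreach ($m[0] as $t) {
        $lower = strtolower($t);
        if ($lower === 'and' || $t === '&&') {
            $tokens[] = 'AND';
        } elseif ($lower === 'or' || $t === '||') {
            $tokens[] = 'OR';
        } else {
            $tokens[] = $t;
        }
    }
    return $tokens;
}

// Advance past one or more copies of $op; returns true if any were skipped.
function skipOps(array $t, int &$i, string $op): bool
{
    $seen = false;
    while ($i < count($t) && $t[$i] === $op) {
        $i++;
        $seen = true;
    }
    return $seen;
}

function parseOr(array $t, int &$i)
{
    $args = [parseAnd($t, $i)];
    while (skipOps($t, $i, 'OR')) {   // "or or" counts as a single OR
        $args[] = parseAnd($t, $i);
    }
    return count($args) === 1 ? $args[0] : ['or' => $args];
}

function parseAnd(array $t, int &$i)
{
    $args = [parseFactor($t, $i)];
    while (skipOps($t, $i, 'AND')) {  // "and and" counts as a single AND
        $args[] = parseFactor($t, $i);
    }
    return count($args) === 1 ? $args[0] : ['and' => $args];
}

function parseFactor(array $t, int &$i)
{
    if ($i >= count($t)) {
        throw new InvalidArgumentException('Unexpected end of query');
    }
    $tok = $t[$i++];
    if ($tok === '(') {
        $node = parseOr($t, $i);
        if ($i >= count($t) || $t[$i] !== ')') {
            throw new InvalidArgumentException('Missing closing parenthesis');
        }
        $i++;
        return $node;
    }
    if (in_array($tok, [')', 'AND', 'OR'], true)) {
        throw new InvalidArgumentException("Unexpected token '$tok'");
    }
    return $tok;
}

// Parse a whole query; leftover tokens (e.g. "a b" with no operator) are an error.
function parseQuery(string $s)
{
    $tokens = tokenize($s);
    $i = 0;
    $tree = parseOr($tokens, $i);
    if ($i !== count($tokens)) {
        throw new InvalidArgumentException("Unexpected token '{$tokens[$i]}'");
    }
    return $tree;
}

print_r(parseQuery('x and (y or z) and a'));
```

For a query without brackets, the tree is a single OR of AND groups, which corresponds to the $parsed_to_or array in the question; with brackets, groups can nest.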

5 Answers


Update: added the ability to use && and || anywhere.

You can do the following:

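<?php

$logic = 'elephants && tigers || dolphins && apes || monkeys and humans and gorillas and && 133322 or 2';

$result = array();
// split into OR groups on " or " / " || " (the surrounding spaces are part of the pattern)
foreach (preg_split('/ (or|\|\|) /', $logic) as $parts) {
  // split each OR group into AND terms on " and " / " && "
  $bits = preg_split('/ (and|&&) /', $parts);
  for ($x=0; $x<count($bits); $x++) {
    // strip an operator left over from doubled input such as "and &&"
    // (this also strips "and" inside words, e.g. "pandas" becomes "ps")
    $bits[$x] = preg_replace('/\s?(and|&&)\s?/', '', $bits[$x]);
  }
  $result[] = $bits;
}

echo '<pre>';
var_dump($result);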

This produces the following output:

array(4) {
  [0]=>
  array(2) {
    [0]=>
    string(9) "elephants"
    [1]=>
    string(6) "tigers"
  }
  [1]=>
  array(2) {
    [0]=>
    string(8) "dolphins"
    [1]=>
    string(4) "apes"
  }
  [2]=>
  array(4) {
    [0]=>
    string(7) "monkeys"
    [1]=>
    string(6) "humans"
    [2]=>
    string(8) "gorillas"
    [3]=>
    string(6) "133322"
  }
  [3]=>
  array(1) {
    [0]=>
    string(1) "2"
  }
}
LeonardChallis

This handles the "gorillas" problem (the "or" inside "gorillas") as well as the empty entries produced by input such as "and and":

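$logic = 'elephants and tigers or dolphins and apes || monkeys and humans and gorillas and and 133322 or 2';

// split on whole-word "or" (\b avoids the "or" inside "gorillas") or on "||"
$arrayBlocks = preg_split('/(\bor\b|\|\|)/', $logic);
array_walk(
    $arrayBlocks,
    function(&$entry, $key) {
        // split each block on whole-word "and" or on "&&"
        $entry = preg_split('/(\band\b|&&)/', $entry);
        // trim whitespace, then drop the empty entries left by "and and"
        $entry = array_filter(
            array_map(
                'trim',
                $entry
            )
        );
    }
);

var_dump($arrayBlocks);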

Note, though, that array_filter() without a callback will also remove an entry of "0", because the string "0" is falsy in PHP.
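If "0" is a legitimate search term, array_filter() accepts a callback that decides what to keep. The sketch below is a variant of the code above, assuming only empty strings should be dropped; the sample query is made up for illustration, and the array_values() call (which re-numbers the keys after filtering) is an addition that the original answer does not make.

```php
<?php
// Made-up sample containing a "0" term and a doubled "and"
$logic = 'a and 0 or b and and c';

$arrayBlocks = preg_split('/(\bor\b|\|\|)/', $logic);
array_walk($arrayBlocks, function (&$entry) {
    $entry = array_map('trim', preg_split('/(\band\b|&&)/', $entry));
    // keep "0"; drop only the empty strings left behind by "and and"
    $entry = array_values(array_filter($entry, function ($term) {
        return $term !== '';
    }));
});

var_dump($arrayBlocks);
```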

Mark Baker

How about this:

$logic = 'elephants and tigers or dolphins and apes or monkeys and humans and gorillas and and 133322 or 2';
// split on whole-word "or" or on " || "
$ors = preg_split('/(\bor\b|\s\|\|\s)/', $logic);

// replace each OR segment (by reference) with its trimmed, non-empty AND terms
foreach ($ors as &$or) {
    $or = array_filter(array_map('trim', preg_split('/(\band\b|\s&&\s)/', $or)));
}

var_dump($ors);
cmbuckley
  • Interesting, the last entry is a reference. – user2362840 May 08 '13 at 15:26
  • This one shares the problem with Mark Baker's answer of removing 0 entries (they are essentially the same answer), and it is also a little "too simple": for example, it cannot handle manual grouping such as "x and (y or z) and a". Still, a definite win in the elegance and efficiency categories. – user2362840 May 08 '13 at 15:42
  • @user2362840 - If you want to handle braces, then you need a proper lexer rather than simplistic regexps... but you didn't indicate that at all in your original question. If that's a requirement, then the answer is a lot more complex. – Mark Baker May 08 '13 at 15:46
  • @MarkBaker - you are correct. This is my first question on Stack Overflow; I should have been more thorough in my requirements. Your answer and cmbuckley's are usable, but I think Leonard Challis' answer is closer to what I was looking for. – user2362840 May 08 '13 at 16:16
  • Fully agree with Mark's comment - beyond the original scope, you're probably looking at a much more complicated lexer. If the 0 is a problem, then you can pass a custom callback to array_filter to deal with that. Regarding the trailing reference, adding an `unset($or)` after the `foreach` will deal with that. – cmbuckley May 08 '13 at 16:18
  • And although Leonard's answer doesn't suffer from the g **OR** illas problem, it does suffer from the p **AND** as problem. – cmbuckley May 08 '13 at 16:27
  • gORillas && pANDas :) I'm going to be coming up with animal names containing AND or OR all evening now... That's where preg's \b comes in useful. – Mark Baker May 08 '13 at 16:30
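The difference described in these comments is easy to check. The sketch below uses a made-up query whose animal names hide "and"/"or" inside them: the \b-anchored split from cmbuckley's answer keeps the words whole (with the unset($or) suggested in the comments added after the loop), while the substring cleanup from the first answer turns "pandas" into "ps".

```php
<?php
// Made-up query with "and"/"or" hidden inside words (pANDas, gORillas, ORcas, cANDy)
$logic = 'pandas and gorillas or orcas && candy';

// \b makes "or"/"and" match only as whole words
$ors = preg_split('/(\bor\b|\s\|\|\s)/', $logic);
foreach ($ors as &$or) {
    $or = array_filter(array_map('trim', preg_split('/(\band\b|\s&&\s)/', $or)));
}
unset($or); // remove the reference the foreach leaves on the last element

var_dump($ors);

// For contrast: the substring cleanup from the first answer mangles such words
var_dump(preg_replace('/\s?(and|&&)\s?/', '', 'pandas')); // string(2) "ps"
```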

Using explode() is much simpler:

    $logic = 'elephants and tigers or dolphins and apes or monkeys and humans and gorillas and and 133322 or 2';

    $finalArray = array();

    // split into OR groups, then split each group into AND terms
    $parts = explode(" or ", $logic);

    foreach($parts as $part){
        if(!empty($part)){
            $finalArray[] = explode(" and ", $part);
        }
    }

    print_r($finalArray);

That would return:

Array
(
    [0] => Array
        (
            [0] => elephants
            [1] => tigers
        )

    [1] => Array
        (
            [0] => dolphins
            [1] => apes
        )

    [2] => Array
        (
            [0] => monkeys
            [1] => humans
            [2] => gorillas
            [3] => and 133322
        )

    [3] => Array
        (
            [0] => 2
        )

)
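Note the "and 133322" entry: the doubled "and" survives the explode(). One way around that, sketched here under the assumption that only consecutive copies of the same lowercase operator need collapsing, is to normalize the string with preg_replace() before exploding. The normalizing pattern is an addition, not part of the original answer, and the empty check is changed to $part !== '' so that a "0" term is not discarded by empty().

```php
<?php
$logic = 'elephants and tigers or dolphins and apes or monkeys and humans and gorillas and and 133322 or 2';

// Collapse runs of the same operator ("and and", "or or") into one,
// so the explode() calls below never see a doubled separator.
$normalized = preg_replace('/\b(and|or)(\s+\1)+\b/', '$1', $logic);

$finalArray = array();
foreach (explode(' or ', $normalized) as $part) {
    if ($part !== '') {
        $finalArray[] = explode(' and ', $part);
    }
}

print_r($finalArray);
```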
Alvaro

What I think I'll go with:

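$or_segments = array();
// split on runs of OR operators ("or" needs surrounding whitespace, "||" does not);
// a doubled word operator ("or or", "and and") is only partly collapsed, because
// the first match consumes the whitespace that the second one needs
foreach(preg_split('/((\\s+or\\s+|\\s*\\|\\|\\s*))+/i', $logic) as $or_split_f) {
    // split each OR segment on runs of AND operators
    $or_segments[] = preg_split('/((\\s+and\\s+|\\s*&&\\s*))+/i', $or_split_f);
}
var_dump($or_segments);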
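With the patterns above, the sample's "gorillas and and 133322" still yields an "and 133322" term. A minimal alternative sketch, assuming whole-word matching with \b is acceptable (so an "or" followed by punctuation, as in "or-gate", would also count as an operator), makes the whitespace around each operator optional and drops empty pieces with PREG_SPLIT_NO_EMPTY. The $orPattern and $andPattern names are hypothetical.

```php
<?php
$logic = 'elephants and tigers or dolphins and apes or monkeys and humans and gorillas and and 133322 or 2';

// One or more operators with optional surrounding whitespace;
// \b stops "or"/"and" from matching inside words such as "gorillas" or "pandas".
$orPattern  = '/(?:\s*(?:\bor\b|\|\|)\s*)+/i';
$andPattern = '/(?:\s*(?:\band\b|&&)\s*)+/i';

$or_segments = array();
foreach (preg_split($orPattern, $logic, -1, PREG_SPLIT_NO_EMPTY) as $or_split_f) {
    // PREG_SPLIT_NO_EMPTY discards empty pieces from leading/trailing operators
    $or_segments[] = preg_split($andPattern, $or_split_f, -1, PREG_SPLIT_NO_EMPTY);
}
var_dump($or_segments);
```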
user2362840