
I'm trying to write a search query to find articles in a database. I would like to take the search string the user enters and look for a specific set of possible search terms. If the user entered the search string "listing of average salaries in germany for 2011", I would like to generate a list of terms to hunt for. I figured I would look for the whole string and for partial strings of consecutive words. That is, I want to search for "listing of average salaries" and "germany for 2011" but not "listing germany 2011".

So far I have this bit of code to generate my search terms:

  $searchString = "listing of average salaries in germany for 2011";
  $searchTokens = explode(" ", $searchString);
  $searchTerms = array($searchString);

  $tokenCount = count($searchTokens);
  // Walk the term length down from (word count - 1) to 1.
  for ($max = $tokenCount - 1; $max > 0; $max--) {
      $termA = ""; // the first $max words (a prefix)
      $termB = ""; // the last $max words (a suffix)
      for ($i = 0; $i < $max; $i++) {
          $termA .= $searchTokens[$i] . " ";
          $termB .= $searchTokens[($tokenCount - $max) + $i] . " ";
      }
      // rtrim() drops the trailing space left by the concatenation above.
      array_push($searchTerms, rtrim($termA));
      array_push($searchTerms, rtrim($termB));
  }

  print_r($searchTerms);

and it's giving me this list of terms:

  • listing of average salaries in germany for 2011
  • listing of average salaries in germany for
  • of average salaries in germany for 2011
  • listing of average salaries in germany
  • average salaries in germany for 2011
  • listing of average salaries in
  • salaries in germany for 2011
  • listing of average salaries
  • in germany for 2011
  • listing of average
  • germany for 2011
  • listing of
  • for 2011
  • listing
  • 2011

What I'm not sure how to get are the missing terms:

  • of average salaries in germany for
  • of average salaries in germany
  • average salaries in germany for
  • of average salaries in
  • average salaries in germany
  • salaries in germany for
  • etc...
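For a fixed number of words, the missing terms are the "middle" windows. The loop above only keeps the window that starts at the first word and the one that ends at the last word. Here is a minimal sketch that slides a window of fixed width across every position. The helper name `windowsOfWidth()` is hypothetical, and the sketch assumes words are separated by single spaces.

```php
<?php
// Hypothetical helper: every run of exactly $width consecutive words.
// The question's loop keeps only the first window (prefix) and the last
// window (suffix) for each width; the windows in between are what is missing.
function windowsOfWidth(array $tokens, int $width): array
{
    $windows = [];
    $count = count($tokens);
    // A window may start at any offset that still leaves $width words.
    for ($start = 0; $start + $width <= $count; $start++) {
        $windows[] = implode(' ', array_slice($tokens, $start, $width));
    }
    return $windows;
}

$tokens = explode(' ', 'listing of average salaries in germany for 2011');
print_r(windowsOfWidth($tokens, 6));
```

For width 6 this yields the prefix, the suffix, and the missing "of average salaries in germany for".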

Update

I'm not looking for a "power set", so answers like the ones linked in the comment below aren't valid. For example, I do not want these in my list of terms:

  • average germany
  • listing salaries 2011
  • of germany for

I'm looking for consecutive words only.
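One way to make "consecutive words only" precise is a check that a candidate term appears as an unbroken slice of the tokenized search string. This is a sketch that assumes single-space separators, as in the explode() call above. The function name `isConsecutiveRun()` is hypothetical.

```php
<?php
// Hypothetical helper: true only if $term is a run of consecutive words
// taken from $tokens, in their original order.
function isConsecutiveRun(array $tokens, string $term): bool
{
    $termTokens = explode(' ', $term);
    $width = count($termTokens);
    for ($start = 0; $start + $width <= count($tokens); $start++) {
        // array_slice() re-indexes from 0, so === compares words and order.
        if (array_slice($tokens, $start, $width) === $termTokens) {
            return true;
        }
    }
    return false;
}

$tokens = explode(' ', 'listing of average salaries in germany for 2011');
var_dump(isConsecutiveRun($tokens, 'average salaries in germany')); // bool(true)
var_dump(isConsecutiveRun($tokens, 'of germany for'));              // bool(false)
```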

Justin808
  • What you're looking for is called a power set. It's been asked and resolved a few times here on SO already. :) [Here's a question which was resolved with a working function](http://stackoverflow.com/questions/6092781/finding-the-subsets-of-an-array-in-php), and [here's another](http://stackoverflow.com/questions/10834393/php-how-to-get-all-possible-combinations-of-1d-array). – Joel Hinz Jun 20 '13 at 19:11
  • This approach seems rather inefficient and unnecessarily complex. You should probably be looking at something like Lucene or Sphinx. – Alex Howansky Jun 20 '13 at 19:11

2 Answers


First of all, I just want to let you know that running ALL of these as separate searches against an SQL database is extremely inefficient. I suggest you use the LIKE operator instead: http://www.techonthenet.com/sql/like.php
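As a rough illustration of the LIKE suggestion, here is a sketch that builds a single query with one `LIKE ?` per term, joined with OR, and binds the `%term%` patterns as parameters. The table `articles`, the column `title` and the function name `buildLikeQuery()` are hypothetical. The escaping assumes MySQL, whose default LIKE escape character is the backslash.

```php
<?php
// Sketch: build ONE query that ORs a LIKE test per search term, instead of
// sending a separate query for every term. The table `articles` and the
// column `title` are hypothetical; adjust them to your schema.
// Assumes MySQL, where backslash is the default LIKE escape character.
function buildLikeQuery(array $terms): array
{
    if ($terms === []) {
        throw new InvalidArgumentException('No search terms given');
    }
    $clauses = [];
    $params = [];
    foreach ($terms as $term) {
        $clauses[] = 'title LIKE ?';
        // Escape %, _ and \ so user input can't act as a LIKE wildcard.
        $params[] = '%' . addcslashes($term, '%_\\') . '%';
    }
    $sql = 'SELECT * FROM articles WHERE ' . implode(' OR ', $clauses);
    return [$sql, $params];
}

[$sql, $params] = buildLikeQuery(['germany for 2011', 'average salaries']);
echo $sql, "\n";
// With an existing PDO connection $pdo (not shown), you would then run:
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
```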

Now, to get all the possible combinations, just break up the words into an array (like you've done with explode), and follow the advice given by @ulvund on this question: PHP: How to get all possible combinations of 1D array?

Which is to say

<?php

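$array = explode(" ", "listing of average salaries in germany for 2011");

// Recursively collects every ordered arrangement of one or more words.
// Note: this includes reordered and non-consecutive combinations,
// not only runs of consecutive words.
function depth_picker($arr, $temp_string, &$collect) {
    if ($temp_string !== "") {
        $collect[] = $temp_string;
    }

    for ($i = 0; $i < count($arr); $i++) {
        $arrcopy = $arr;
        $elem = array_splice($arrcopy, $i, 1); // removes and returns the i'th element
        // Append the word, without a leading space before the first one.
        $next = ($temp_string === "") ? $elem[0] : $temp_string . " " . $elem[0];
        if (count($arrcopy) > 0) {
            depth_picker($arrcopy, $next, $collect);
        } else {
            $collect[] = $next;
        }
    }
}

$collect = array();
depth_picker($array, "", $collect);
print_r($collect);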

?>
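To see how many strings this produces: depth_picker() keeps every ordered selection of k distinct words for k = 1..n, which is the sum of n!/(n-k)!. The small sketch below compares that count with the n(n+1)/2 runs of consecutive words the question asks for, using the 8-word example. The helper name `arrangements()` is hypothetical.

```php
<?php
// Number of ordered selections of k distinct items from n: n!/(n-k)!.
function arrangements(int $n, int $k): int
{
    $result = 1;
    for ($i = 0; $i < $k; $i++) {
        $result *= $n - $i;
    }
    return $result;
}

$n = 8; // words in "listing of average salaries in germany for 2011"
$total = 0;
for ($k = 1; $k <= $n; $k++) {
    $total += arrangements($n, $k);
}
echo $total, "\n";            // 109600 strings collected by depth_picker()
echo $n * ($n + 1) / 2, "\n"; // 36 runs of consecutive words
```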
Joseph Szymborski

You want to find all sequential subsets (runs of consecutive words) of the exploded string. Start at offset=0 and, for each offset, slice the array with every length from 1 up to count-offset:

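<?php

$search_string = 'listing of average salaries in germany for 2011';
$search_array = explode(' ', $search_string);
$count = count($search_array);

$search_matches = array(); // initialize the result array before appending
$min_length = 1;           // shortest run (in words) to include

// For each starting word, take every run from $min_length words
// up to the end of the string.
for ($offset = 0; $offset < $count; $offset++) {
    for ($length = $min_length; $length <= $count - $offset; $length++) {
        $match = array_slice($search_array, $offset, $length);
        $search_matches[] = implode(' ', $match);
    }
}

print_r($search_array);
print_r($search_matches);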
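If the longer phrases should come first in the list, the same two loops can be swapped so that the outer loop runs over the length, from the whole string down to $min_length words. The ordering is an assumption, since the question doesn't specify one. Here is a sketch with a hypothetical function name:

```php
<?php
// Collect every run of consecutive words, longest runs first.
// Hypothetical helper; $minLength plays the role of $min_length above.
function consecutiveRunsLongestFirst(string $searchString, int $minLength = 1): array
{
    $tokens = explode(' ', $searchString);
    $count = count($tokens);
    $runs = [];
    for ($length = $count; $length >= $minLength; $length--) {
        for ($offset = 0; $offset + $length <= $count; $offset++) {
            $runs[] = implode(' ', array_slice($tokens, $offset, $length));
        }
    }
    return $runs;
}

$runs = consecutiveRunsLongestFirst('listing of average salaries in germany for 2011');
echo count($runs), "\n"; // 36
echo $runs[0], "\n";     // listing of average salaries in germany for 2011
```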
Ast Derek