I'm trying to write a search query to find articles in a database. I would like to take the search string the user enters and derive a specific set of possible search terms from it. For example, if the user enters the search string "listing of average salaries in germany for 2011", I would like to generate a list of terms to search for. My plan is to look for the whole string and for partial strings of consecutive words. That is, I want to search for "listing of average salaries" and "germany for 2011", but not "listing germany 2011".
So far I have this bit of code to generate my search terms:
$searchString = "listing of average salaries in germany for 2011";
$searchTokens = explode(" ", $searchString);
$searchTerms = array($searchString);
$tokenCount = count($searchTokens);
for ($max = $tokenCount - 1; $max > 0; $max--) {
    $termA = "";
    $termB = "";
    for ($i = 0; $i < $max; $i++) {
        $termA .= $searchTokens[$i] . " ";
        $termB .= $searchTokens[($tokenCount - $max) + $i] . " ";
    }
    array_push($searchTerms, $termA);
    array_push($searchTerms, $termB);
}
print_r($searchTerms);
and it's giving me this list of terms:
- listing of average salaries in germany for 2011
- listing of average salaries in germany for
- of average salaries in germany for 2011
- listing of average salaries in germany
- average salaries in germany for 2011
- listing of average salaries in
- salaries in germany for 2011
- listing of average salaries
- in germany for 2011
- listing of average
- germany for 2011
- listing of
- for 2011
- listing
- 2011
What I can't figure out is how to generate the missing terms, the ones that start and end in the middle of the string:
- of average salaries in germany for
- of average salaries in germany
- average salaries in germany for
- of average salaries in
- average salaries in germany
- salaries in germany for
- etc...
Update
I'm not looking for a "power set", so answers like this or this aren't valid. For example, I do not want these in my list of terms:
- average germany
- listing salaries 2011
- of germany for
I'm looking for consecutive words only.
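To make the "consecutive words only" requirement concrete, here is a minimal sketch of the output I'm after. It is not my current code: the function name `consecutiveTerms` is hypothetical, and it assumes words are separated by whitespace and that PHP 7 or later is available. It walks every run length from longest to shortest and slides a window of that length across the tokens.

```php
<?php
// Hypothetical helper (not part of the original code): returns every run of
// consecutive words in $searchString, longest runs first.
function consecutiveTerms(string $searchString): array
{
    // Split on any whitespace and drop empty tokens.
    $tokens = preg_split('/\s+/', trim($searchString), -1, PREG_SPLIT_NO_EMPTY);
    $count = count($tokens);
    $terms = [];

    // $length is the number of words in the run; $start is where the run begins.
    for ($length = $count; $length >= 1; $length--) {
        for ($start = 0; $start + $length <= $count; $start++) {
            $terms[] = implode(' ', array_slice($tokens, $start, $length));
        }
    }

    return $terms;
}

$terms = consecutiveTerms("listing of average salaries in germany for 2011");
print_r($terms);
```

For a string of n words this yields n(n+1)/2 terms, so 36 for the eight-word example, including middle runs such as "of average salaries in germany" and never a skipping combination like "average germany".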