I am looking for a PHP library that takes a string like "happyeaster" or "buyaboat" and returns the individual words: "happy" and "easter", or "buy", "a", and "boat". Does anyone know of an existing library, or something already built that can be downloaded or purchased, that does this?

Nick Roskam

4 Answers

0

I ended up taking this script, http://squarecog.wordpress.com/2008/10/19/splitting-words-joined-into-a-single-string/, and redoing it in PHP. My version also picks the first solution that has the fewest words.
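For readers who want to see the "fewest words" idea in code, here is a minimal sketch. It is not the linked script or my port of it, just one way to do it with dynamic programming. The function name split_fewest_words and the tiny dictionary are made up for illustration; a real dictionary would come from a word list.

```php
<?php
// Hypothetical helper: split $text into the fewest words found in $dict.
// $dict is an associative array whose keys are the known words.
// Returns an array of words, or null if the text cannot be fully split.
function split_fewest_words(string $text, array $dict): ?array
{
    $n = strlen($text);
    // $best[$i] holds the shortest word list covering the first $i characters.
    $best = array(0 => array());

    for ($end = 1; $end <= $n; $end++) {
        for ($start = 0; $start < $end; $start++) {
            if (!isset($best[$start])) {
                continue; // the first $start characters cannot be split
            }
            $piece = substr($text, $start, $end - $start);
            if (!isset($dict[$piece])) {
                continue; // not a known word
            }
            $candidate = array_merge($best[$start], array($piece));
            // Strict "<" keeps the first solution found among equally short ones.
            if (!isset($best[$end]) || count($candidate) < count($best[$end])) {
                $best[$end] = $candidate;
            }
        }
    }
    return $best[$n] ?? null;
}

// Illustrative dictionary only (assumption, not a real word list).
$dict = array_flip(array("a", "boat", "buy", "east", "easter", "er", "happy"));

echo implode(" ", split_fewest_words("happyeaster", $dict)), "\n"; // happy easter
echo implode(" ", split_fewest_words("buyaboat", $dict)), "\n";    // buy a boat
```

With this dictionary, "happy east er" is also a valid split of the first string, but the two-word split wins because it is shorter.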

Nick Roskam
0

PHP has no way of knowing which words you are looking for unless you tell it first.

So you may need to elaborate a little more on what you are attempting in order to get a worthwhile answer.

You could perhaps use a regex together with an array of words to find, or substr.

For instance, how would PHP know that you want the words "happy" and "easter" and not "east", which is also found within that string?
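To make the ambiguity concrete, here is a small sketch that checks an array of words against the string with strpos. The function name find_contained_words and the word list are invented for this example.

```php
<?php
// Hypothetical helper: list every word from $wordList that occurs anywhere in $text.
function find_contained_words(string $text, array $wordList): array
{
    $found = array();
    foreach ($wordList as $word) {
        // Skip empty entries; strpos returns false when the word is absent.
        if ($word !== "" && strpos($text, $word) !== false) {
            $found[] = $word;
        }
    }
    return $found;
}

$wordList = array("happy", "east", "easter", "boat"); // illustrative list only
print_r(find_contained_words("happyeaster", $wordList));
```

"happy", "east" and "easter" all match, so a plain lookup cannot decide between "east" and "easter" on its own; some extra rule is needed to choose.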

0

It sounds like you need a full-text search library. Try Lucene or the Zend Lucene library. Hope this helps.

hungneox
0
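<?php
// Return TRUE if $elem is in $array, which must be sorted with SORT_STRING.
function binary_search($elem, $array) {
   $top = count($array) - 1;
   $bot = 0;

   while ($top >= $bot) {
      $p = (int) floor(($top + $bot) / 2);   // integer midpoint, safe as an array index
      $cmp = strcmp($array[$p], $elem);      // byte-wise compare, same order as SORT_STRING
      if ($cmp < 0)
        $bot = $p + 1;
      elseif ($cmp > 0)
        $top = $p - 1;
      else
        return TRUE;
   }
   return FALSE;
}

// Load the word list, one word per line.
$words = array();
$handle = @fopen("/usr/share/dict/words", "r");
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        $words[] = trim($buffer);
    }
    fclose($handle);
}

// Sort as plain strings so the order matches strcmp() in binary_search().
sort($words, SORT_STRING);

// Print every split of $word1 into two dictionary words.
// $word2 collects the prefix; $word1 holds the remaining suffix.
function getmultiplewords($word1, $word2, &$dict){
    if (strlen($word1)==0) return;
    if (binary_search($word1, $dict) && binary_search($word2, $dict)) {
        echo $word2 . " / " . $word1. "\n";
    }
    // Move one character from the front of the suffix to the end of the prefix.
    $word2 = $word2 . substr($word1,0,1);
    $word1 = substr($word1,1);
    getmultiplewords($word1, $word2, $dict);
}


getmultiplewords("cartalk","", $words);
getmultiplewords("superman","", $words);
?>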

Here's a simple solution that looks for 2-splits of words.

It works on Linux with the /usr/share/dict/words file; otherwise you will have to download the word list yourself from here:

http://www.freebsd.org/cgi/cvsweb.cgi/src/share/dict/web2?rev=1.12;content-type=text%2Fplain

If you want n-way word splitting, that can also be done for reasonably sized words :) Just let me know and I'll look into it.
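As a rough idea of what n-way splitting could look like, here is a minimal recursive sketch. The function name all_splits and the small dictionary are assumptions for illustration; it uses an associative array for lookups instead of the binary search above. The number of splits can grow quickly with input length, which is why this is only practical for reasonably sized words.

```php
<?php
// Hypothetical helper: return every way to split $text into words from $dict.
// $dict is an associative array keyed by word. Each result is an array of words.
function all_splits(string $text, array $dict): array
{
    if ($text === "") {
        return array(array()); // exactly one way to split nothing: no words
    }
    $results = array();
    for ($len = 1; $len <= strlen($text); $len++) {
        $head = substr($text, 0, $len);
        if (!isset($dict[$head])) {
            continue; // this prefix is not a known word
        }
        // Split the remainder recursively and prepend the current word.
        foreach (all_splits(substr($text, $len), $dict) as $rest) {
            $results[] = array_merge(array($head), $rest);
        }
    }
    return $results;
}

// Illustrative dictionary only (assumption, not a real word list).
$dict = array_flip(array("a", "boat", "buy", "man", "super", "superman"));

foreach (all_splits("superman", $dict) as $split) {
    echo implode(" / ", $split), "\n";
}
foreach (all_splits("buyaboat", $dict) as $split) {
    echo implode(" / ", $split), "\n";
}
```

With this dictionary the first loop prints "super / man" and then "superman", and the second prints "buy / a / boat".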

abcde123483