I am looking for a PHP library that can take a string like "happyeaster" or "buyaboat" and return the individual words: "happy" and "easter", or "buy", "a" and "boat". Does anyone know of an existing library, or something already built that can be downloaded or purchased, that does this?
-
Do you only want it to work for 2 words or to work for *n* words? – abcde123483 Nov 28 '11 at 15:50
-
What if "buy" "ab" "oat" came back? – Brandon Henry Nov 28 '11 at 15:51
-
It sounds like that library would return a lot of errors :D takeaway = take a way || take away? – luso Nov 28 '11 at 15:51
-
If you explain your use case, you may find other solutions... – luso Nov 28 '11 at 15:53
-
Check out this question: http://stackoverflow.com/questions/195010/how-can-i-split-multiple-joined-words I think it fits what you are looking for... – misterjinx Nov 28 '11 at 15:55
4 Answers
I ended up taking this script http://squarecog.wordpress.com/2008/10/19/splitting-words-joined-into-a-single-string/ and redoing it in PHP. I also accept the first solution with the fewest words.
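The PHP port itself was not posted, so what follows is only a rough sketch of the "fewest words" idea, not the code from the linked script. The function name fewest_words_split and the tiny inline dictionary are made up for illustration; a real run would load a full word list. It assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: split $text into the fewest dictionary words.
// $dict is an assumed lookup table with the words as array keys.
function fewest_words_split(string $text, array $dict): ?array
{
    $n = strlen($text);
    // $best[$i] holds the shortest word list covering the first $i characters.
    $best = [0 => []];
    for ($end = 1; $end <= $n; $end++) {
        for ($start = 0; $start < $end; $start++) {
            if (!isset($best[$start])) {
                continue; // the first $start characters cannot be segmented
            }
            $word = substr($text, $start, $end - $start);
            if (!isset($dict[$word])) {
                continue; // not a dictionary word
            }
            $candidate = array_merge($best[$start], [$word]);
            // Keep the first candidate found; replace it only by a strictly shorter one.
            if (!isset($best[$end]) || count($candidate) < count($best[$end])) {
                $best[$end] = $candidate;
            }
        }
    }
    return $best[$n] ?? null; // null when no full segmentation exists
}

// Toy dictionary, for illustration only.
$dict = array_flip(['a', 'ab', 'boat', 'buy', 'oat', 'take', 'away', 'way']);
print_r(fewest_words_split('takeaway', $dict)); // take, away
print_r(fewest_words_split('buyaboat', $dict)); // buy, a, boat
```

Note that "buy a boat" and "buy ab oat" both use three words, so the tie is broken only by which split is found first; the fewest-words rule does not resolve that kind of ambiguity.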

-
Since you've asked for a PHP library and you've marked your question as answered, I believe it would be fair to post the code. – Andrea Sciamanna Jul 14 '13 at 08:00
PHP would have no way of knowing which words you are looking for without you telling it first.
So you may need to elaborate a little more on what you are attempting in order to get a worthwhile answer.
You could perhaps use a regex with an array of words to find, or substr.
For instance, how would PHP know that you want the words happy and easter and not east, which is also found within that string?
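To make the ambiguity concrete, here is a small sketch that checks a word array against the string. It uses strpos rather than a regex or substr, and the function name words_inside and the four-word list are assumptions for illustration only.

```php
<?php
// Hypothetical helper: report each word from $wordList that occurs in $text,
// together with the offset of its first occurrence.
function words_inside(string $text, array $wordList): array
{
    $found = [];
    foreach ($wordList as $word) {
        $pos = strpos($text, $word);
        if ($pos !== false) { // strict check: offset 0 is a valid match
            $found[$word] = $pos;
        }
    }
    return $found;
}

print_r(words_inside('happyeaster', ['happy', 'east', 'easter', 'boat']));
// happy => 0, east => 5, easter => 5
```

Both "east" and "easter" match at the same offset, so a plain lookup cannot decide between them; something extra, such as requiring the whole string to be covered by words, is needed to pick "easter".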

It sounds like you need a full-text search library. Try Lucene and the Zend Lucene library. Hope this helps.

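<?php
// Return TRUE if $elem is present in the sorted array $array.
function binary_search($elem, $array) {
    $top = count($array) - 1;
    $bot = 0;
    while ($top >= $bot) {
        // Cast so the midpoint is an integer index rather than a float.
        $p = (int) floor(($top + $bot) / 2);
        if ($array[$p] < $elem)
            $bot = $p + 1;
        elseif ($array[$p] > $elem)
            $top = $p - 1;
        else
            return TRUE;
    }
    return FALSE;
}

// Load the word list, one word per line.
$words = array(); // initialised so sort() still gets an array if the file is missing
$handle = @fopen("/usr/share/dict/words", "r");
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        $words[] = trim($buffer);
    }
    fclose($handle);
}
sort($words);

// Print every split of the input into two dictionary words.
// $word1 is the remaining suffix, $word2 the prefix built up so far.
function getmultiplewords($word1, $word2, &$dict) {
    if (strlen($word1) == 0) return;
    if (binary_search($word1, $dict) && binary_search($word2, $dict)) {
        echo $word2 . " / " . $word1 . "\n";
    }
    // Move one character from the front of the suffix to the end of the prefix.
    $word2 = $word2 . substr($word1, 0, 1);
    $word1 = substr($word1, 1);
    getmultiplewords($word1, $word2, $dict);
}

getmultiplewords("cartalk", "", $words);
getmultiplewords("superman", "", $words);
?>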
Here's a simple solution that looks for 2-splits of words.
It works on Linux with the /usr/share/dict/words file; otherwise you will have to download the file yourself here:
http://www.freebsd.org/cgi/cvsweb.cgi/src/share/dict/web2?rev=1.12;content-type=text%2Fplain
If you want n-word splitting, that can also be done for reasonably sized words :) Just let me know and I'll look into it.
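For what it's worth, one possible way to do the n-word case is to try every dictionary word as a prefix and recurse on the rest, caching results per suffix. This is only a sketch, not the code offered above: the name all_splits, the $memo cache and the toy dictionary are assumptions, and it expects PHP 7 or later with the dictionary stored as array keys.

```php
<?php
// Hypothetical helper: return every way to split $text into dictionary words.
// $dict is an assumed lookup table with the words as array keys.
// $memo caches the results for suffixes that were already solved.
function all_splits(string $text, array $dict, array &$memo = []): array
{
    if ($text === '') {
        return [[]]; // exactly one way to split nothing: no words
    }
    if (isset($memo[$text])) {
        return $memo[$text];
    }
    $results = [];
    for ($len = 1; $len <= strlen($text); $len++) {
        $head = substr($text, 0, $len);
        if (!isset($dict[$head])) {
            continue; // prefix is not a word, try a longer one
        }
        // Combine this prefix with every split of the remaining suffix.
        foreach (all_splits((string) substr($text, $len), $dict, $memo) as $rest) {
            $results[] = array_merge([$head], $rest);
        }
    }
    return $memo[$text] = $results;
}

// Toy dictionary, for illustration only.
$dict = array_flip(['a', 'ab', 'boat', 'buy', 'oat']);
foreach (all_splits('buyaboat', $dict) as $split) {
    echo implode(' / ', $split), "\n";
}
// buy / a / boat
// buy / ab / oat
```

The number of splits can grow quickly for long strings with a large dictionary, which is why this is only practical for reasonably sized inputs.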
