1

I'm trying to find all the unique whole words from a body of text. Currently this is what I am using but it doesn't seem to be working:

$textDump = "cat dog monkey cat snake horse";
$wholeWord = "/[\w]*/";
$uniqueWords = (preg_match($wholeWord, $textDump, $matches));

Any help would be appreciated. Thanks!

Dachan
  • You want to use [`preg_match_all`](http://php.net/preg_match_all). The result is placed in the third argument, `$matches` – mario Feb 11 '13 at 17:54
  • You're not capturing anything. Try `(\w*)`. Note: there's no need for a character class (`[]`) around just a single "character". That's redundant. – Marc B Feb 11 '13 at 17:55
  • possible duplicate of [PHP preg\_match to find multiple occurrences](http://stackoverflow.com/questions/2029976/php-preg-match-to-find-multiple-occurrences) – mario Feb 11 '13 at 17:56
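
A minimal sketch of what the comments above suggest, assuming the goal is simply to collect the words with a regex. `preg_match` stops after the first match and returns only 0 or 1, so it can never yield the full word list; `preg_match_all` keeps going. The pattern uses `\w+` rather than `\w*` because `*` also matches the empty string. The names `$allWords` and `$distinctWords` are just illustrative.

```php
<?php
$textDump = "cat dog monkey cat snake horse";

// preg_match_all() collects every match, not just the first one.
// \w+ requires at least one word character, so no empty matches are returned.
preg_match_all('/\w+/', $textDump, $matches);

// $matches[0] holds the full-pattern matches.
$allWords = $matches[0];

// Drop repeated words; array_values() reindexes the result from 0.
$distinctWords = array_values(array_unique($allWords));

print_r($distinctWords);
```

Note that this removes duplicates; it does not pick out words that occur exactly once.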

4 Answers

6
array_unique(
    str_word_count($textDump,1)
);
Mark Baker
  • Is this really what you wanted? It finds all **distinct** words, so "cat dog cat" becomes `[cat,dog]`. It does not find **unique** words, i.e. `[dog]` from "cat dog cat" – Fabian Schmengler Feb 11 '13 at 18:32
2

You can use str_word_count:

$textDump = "cat dog monkey cat snake horse";
// Format 1 returns an array of all words found in the string
$uniqueWords = str_word_count($textDump, 1);
Tchoupi
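
One thing to keep in mind with `str_word_count`: by default it does not treat digits as part of a word. The sketch below uses a made-up input string to show the difference, and relies on the optional third argument, which lists additional characters to count as word characters.

```php
<?php
// Hypothetical input containing a word with digits in it.
$textDump = "cat dog cat42 dog";

// By default, digits are not treated as part of a word,
// so "cat42" is reported as "cat".
$default = str_word_count($textDump, 1);

// The third argument lists extra characters to treat as word characters;
// '0..9' is a range covering all digits.
$withDigits = str_word_count($textDump, 1, '0..9');

print_r($default);
print_r($withDigits);
```

If the text can contain such tokens, it may be worth deciding up front what should count as a "word".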
1

Why not use explode() and array_unique() in this case?

$text = "cat dog monkey cat snake horse";

$foo = explode(" ", $text);
print_r(array_unique($foo)); 
samayo
  • It seems you have the same misunderstanding as I did. The question asks for words that appear only once in the string, not for removing duplicates. – nhahtdh Feb 11 '13 at 18:08
  • Wouldn't using explode cause issues with punctuation though, since 'hello,' and 'hello' would both register as unique? – Supericy Feb 11 '13 at 18:09
  • I didn't get the exact question :) – samayo Feb 11 '13 at 18:11
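
To illustrate the punctuation concern raised in the comments, here is a small sketch with a made-up input string. It compares `explode` on a single space with `preg_split` on runs of non-word characters; the latter is a swapped-in alternative, not something the answer itself proposes.

```php
<?php
// Hypothetical input with punctuation attached to a word.
$text = "hello, world hello";

// Splitting on a single space keeps the comma attached to the first word.
$bySpace = explode(" ", $text);

// Splitting on runs of non-word characters strips the punctuation;
// PREG_SPLIT_NO_EMPTY discards any empty pieces.
$byNonWord = preg_split('/\W+/', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r(array_unique($bySpace));   // "hello," and "hello" are both kept
print_r(array_unique($byNonWord)); // only one "hello" remains
```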
1

The answers given so far all assume that by "find all the unique whole words" you really meant "remove duplicates". Your question is not very clear about this, as you don't specify the desired output for your example, but I'll take you at your word and provide a solution for "find all the unique whole words".

This means, for the input:

"cat dog monkey cat snake horse"

You will get the output:

"dog monkey snake horse"

str_word_count is useful for this too, together with array_count_values, which counts how often each value occurs:

$wordCount = array_count_values(str_word_count($textDump,1));

$wordCount is now:

array(5) {
  ["cat"]    => int(2)
  ["dog"]    => int(1)
  ["monkey"] => int(1)
  ["snake"]  => int(1)
  ["horse"]  => int(1)
}

Next, remove the words with a count higher than 1. Note that the actual words are the array keys, so we use array_keys to get them:

$uniqueWords = array_keys(
    array_filter(
        $wordCount,
        function($count) {
            return $count === 1;
        }
    )
);

$uniqueWords is now:

array(4) {
  [0] => string(3) "dog"
  [1] => string(6) "monkey"
  [2] => string(5) "snake"
  [3] => string(5) "horse"
}

Complete code:

$textDump = "cat dog monkey cat snake horse";
$wordCount = array_count_values(str_word_count($textDump,1));
$uniqueWords = array_keys(
    array_filter(
        $wordCount,
        function($count) {
            return $count === 1;
        }
    )
);
echo join(' ', $uniqueWords);
//dog monkey snake horse
Fabian Schmengler
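
As a possible shortcut for the filtering step in the answer above: `array_keys` accepts an optional search value, so the closure can be dropped. This is only a sketch of an alternative, not the answer's own method; it reuses the same variable names.

```php
<?php
$textDump = "cat dog monkey cat snake horse";

// Map each word to the number of times it occurs.
$wordCount = array_count_values(str_word_count($textDump, 1));

// With a search value, array_keys() returns only the keys whose value matches;
// the third argument (true) makes the comparison strict (===).
$uniqueWords = array_keys($wordCount, 1, true);

echo implode(' ', $uniqueWords), "\n";
```

Both versions walk the counts array once, so the choice is mostly a matter of readability.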