
It was hard to find a good title for this question, so if you have a better one, feel free to edit it!

Currently I retrieve a page using file_get_contents, strip out all the JavaScript, convert everything to lowercase and strip out all the HTML tags.
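A minimal sketch of that preprocessing step, assuming the page is ordinary HTML. The function name `clean_page_text` is hypothetical, and removing `<script>`/`<style>` blocks with a regex is a rough approximation rather than real HTML parsing.

```php
<?php
// Hypothetical helper: turn raw HTML into lowercase plain text.
function clean_page_text(string $html): string
{
    // Drop <script> and <style> blocks together with their contents.
    // Assumption: a regex is good enough here; DOMDocument would be more robust.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);

    // Put a space before every tag so words in adjacent elements
    // ("<p>one</p><p>two</p>") don't get glued together, then strip the tags.
    $text = strip_tags(str_replace('<', ' <', $html));

    return strtolower($text);
}

// Usage (the URL is a placeholder):
// $contents = clean_page_text(file_get_contents('http://example.com/'));
```

Inserting a space before each tag matters because `strip_tags()` on its own would turn `<p>one</p><p>two</p>` into `onetwo`, which the word regex would then count as a single word.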

After that, I build an array of word frequencies like this:

    // Assumes $contents holds the cleaned, lowercased page text
    // and $common_words is an array of words to ignore.
    preg_match_all("/((?:\w'|\w|-)+)/", $contents, $words);

    $frequency = array();

    foreach ($words[0] as $word) {
        // Filter out the 'common words'
        if (in_array($word, $common_words)) {
            continue;
        }

        // Start the count at 1 the first time a word is seen
        if (isset($frequency[$word])) {
            $frequency[$word] += 1;
        } else {
            $frequency[$word] = 1;
        }
    }

This works for single words. For example, if I retrieve an HTML page containing this text:

'This is a sample text. This is what a HTML text can look like'

Using my code, this results in:

    this = 2
    is = 2
    a = 2
    sample = 1
    text = 2
    what = 1
    html = 1
    can = 1
    look = 1
    like = 1

Now I want the same kind of count, but for pairs of consecutive words. How would I achieve this? Using the same sentence, the result should look something like this:

    this is = 2
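One way to get two-word counts, sketched under the same assumption as the code above (the text is already stripped of HTML and lowercased): extract the words with the same regex, pair each word with the one that follows it, and count the pairs. The function name `count_word_pairs` is hypothetical. Common-word filtering is deliberately left out, because removing words before pairing would produce pairs that were never adjacent in the text.

```php
<?php
// Hypothetical helper: count pairs of consecutive words (bigrams).
// Assumes $text is already stripped of HTML and lowercased.
function count_word_pairs(string $text): array
{
    // Same word pattern as the single-word version in the question.
    preg_match_all("/((?:\w'|\w|-)+)/", $text, $matches);
    $words = $matches[0];

    $frequency = array();
    // Pair each word with the next one; the last word has no successor.
    for ($i = 0; $i < count($words) - 1; $i++) {
        $pair = $words[$i] . ' ' . $words[$i + 1];
        if (isset($frequency[$pair])) {
            $frequency[$pair] += 1;
        } else {
            $frequency[$pair] = 1;
        }
    }

    arsort($frequency); // most frequent pairs first
    return $frequency;
}

$pairs = count_word_pairs('this is a sample text. this is what a html text can look like');
print_r($pairs); // "this is" => 2 comes first; every other pair occurs once
```

Note that this simple version also pairs words across sentence boundaries (here it counts `text this` once). If that matters, one option is to split the text on `.`, `!` and `?` first and count pairs within each sentence.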

I tried to give as many examples as I could to make this as clear as possible.

If you need any clarification, please ask!

Déjà vu
  • Since you intend to form keys from multiple words, I guess you need some kind of dictionary to match against (not a literal dictionary; just an array, a file or something similar). Do you have one? – Sayed Apr 04 '14 at 09:34
  • Alternatively, you could use the results you have already built (from earlier queries in the same run) to look for matches. That can serve as your dictionary, and you might then be able to generate keys like `this is = 2` – Sayed Apr 04 '14 at 09:35
  • This function might help: [str_word_count](http://www.php.net/manual/en/function.str-word-count.php) – Class Apr 04 '14 at 09:38
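One possible reading of Sayed's dictionary suggestion, as a sketch: once you have phrase counts in a `phrase => count` array, keep only the phrases that appear in a list of known phrases. The function `filter_known_phrases` and the `$known_phrases` list are hypothetical; the list could be read from a file or built from phrase counts gathered earlier in the same run.

```php
<?php
// Sketch: keep only phrases that appear in a "dictionary" of known phrases.
// $known_phrases is hypothetical: it could come from a file, or from the
// phrase counts of earlier pages processed in the same run.
function filter_known_phrases(array $frequency, array $known_phrases): array
{
    // array_flip turns the list into keys so array_intersect_key can match them.
    return array_intersect_key($frequency, array_flip($known_phrases));
}

// Example input in the "phrase => count" shape the question is aiming for.
$frequency = array('this is' => 2, 'is a' => 1, 'html text' => 1, 'text can' => 1);
print_r(filter_known_phrases($frequency, array('this is', 'html text')));
```

`array_intersect_key()` keeps the order and counts of the first array, so the filtered result stays in the same order as the original counts.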

1 Answer


Try `str_word_count()` together with `array_count_values()`:

    $total_words = array_count_values(str_word_count('your_string', 1));
    print_r($total_words);
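As written, this counts single words only. A sketch of how the same `array_count_values()` idea might extend to phrases of any length: first build the list of n-word phrases, then count them. The function name `count_ngrams` and its `$n` parameter are hypothetical. Also note that `str_word_count()` treats only letters, apostrophes and hyphens as word characters, so digits are dropped.

```php
<?php
// Hypothetical generalisation of the answer: count phrases of $n consecutive words.
function count_ngrams(string $text, int $n = 2): array
{
    if ($n < 1) {
        throw new InvalidArgumentException('$n must be at least 1');
    }

    // Format 1 makes str_word_count() return an array of the words.
    $words = str_word_count(strtolower($text), 1);

    // Slide a window of $n words across the list.
    $phrases = array();
    for ($i = 0; $i + $n <= count($words); $i++) {
        $phrases[] = implode(' ', array_slice($words, $i, $n));
    }

    $counts = array_count_values($phrases);
    arsort($counts); // most frequent first
    return $counts;
}

print_r(count_ngrams('This is a sample text. This is what a HTML text can look like', 2));
```

With `$n = 1` this reduces to the answer's single-word count, and `$n = 2` gives `this is => 2` for the question's sample sentence.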

For more help, see: php: sort and count instances of words in a given string

Rakesh Sharma