Finding a good title for this question is difficult, so if you have a better one, feel free to edit!
Currently I'm retrieving a page using file_get_contents.
I then strip out all the JavaScript, convert everything to lowercase, and strip out all the HTML tags.
After this I build an array of every single word and count how often each one occurs, like so:
// Collect every word (letters, digits, apostrophes and hyphens) into $words[0]
preg_match_all("/((?:\w'|\w|-)+)/", $contents, $words);
$frequency = array();
foreach ($words[0] as $word) {
    // Filter out the 'common words'
    if (in_array($word, $common_words)) continue;
    // Count each remaining word
    if (isset($frequency[$word])) {
        $frequency[$word] += 1;
    } else {
        $frequency[$word] = 1;
    }
}
This works for single words. For example, if I retrieve an HTML page containing this text:
'This is a sample text. This is what a HTML text can look like'
This will result in the following using my code:
this = 2
is = 2
a = 2
sample = 1
text = 2
what = 1
html = 1
can = 1
look = 1
like = 1
Now I want something similar, but for pairs of two consecutive words. How would I achieve this? Using the same sentence, it should look something like this:
this is = 2
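To make the goal concrete, here is a minimal sketch of the kind of thing I have in mind: pair each word with the word that follows it and count the pairs the same way as the single words. The function name count_word_pairs is hypothetical, and skipping a pair whenever either word is a common word is only an assumption about how the filter should behave. Note that this sketch also pairs words across sentence boundaries (for example "text this").

```php
<?php
// Hypothetical helper (not part of my current code): counts pairs of adjacent words.
function count_word_pairs(string $contents, array $common_words = []): array
{
    // Same word pattern as in the single-word version, applied to lowercased text
    preg_match_all("/((?:\w'|\w|-)+)/", strtolower($contents), $matches);
    $words = $matches[0];

    $frequency = [];
    for ($i = 0; $i < count($words) - 1; $i++) {
        // Assumption: drop the pair if either word is a common word
        if (in_array($words[$i], $common_words) || in_array($words[$i + 1], $common_words)) {
            continue;
        }
        $pair = $words[$i] . ' ' . $words[$i + 1];
        $frequency[$pair] = ($frequency[$pair] ?? 0) + 1;
    }

    // Most frequent pairs first, keeping the pair text as the array key
    arsort($frequency);
    return $frequency;
}

$text = 'This is a sample text. This is what a HTML text can look like';
$pairs = count_word_pairs($text);
echo 'this is = ' . $pairs['this is'], PHP_EOL; // prints "this is = 2"
```

With the sample sentence and no common words, every other pair (such as "is a" or "sample text") occurs once.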
I tried to give as many examples as I could to make this as clear as possible.
If you need any clarification, please ask!