Well, the problem is not as simple as you may think; several issues need to be taken care of, such as punctuation, letter case, and how word boundaries are identified.
However, using the N-gram concept, I propose the following solution:
1- Count the words in the key and call that number N.
2- Extract every sequence of N consecutive words (the N-grams) from the text.
3- Count how many of those N-grams are equal to the key.
using System;
using System.Collections.Generic;
using System.Linq;

string text = "I have asked the question in StackOverflow. Therefore i can expect answer here.";
string key = "the question";

// N: the number of words in the key
int gram = key.Split(' ').Length;
string[] parts = text.Split(' ');

// Collect every sequence of N consecutive words (the N-grams)
List<string> n_grams = new List<string>();
for (int i = 0; i <= parts.Length - gram; i++)
{
    // Join the words parts[i] .. parts[i + gram - 1] with single spaces
    n_grams.Add(string.Join(" ", parts, i, gram));
}

// The result: the number of N-grams equal to the key
int count = n_grams.Count(p => p == key);
For example, for the key "the question", and treating a single space as the word boundary, the following bi-grams are extracted:
I have
have asked
asked the
the question
question in
in StackOverflow.
StackOverflow. Therefore
Therefore i
i can
can expect
expect answer
answer here.
and the number of times "the question" appears in the text is now obvious: 1
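
The punctuation and letter-case issues mentioned at the start could be handled by normalizing both the text and the key before building the N-grams. The following is only a minimal sketch under my own assumptions: `Tokenize` and `CountPhrase` are hypothetical helper names, words are taken to be runs of letters, digits, and apostrophes, and matching is case-insensitive. Note that it also ignores sentence boundaries, so an N-gram can span a full stop.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: lowercases the input and splits it on any run of
// characters that is not a letter, a digit, or an apostrophe.
static string[] Tokenize(string input) =>
    Regex.Split(input.ToLowerInvariant(), @"[^\p{L}\p{N}']+")
         .Where(w => w.Length > 0)
         .ToArray();

// Hypothetical helper: counts the N-grams of the text that equal the key,
// after both have been normalized by Tokenize.
static int CountPhrase(string text, string key)
{
    string[] words = Tokenize(text);
    string[] keyWords = Tokenize(key);
    if (keyWords.Length == 0) return 0;
    string normalizedKey = string.Join(" ", keyWords);

    int count = 0;
    for (int i = 0; i <= words.Length - keyWords.Length; i++)
    {
        // Compare the N-gram starting at word i with the normalized key
        if (string.Join(" ", words, i, keyWords.Length) == normalizedKey)
            count++;
    }
    return count;
}

string sample = "I asked The Question. Was the question, clear?";
Console.WriteLine(CountPhrase(sample, "the question")); // prints 2
```

With this normalization, "The Question." and "the question," both match the key, which the plain split on spaces would miss.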