I am using a function created by Vitim.us for counting all occurrences of a substring.
The function (linked above) goes like this:
/** Function that count occurrences of a substring in a string;
 * @param {String} string               The string
 * @param {String} subString            The sub string to search for
 * @param {Boolean} [allowOverlapping]  Optional. (Default:false)
 *
 * @author Vitim.us https://gist.github.com/victornpb/7736865
 * @see Unit Test https://jsfiddle.net/Victornpb/5axuh96u/
 * @see https://stackoverflow.com/a/7924240/938822
 */
function occurrences(string, subString, allowOverlapping) {
    string += "";
    subString += "";
    if (subString.length <= 0) return (string.length + 1);

    var n = 0,
        pos = 0,
        step = allowOverlapping ? 1 : subString.length;

    while (true) {
        pos = string.indexOf(subString, pos);
        if (pos >= 0) {
            ++n;
            pos += step;
        } else break;
    }
    return n;
}
I have an index of words (containing tags of stemmed words and the original content). To improve speed, I thought of first checking whether the word exists in the tags and then counting the occurrences only if required.
To check whether the word exists, I use
s.indexOf(word)
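To make the intended lookup concrete, here is a minimal sketch of the "check the tags first, count only if needed" idea. The helper name countIfPresent and the sample tags/content strings are hypothetical and only for illustration; my real index is structured differently.

```javascript
// Same counting function as above, repeated so the sketch is self-contained.
function occurrences(string, subString, allowOverlapping) {
    string += "";
    subString += "";
    if (subString.length <= 0) return (string.length + 1);
    var n = 0,
        pos = 0,
        step = allowOverlapping ? 1 : subString.length;
    while (true) {
        pos = string.indexOf(subString, pos);
        if (pos >= 0) {
            ++n;
            pos += step;
        } else break;
    }
    return n;
}

// Hypothetical helper: a single indexOf on the (short) tags string acts as a
// cheap existence check; the full count over the content runs only on a hit.
function countIfPresent(tags, content, word) {
    if (tags.indexOf(word) === -1) return 0; // word not indexed, skip counting
    return occurrences(content, word);
}

// Hypothetical sample data, not my real index.
var tags = "fox run";
var content = "the fox saw another fox";

console.log(countIfPresent(tags, content, "fox")); // 2
console.log(countIfPresent(tags, content, "cat")); // 0
```

Note that indexOf matches substrings, so a tag such as "running" would also satisfy a check for "run".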
When comparing a single indexOf call with the occurrences function, which calls indexOf multiple times, I found that the occurrences function consistently took less time.
- How is this possible?
This is the exact code and string I used for benchmarking - code
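For readers who cannot open the link, the comparison can be sketched roughly as follows. This is not the linked benchmark: the timeIt helper, the sample string, and the iteration counts are assumptions made for illustration. It assumes an environment where performance.now() is available as a global (browsers and recent Node.js versions). The results are accumulated into a sink value and there is a warm-up pass, since an engine may otherwise optimize away a call whose result is never used, which could distort a micro-benchmark like this one.

```javascript
function occurrences(string, subString, allowOverlapping) {
    string += "";
    subString += "";
    if (subString.length <= 0) return (string.length + 1);
    var n = 0,
        pos = 0,
        step = allowOverlapping ? 1 : subString.length;
    while (true) {
        pos = string.indexOf(subString, pos);
        if (pos >= 0) {
            ++n;
            pos += step;
        } else break;
    }
    return n;
}

// Hypothetical timing helper: runs fn `iterations` times and returns the
// elapsed milliseconds plus the sum of all results (so the work is observable).
function timeIt(fn, iterations) {
    var sink = 0;
    var start = performance.now();
    for (var i = 0; i < iterations; i++) {
        sink += fn();
    }
    return { elapsed: performance.now() - start, sink: sink };
}

// Hypothetical sample input, much shorter than the string in the linked code.
var s = "the quick brown fox jumps over the lazy dog and the cat";
var word = "the";

// Warm-up pass so both code paths have a chance to be optimized before timing.
timeIt(function () { return s.indexOf(word); }, 10000);
timeIt(function () { return occurrences(s, word); }, 10000);

var single = timeIt(function () { return s.indexOf(word); }, 100000);
var counted = timeIt(function () { return occurrences(s, word); }, 100000);

console.log("indexOf:     " + single.elapsed.toFixed(2) + " ms");
console.log("occurrences: " + counted.elapsed.toFixed(2) + " ms");
```

The timings will vary between engines and runs, so no expected numbers are given here.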
- This might be a separate question... If this is the case, then what is the use of creating an index with stemmed words? I could count the occurrences directly from the content (which is faster).