I am using a function created by Vitim.us for counting all occurrences of a substring.

The function (linked above) looks like this:

/** Function that counts occurrences of a substring in a string;
 * @param {String} string               The string
 * @param {String} subString            The sub string to search for
 * @param {Boolean} [allowOverlapping]  Optional. (Default:false)
 *
 * @author Vitim.us https://gist.github.com/victornpb/7736865
 * @see Unit Test https://jsfiddle.net/Victornpb/5axuh96u/
 * @see https://stackoverflow.com/a/7924240/938822
 */
function occurrences(string, subString, allowOverlapping) {

    string += "";
    subString += "";
    if (subString.length <= 0) return (string.length + 1);

    var n = 0,
        pos = 0,
        step = allowOverlapping ? 1 : subString.length;

    while (true) {
        pos = string.indexOf(subString, pos);
        if (pos >= 0) {
            ++n;
            pos += step;
        } else break;
    }
    return n;
}

I have an index of words (containing tags of stemmed words and the original content). To improve speed, I thought of first checking whether the word exists in the tags and counting its occurrences only if it does.

To check whether the word exists, I use

s.indexOf(word)
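
The check-then-count idea can be sketched roughly as follows. This is a minimal illustration, not the actual indexing code: `countIfTagged`, `tags`, and `content` are hypothetical names, and `countOccurrences` is a compact, non-overlapping stand-in for the `occurrences` function above (unlike the original, it returns 0 for an empty search word).

```javascript
// Compact stand-in for occurrences(): counts non-overlapping matches only.
// Assumption: an empty search word counts as 0 matches here.
function countOccurrences(text, word) {
    if (word.length === 0) return 0;
    let n = 0;
    let pos = text.indexOf(word);
    while (pos !== -1) {
        n++;
        pos = text.indexOf(word, pos + word.length);
    }
    return n;
}

// Hypothetical helper: one cheap indexOf on the tags decides
// whether the full count over the content is needed at all.
function countIfTagged(tags, content, word) {
    if (tags.indexOf(word) === -1) return 0; // word not indexed, skip counting
    return countOccurrences(content, word);
}

// Made-up sample data: stemmed tags plus the original content.
const tags = "eat appl day";
const content = "eating apples every day; I eat an apple";

console.log(countIfTagged(tags, content, "eat")); // 2 ("eating" and "eat")
console.log(countIfTagged(tags, content, "run")); // 0 (not in the tags)
```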

When I compared a single indexOf call with the occurrences function, which calls indexOf multiple times, I found that the occurrences function consistently took less time.

  1. How is this possible?

This is the exact code and string I used for benchmarking - code

  2. This might be a separate question: if this is the case, what is the use of creating an index with stemmed words? I could count the occurrences directly from the content (which would be the faster way).
Parth Kapadia
    Remove the `console.log` from your two functions and then run your performance test. Codepen wraps `console.log` so that it can update its own DOM console, so it has to update the DOM on every call, which hurts performance and can throw off your results. Once it is removed, you should see that #2 is slower. Not too sure what you mean by "creating an index with stemmed words" though. – Nick Parsons Aug 24 '22 at 11:38
    @NickParsons Thanks for the comment. I removed the `console.log` and got correct, expected results. Now that a single `indexOf` runs faster, checking whether the word exists in the tagged content before counting occurrences makes sense. I am indexing files to be searched later on. The index is created by breaking the sentences into stemmed words/tags (stemming: eating, eaten, eat all become eat). So I first check whether the word exists in the tags and count the occurrences only if it does. – Parth Kapadia Aug 24 '22 at 11:46

1 Answer

As Nick Parsons pointed out in the comments, a single indexOf does NOT take more time than the occurrences function. My results were wrong because of how Codepen handles console.log(): Codepen updates its own DOM console on every console.log call, which slows the code down.

When I removed the console.log() calls and tested again, I got correct, expected results: a single indexOf ran faster than the occurrences function. You can find the updated code here
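
For illustration, a benchmark that keeps console.log out of the timed region might look something like the sketch below. It is not the code from the linked pen: `timeIt`, `countAll`, `haystack`, and the iteration count are assumptions made for this example, and `countAll` is a simplified, non-overlapping stand-in for occurrences.

```javascript
// Simplified stand-in for occurrences(): non-overlapping matches only.
function countAll(text, word) {
    let n = 0;
    let pos = text.indexOf(word);
    while (pos !== -1) {
        n++;
        pos = text.indexOf(word, pos + word.length);
    }
    return n;
}

// Use performance.now() where available, otherwise fall back to Date.now().
const now = typeof performance !== "undefined"
    ? () => performance.now()
    : () => Date.now();

// Hypothetical timing helper: nothing is logged inside the timed loop.
// Results are summed into `sink` so the calls are less likely to be
// optimized away as unused work.
function timeIt(fn, iterations) {
    let sink = 0;
    const start = now();
    for (let i = 0; i < iterations; i++) {
        sink += fn();
    }
    const elapsed = now() - start;
    return { elapsed, sink };
}

// Made-up test string: the word "amet" appears 1000 times.
const haystack = "lorem ipsum dolor sit amet ".repeat(1000);
const iterations = 1000;

const single = timeIt(() => haystack.indexOf("amet"), iterations);
const counted = timeIt(() => countAll(haystack, "amet"), iterations);

// Log once, after both measurements are finished.
console.log("single indexOf: " + single.elapsed.toFixed(2) + " ms");
console.log("count all:      " + counted.elapsed.toFixed(2) + " ms");
```

Absolute timings depend on the engine and machine, so only the relative order of the two numbers is worth comparing.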

Parth Kapadia