Javascript: How to count number of instances of a phrase in a larger string?

Question

I need to create a JavaScript function that downloads the HTML source code of a web page and returns the number of times a CSS class is mentioned.

```
var str = document.body.innerHTML;
function getFrequency(str) {
    var freq = {};
    for (var i=0; i<string.length;i++) {
        var css_class = "ENTER CLASS HERE";
        if (freq[css_class]) {
           freq[css_class]++;
        } else {
           freq[css_class] = 1;
        }
    }

    return freq;
};
```

What am I doing wrong here?

Why do you think you are doing something wrong? Is your code not working as expected? What do you expect and what does actually happen? — Felix Kling, Jun 05 '17 at 18:07
It can be any HTML code. For example, if I wanted to count how many posts are on the front page of reddit, I would set the HTML source of reddit.com to a variable and then count how many times "title may-blank" shows up. — Alex, Jun 05 '17 at 18:08
Possible duplicate of [How to count string occurrence in string?](https://stackoverflow.com/questions/4009756/how-to-count-string-occurrence-in-string) — Mike, Jun 05 '17 at 18:08
I don't understand why this was reopened. The duplicated question, which has been mentioned again, perfectly answers this. — Mike Cluck, Jun 05 '17 at 18:09
Are you actually calling `getFrequency()` somewhere or are you only defining it? It's unclear from the source you posted. — J. Titus, Jun 05 '17 at 18:11
It's unclear whether you want to get the class name count of the current document (that's why you are using `document.body.innerHTML`) or of some other web page (as mentioned in the question). — Priyesh Kumar, Jun 05 '17 at 18:16
@MikeC The question is not about finding occurrences of a substring in a string, that's why the question got reopened. — Tomalak, Jun 05 '17 at 18:18
@Tomalak Except that it is if the question is to be taken literally. Just because there's another approach they *could* take assuming they aren't "download[ing] the source code" as part of the program doesn't mean it isn't a duplicate. If they're just asking how to get the number of elements with a class in the current page then [there's a different duplicate target.](https://stackoverflow.com/questions/210377/get-all-elements-in-an-html-document-with-a-specific-css-class) — Mike Cluck, Jun 05 '17 at 18:21
@MikeC That's a much better duplicate target indeed. The original duplicate target was missing the point. — Tomalak, Jun 05 '17 at 18:23
@Tomalak You're still making an assumption. It's possible OP didn't mean to write that they were downloading the source code for some page. If they didn't mean that, then what I just posted is correct. If they meant what they said but just used `document.body.innerHTML` to create a functional setup, the original duplicate was correct. Either way, it should definitely be closed instead of answered. — Mike Cluck, Jun 05 '17 at 18:24
My answer covers both angles, so I guess I'm in the clear here. Besides, whether the HTML was downloaded or not is a technicality that has no effect on the proper approach. HTML needs to be parsed, one way or the other. Where you got the HTML from makes no difference at all. — Tomalak, Jun 05 '17 at 18:27
@Tomalak But it does make a difference. What if they don't want to affect their existing DOM? What if the class is unique enough to not require an actual DOM search? There's also a very famous question/answer for dealing with HTML via regular expressions which would answer this question. — Mike Cluck, Jun 05 '17 at 18:29
Huh? Where do I recommend regular expressions in my answer? (Spoiler, I don't. I explicitly dis-recommend them.) My answer also does not change the actual DOM one bit. — Tomalak, Jun 05 '17 at 18:32
@Tomalak I didn't say you did. The answer to the duplicate question provides a solution using regular expressions. Since yours does not change the DOM and does not create a DOM of its own, it does **not** handle the case where the source code has been downloaded at run-time and has to be assessed that way, which is what the wording of the question implies. Pay attention here, I'm not just attacking you for the sake of it. I'm saying that your answer is at best incomplete and at worst inaccurate based on OP's intent. It has also already been answered before in both cases. — Mike Cluck, Jun 05 '17 at 18:33
I'm sorry, what? My answer provides a way of dealing with HTML downloaded at run-time. — Tomalak, Jun 05 '17 at 18:35
@Tomalak You're right, I glanced over the last bit. Sorry about that. However, you should still close this question as a duplicate. You have the ability to do so. As I've stated, this question already exists and the question should be closed as a duplicate. Since you presumably reopened it and have since discovered it *is* a duplicate, you should close it as such. I don't have the ability to fix this anymore since you reverted my closing. — Mike Cluck, Jun 05 '17 at 18:41
Since none of the answers in the other thread covers how to easily *parse* incoming HTML, I don't really think this is an exact duplicate. I could probably go and find an exact duplicate, there are probably dozens. There are more duplicate questions asked and answered every day than I care to count. Frankly, I don't see how downvoting working, sensible answers and then getting into an argument over it contributes to fixing the problem. And to be absolutely honest, I'm not really sure if it's a (fixable) problem at all. — Tomalak, Jun 05 '17 at 18:54
@Tomalak Well, that's your opinion. I try my best to create consistent quality by marking duplicates. Whether downvoting your answer is correct could be debated, but it's the action I stand by as a way of penalizing what I think is poor behavior. You have the tools and presumably the trust of the community that you'll try to keep up the quality of the site. I think lazily answering unclear questions rather than finding the correct duplicate contributes to poor quality. So I stand by my decision. — Mike Cluck, Jun 05 '17 at 19:16
@Tomalak [Here](https://stackoverflow.com/questions/10585029/parse-a-html-string-with-js), I did the legwork for you. That's how to parse an HTML string into a useful DOM. If OP needs to do that first, there's the answer for it. To get the actual number of elements, [there's this question here.](https://stackoverflow.com/questions/210377/get-all-elements-in-an-html-document-with-a-specific-css-class) Since we disagree on what OP wanted in the first place, we can't say if the first step is needed or not. Either way, the second part is definitively the correct answer to the problem. — Mike Cluck, Jun 05 '17 at 19:20
I appreciate that you try and I can see where you are coming from. I disagree about the "lowering the quality" bit, but that point is moot. When I notice questions that I know or can easily find duplicates for (easily as in "1st try on Google; within the top 5 hits") then I will close as a duplicate as well. But I will certainly not go out of my way to find something that could possibly qualify as a duplicate. It's bad enough that I have to play Google proxy for so many people, why would I want to be a more efficient one? — Tomalak, Jun 05 '17 at 19:26
@Tomalak I understand where you're coming from but I still disagree with you. I think we both screwed up by not determining exactly what OP meant in the first place before doing anything. In either case, I still think that the duplication target I chose is correct and it's worth marking them as such. If this question hits the top 5 results of Google then it would be helpful to have it essentially redirect to a question addressing the core issue. — Mike Cluck, Jun 05 '17 at 19:29

Tomalak · Answer 1 · 2017-06-08T16:42:27.730

> What am I doing wrong here?

I hate to say it, but fundamentally... everything. Getting information about HTML does not involve string functions or regular expressions. HTML cannot be reliably dealt with this way; its rules are far too complex.

HTML needs to be parsed by an HTML parser.
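
To make that concrete, here is a small sketch of how plain substring counting goes wrong. The sample markup and the helper name `countSubstring` are made up for illustration; they are not from the question.

```javascript
// Hypothetical helper: counts non-overlapping occurrences of `needle` in `haystack`.
function countSubstring(haystack, needle) {
    if (needle === "") {
        return 0; // an empty needle would otherwise loop forever
    }
    var count = 0;
    var pos = haystack.indexOf(needle);
    while (pos !== -1) {
        count++;
        pos = haystack.indexOf(needle, pos + needle.length);
    }
    return count;
}

// Made-up sample markup: only ONE element actually has the class "title".
var sampleHtml =
    '<p class="title">The title of this post</p>' +
    '<p class="subtitle">No title class here</p>' +
    '<!-- <p class="title">commented out</p> -->';

// Matches the class, the text content, "subtitle" and the comment.
var naiveCount = countSubstring(sampleHtml, "title");

// Stricter, but still counts the commented-out markup.
var attributeCount = countSubstring(sampleHtml, 'class="title"');

console.log(naiveCount);     // 5
console.log(attributeCount); // 2
```

Neither number is the correct answer (1), and the stricter search would additionally miss variations such as `class='title'` or `class="title may-blank"`.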

In the browser there are two possible scenarios:

If you work with the current document (as you seem to do), then the parsing is already done by the browser.

Counting the number of times a CSS class is used is actually the same thing as finding out how many HTML elements have that class. That is easily done via `document.querySelectorAll()` and a CSS selector.
```
var elements = document.querySelectorAll(".my-css-class");
alert("There are " + elements.length + " occurrences of the class.");
```
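
If the class attribute you are looking for holds several classes (such as the "title may-blank" example from the comments), the selector has to chain them. A minimal sketch, assuming simple class names without special characters; both helper names are hypothetical:

```javascript
// Hypothetical helper: turns a space-separated class list such as
// "title may-blank" into the compound selector ".title.may-blank".
function classListToSelector(classNames) {
    return classNames
        .trim()
        .split(/\s+/)
        .map(function (name) { return "." + name; })
        .join("");
}

// Hypothetical helper: counts the elements under `root` (a document or an
// element) that carry all of the given classes.
function countElementsWithClasses(root, classNames) {
    return root.querySelectorAll(classListToSelector(classNames)).length;
}

// Only runs where a DOM is available, i.e. in a browser.
if (typeof document !== "undefined") {
    console.log(countElementsWithClasses(document, "title may-blank"));
}
```

The compound selector matches elements that have all listed classes in any order, which a literal search for `"title may-blank"` would not. Class names containing characters that are special in CSS would need escaping first, for example with `CSS.escape()` in the browser.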

If you have an HTML string that you loaded from somewhere, you need to parse it first. In JavaScript you can make the browser parse the HTML for you very easily:

```
var html = '<div class="my-css-class">some random HTML</div>';
var div = document.createElement("div");
div.innerHTML = html; // parsing happens here
```

Now you can employ the same strategy as above, only with `div` as your selector context:

```
var elements = div.querySelectorAll(".my-css-class");
alert("There are " + elements.length + " occurrences of the class.");
```
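
As a possible alternative to assigning to `innerHTML`, the browser's `DOMParser` can parse the string into a separate document that is not part of the current page. The sketch below also shows one way the download step could look with `fetch()`. The helper names are hypothetical, and note that a page can normally only fetch HTML from its own origin or from servers that explicitly allow cross-origin requests.

```javascript
// Hypothetical helper: parses an HTML string into a separate document and
// counts the elements matching `selector`. Requires a browser (DOMParser).
function countInHtml(html, selector) {
    var doc = new DOMParser().parseFromString(html, "text/html");
    return doc.querySelectorAll(selector).length;
}

// Hypothetical helper: downloads a page and counts matching elements.
// Returns a Promise that resolves to the count.
function countInPage(url, selector) {
    return fetch(url)
        .then(function (response) {
            if (!response.ok) {
                throw new Error("Request failed with status " + response.status);
            }
            return response.text();
        })
        .then(function (html) {
            return countInHtml(html, selector);
        });
}

// Browser-only usage with the sample string from above.
if (typeof DOMParser !== "undefined") {
    var sample = '<div class="my-css-class">some random HTML</div>';
    console.log(countInHtml(sample, ".my-css-class"));
}
```

Either way, the counting itself stays the same: parse first, then query with a CSS selector.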

Please, no downvotes without explanation. If I'm wrong I'd like to learn where. — Tomalak, Jun 05 '17 at 18:25
I downvoted because the question is a duplicate and should be closed as such. Answering duplicate questions is [in bad taste.](https://meta.stackoverflow.com/questions/309814/is-it-illegal-to-answer-duplicates) — Mike Cluck, Jun 05 '17 at 18:27
Uh that's actually a bad reason to downvote an answer. I also can see no community consensus there that answers to questions that you can find a duplicate for ought to be downvoted. — Tomalak, Jun 05 '17 at 18:30
Maybe not community consensus but the hover text says we should downvote if the answer is not useful. I don't find it useful since this information already exists. — Mike Cluck, Jun 05 '17 at 18:31
