Word frequency in JavaScript

Question


How can I implement a JavaScript function to calculate the frequency of each word in a given sentence?

This is my code:

function search () {
  var data = document.getElementById('txt').value;
  var temp = data;
  var words = new Array();
  words = temp.split(" ");
  var uniqueWords = new Array();
  var count = new Array();


  for (var i = 0; i < words.length; i++) {
    //var count=0;
    var f = 0;
    for (j = 0; j < uniqueWords.length; j++) {
      if (words[i] == uniqueWords[j]) {
        count[j] = count[j] + 1;
        //uniqueWords[j]=words[i];
        f = 1;
      }
    }
    if (f == 0) {
      count[i] = 1;
      uniqueWords[i] = words[i];
    }
    console.log("count of " + uniqueWords[i] + " - " + count[i]);
  }
}

I am unable to trace the problem. Any help is greatly appreciated. The output should be in this format: count of is - 1 count of the - 2..

input: this is anil is kum the anil

How are we supposed to know there is even a problem with this code? — zerkms, Jun 18 '15 at 05:07
Use literals: `var words = []` instead of `var words = new Array()` — royhowie, Jun 18 '15 at 06:05
Here's the accepted answer, except as a regular function, and [compressed](https://javascript-minifier.com/): `function wordCounts(n){return n.match(/\w+/g).reduce(function(n,r){return n.hasOwnProperty(r)?++n[r]:n[r]=1,n},{})}` — ashleedawg, Dec 08 '20 at 12:11

Cymen · Answer 1 · 2016-11-19T05:30:28.167

24

Here is a JavaScript function to get the frequency of each word in a sentence:

function wordFreq(string) {
    var words = string.replace(/[.]/g, '').split(/\s/);
    var freqMap = {};
    words.forEach(function(w) {
        if (!freqMap[w]) {
            freqMap[w] = 0;
        }
        freqMap[w] += 1;
    });

    return freqMap;
}

It will return a hash of word to word count. So for example, if we run it like so:

console.log(wordFreq("I am the big the big bull."));
> Object {I: 1, am: 1, the: 2, big: 2, bull: 1}

You can iterate over the words with Object.keys(result).sort().forEach(function(word) {...}). So we could hook that up like so:

var freq = wordFreq("I am the big the big bull.");
Object.keys(freq).sort().forEach(function(word) {
    console.log("count of " + word + " is " + freq[word]);
});

Which would output:

count of I is 1
count of am is 1
count of big is 2
count of bull is 1
count of the is 2

JSFiddle: http://jsfiddle.net/ah6wsbs6/

And here is the wordFreq function in ES6:

function wordFreq(string) {
  return string.replace(/[.]/g, '')
    .split(/\s/)
    .reduce((map, word) =>
      Object.assign(map, {
        [word]: (map[word])
          ? map[word] + 1
          : 1,
      }),
      {}
    );
}

JSFiddle: http://jsfiddle.net/r1Lo79us/

edited Nov 19 '16 at 05:30

answered Jun 18 '15 at 05:44

Cymen


Many thanks for the solution . i am trying to understand your code and i will try to implement it . but can u please point out the mistake in my code? – Anil Jun 18 '15 at 05:56
1

The first mistake is trying to count words by putting things in arrays. It is much easier to count using a hash. In JavaScript, an Object or `{}` works much like a hash, so make use of the easiest tool available. I'll look. – Cymen Jun 18 '15 at 05:58
@KalalAnil Yeah, I tried but I would end up rewriting it to same as above. – Cymen Jun 18 '15 at 06:01
Yeah i have used two arrays just to be clear and because i did not know much about map object. i will try to look into Objects concept .. thank you – Anil Jun 18 '15 at 06:05
1

It is much much harder using multiple arrays. You can do it but it is way too much work. Good idea to learn about hashes first. – Cymen Jun 18 '15 at 06:06
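
To make the hash suggestion above concrete, here is a minimal sketch of the same counting idea using a `Map` instead of parallel arrays. The name `countWords` and the whitespace-only splitting are assumptions made for illustration; they are not taken from the answer.

```javascript
// Hypothetical helper: one Map entry per distinct word, value = occurrences.
function countWords(sentence) {
  const counts = new Map();
  // Split on runs of whitespace; filter(Boolean) drops the empty strings
  // that leading or trailing whitespace would otherwise produce.
  const words = sentence.split(/\s+/).filter(Boolean);
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

const result = countWords("this is anil is kum the anil");
result.forEach(function (count, word) {
  console.log("count of " + word + " - " + count);
});
```

Looking a word up and incrementing it is a single step here, which is what removes the inner loop over `uniqueWords` in the question's code.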

Sampson · Accepted Answer · 2015-06-18T06:24:19.000

19

I feel you have over-complicated things by having multiple arrays, strings, and engaging in frequent (and hard to follow) context-switching between loops, and nested loops.

Below is the approach I would encourage you to consider taking. I've inlined comments to explain each step along the way. If any of this is unclear, please let me know in the comments and I'll revisit to improve clarity.

(function () {

    /* Below is a regular expression that finds alphanumeric characters
       Next is a string that could easily be replaced with a reference to a form control
       Lastly, we have an array that will hold any words matching our pattern */
    var pattern = /\w+/g,
        string = "I I am am am yes yes.",
        matchedWords = string.match( pattern );

    /* The Array.prototype.reduce method assists us in producing a single value from an
       array. In this case, we're going to use it to output an object with results. */
    var counts = matchedWords.reduce(function ( stats, word ) {

        /* `stats` is the object that we'll be building up over time.
           `word` is each individual entry in the `matchedWords` array */
        if ( stats.hasOwnProperty( word ) ) {
            /* `stats` already has an entry for the current `word`.
               As a result, let's increment the count for that `word`. */
            stats[ word ] = stats[ word ] + 1;
        } else {
            /* `stats` does not yet have an entry for the current `word`.
               As a result, let's add a new entry, and set count to 1. */
            stats[ word ] = 1;
        }

        /* Because we are building up `stats` over numerous iterations,
           we need to return it for the next pass to modify it. */
        return stats;

    }, {} );

    /* Now that `counts` has our object, we can log it. */
    console.log( counts );

}());

edited Jun 18 '15 at 06:24

answered Jun 18 '15 at 05:10

Sampson


1

Why put all the logic in the return statement when it's not even being returned? I think it makes the code harder to read, manage, and understand. – Muhammad Umer Jun 18 '15 at 05:17
3

@MuhammadUmer Because I like the aesthetics of using a single line; and it's short enough that you can see `words` is being returned ultimately. If you prefer two lines; use two lines ;) – Sampson Jun 18 '15 at 05:18
1

I understand that, but code should be written so that you or another person can easily understand it later on, say in six months. One-liners are good when they improve understanding by not distracting from the main logic... but here it's easy to miss that last comma... read plainly, "return abc, d" looks odd, since only d is returned – Muhammad Umer Jun 18 '15 at 05:22
1

@MuhammadUmer I understand, and agree in principle. – Sampson Jun 18 '15 at 05:24
@jonathan Thanks for the solution , but am new to javascript and it takes time to understand ur code. can u please help me in finding out the mistake in my code, because i took lot of time to write the above code . – Anil Jun 18 '15 at 05:24
1

@KalalAnil The approach you took includes punctuation with words, requires multiple arrays, and too much context-switching. Rather than trying to get that approach to work better, I'd encourage you to rethink the problem entirely. – Sampson Jun 18 '15 at 05:57
@JonathanSampson Thanks for your suggestion. i will try to understand your code and implement it . thanks once again. – Anil Jun 18 '15 at 06:09
@MuhammadUmer I've rewritten the answer, taking your advice to heart. – Sampson Jun 18 '15 at 06:40
well explained .. thanks for making it so clear and easy to understand. thanks a lot for your efforts..!! – Anil Jun 18 '15 at 06:40
Please note: reduce is not supported in IE 8 and below. – Wessam El Mahdy Aug 21 '16 at 21:58
@WessamElMahdy Correct, but MDN has [a polyfill](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/Reduce?redirectlocale=en-US&redirectslug=JavaScript%2FReference%2FGlobal_Objects%2FArray%2FReduce#Polyfill). – Sampson Aug 22 '16 at 02:53
@Sampson What happens when the text content is "I I am am am yes yes constructor ." Is there a non-tedious way of dealing with this in JS? – lilinjn Aug 02 '18 at 11:42
@Sampson how would you set a filter to only return words which appear more than or less than a certain amount of times only? Thanks! – Andrew Aug 09 '18 at 16:39
I would change the pattern to `-?\w[\w'’-]*` to detect more words (contractions, compound adjectives, "ol'", etc.). – thdoan Sep 12 '18 at 18:05
Here's the accepted answer, except as a regular function, and [compressed](https://javascript-minifier.com/): `function wordCounts(n){return n.match(/\w+/g).reduce(function(n,r){return n.hasOwnProperty(r)?++n[r]:n[r]=1,n},{})}` – ashleedawg Dec 08 '20 at 12:08
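
Two of the comments above ask about words such as "constructor" and about filtering by count. One possible approach, sketched here under assumptions, is to accumulate into a prototype-less object created with `Object.create(null)`, so that no inherited names can collide with words, and then filter the keys by their counts. The names `countMatches` and `filterByCount` are hypothetical and not part of the accepted answer.

```javascript
// Hypothetical variant of the match/reduce approach. The accumulator has no
// prototype, so "constructor", "toString", etc. are ordinary keys.
function countMatches(text) {
  const words = text.match(/\w+/g) || []; // match() returns null when nothing matches
  return words.reduce(function (stats, word) {
    stats[word] = (stats[word] || 0) + 1;
    return stats;
  }, Object.create(null));
}

// Hypothetical filter: keep only words whose count lies within [min, max].
function filterByCount(stats, min, max) {
  return Object.keys(stats).filter(function (word) {
    return stats[word] >= min && stats[word] <= max;
  });
}

const stats = countMatches("I I am am am yes yes constructor .");
console.log(stats.constructor);                 // 1
console.log(filterByCount(stats, 2, Infinity)); // [ 'I', 'am', 'yes' ]
```

A `Map` would serve the same purpose; the prototype-less object is used here only because it keeps the shape of the original `counts` result.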

score 3 · Answer 3 · answered May 15 '20 at 08:13

const sentence = 'Hi my friend how are you my friend';

const countWords = (sentence) => {
    const convertToObject = sentence.split(" ").map( (i, k) => {
        return {
          element: {
              word: i,
              nr: sentence.split(" ").filter(j => j === i).length + ' occurrence',
          }

      }
  });
    return Array.from(new Set(convertToObject.map(JSON.stringify))).map(JSON.parse)
};

console.log(countWords(sentence));

score 0 · Answer 4 · answered Jun 18 '15 at 06:08

Here is an updated version of your own code...

<!DOCTYPE html>
<html>
<head>
<title>string frequency</title>
<style type="text/css">
#txt{
    width:250px;
}
</style>
</head>

<body >

<textarea id="txt" cols="25" rows="3" placeholder="add your text here">   </textarea><br>
<button type="button" onclick="search()">search</button>

    <script >

        function search()
        {
            var data=document.getElementById('txt').value;
            var temp=data;
            var words=new Array();
            words=temp.split(" ");

            var unique = {};


            for (var i = 0; i < words.length; i++) {
                var word = words[i];
                console.log(word);

                if (word in unique)
                {
                    console.log("word found");
                    var count  = unique[word];
                    count ++;
                    unique[word]=count;
                }
                else
                {
                    console.log("word NOT found");
                    unique[word]=1;
                }
            }
            console.log(unique);
        }

    </script>

</body>
</html>

I think your loop was overly complicated. Also, trying to produce the final count while still doing your first pass over the array of words is bound to fail because you can't test for uniqueness until you have checked each word in the array.

Instead of all your counters, I've used a Javascript object to work as an associative array, so we can store each unique word, and the count of how many times it occurs.

Then, once we exit the loop, we can see the final result.

Also, this solution uses no regex ;)

I'll also add that it's very hard to count words just based on spaces. In this code, "one, two, one" will result in "one," and "one" being treated as different, unique words.
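
As a small illustration of that last point, the sketch below compares splitting on spaces with matching runs of word characters. The pattern `/\w+/g` is the one used in the accepted answer; note that `\w` covers only ASCII letters, digits, and underscore, so this is an approximation rather than a general definition of a word.

```javascript
const text = "one, two, one";

// Splitting on spaces keeps the punctuation attached to each word.
const bySpaces = text.split(" ");

// Matching runs of word characters leaves the punctuation out.
const byPattern = text.match(/\w+/g) || [];

console.log(bySpaces);  // [ 'one,', 'two,', 'one' ]
console.log(byPattern); // [ 'one', 'two', 'one' ]
```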

this works fine and easy to understand for a newbie like me .. thanks a lot . — Anil, Jun 18 '15 at 06:29
By splitting on spaces, *"This world is your world."* would treat *"world"* and *"world."* as two different words. Also, JavaScript doesn't have *associative arrays*, so don't expect objects to behave like them. The `in` operator is dangerous to use here as it includes properties on the prototype chain. So if your string has words like "constructor" in them, you'll get misleading results. Lastly, "no regex" is not necessarily a good or redeeming quality of a solution. Regular Expressions are a powerful utility that strengthen any developer who invests the time to understand them :) — Sampson, Jun 18 '15 at 06:43
hi @JonathanSampson . I do agree that regex are powerful. It's just that the OP wanted a fix for their code, and rather than offer a completely different solution based on regex, I tried to modify the original code. I believe regex also has a "word" operator that might also solve the "words break on white space" issue, so regex could be a good solution to this problem. As for the "in" operator, I will have to look into that. I didn't know it was problematic. — Lucien Stals, Jun 18 '15 at 06:49
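
The concern about the `in` operator can be checked with a short experiment. The sketch below runs the same counting loop twice on a plain object, once with `in` and once with an own-property check. The helper `countWith` is hypothetical and exists only for this comparison.

```javascript
// Hypothetical helper: counts words using whichever membership check is passed in.
function countWith(check, words) {
  const unique = {};
  words.forEach(function (word) {
    if (check(unique, word)) {
      unique[word] = unique[word] + 1;
    } else {
      unique[word] = 1;
    }
  });
  return unique;
}

const words = "the constructor the".split(" ");

// `in` also finds names inherited from Object.prototype, such as "constructor",
// so the loop tries to "increment" the inherited function.
const withIn = countWith(function (obj, key) {
  return key in obj;
}, words);

// An own-property check only finds keys that were added to this object.
const withOwn = countWith(function (obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}, words);

console.log(typeof withIn.constructor); // not "number": the count is corrupted
console.log(withOwn.constructor);       // 1
```

Ordinary words such as "the" are counted identically by both versions; only words that happen to match an inherited property name are affected.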

score 0 · Answer 5 · answered Jun 18 '15 at 06:09

While both of the answers here are correct, and maybe better, neither of them addresses the OP's question (what is wrong with his code).

The problem with OP's code is here:

if(f==0){
    count[i]=1;
    uniqueWords[i]=words[i];
}

Each time the code meets a new (unique) word, it stores that word in uniqueWords at the index the word had in words, rather than at the next free position. Hence there are gaps in the uniqueWords array. This is the reason for some of the undefined values.

Try printing uniqueWords. It should give something like:

["this", "is", "anil", 4: "kum", 5: "the"]

Note there is no element at index 3.
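
The gap can be reproduced in isolation with the sketch below. It uses the input from the question and the same indexing as the original code, with `indexOf` standing in for the inner loop (a simplification for illustration).

```javascript
const words = "this is anil is kum the anil".split(" ");
const uniqueWords = [];

words.forEach(function (word, i) {
  if (uniqueWords.indexOf(word) === -1) {
    // Same indexing as the original code: the word's position in `words`.
    uniqueWords[i] = word;
  }
});

console.log(uniqueWords.length); // 6
console.log(3 in uniqueWords);   // false: index 3 was never assigned
console.log(uniqueWords[3]);     // undefined
```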

Also the printing of final count should be after processing all the words in the words array.

Here's corrected version:

function search()
{
    var data=document.getElementById('txt').value;
    var temp=data;
    var words=new Array();
    words=temp.split(" ");
    var uniqueWords=new Array();
    var count=new Array();


    for (var i = 0; i < words.length; i++) {
        //var count=0;
        var f=0;
        for(j=0;j<uniqueWords.length;j++){
            if(words[i]==uniqueWords[j]){
                count[j]=count[j]+1;
                //uniqueWords[j]=words[i];
                f=1;
            }
        }
        if(f==0){
            count[i]=1;
            uniqueWords[i]=words[i];
        }
    }
    for ( i = 0; i < uniqueWords.length; i++) {
        if (typeof uniqueWords[i] !== 'undefined')
            console.log("count of "+uniqueWords[i]+" - "+count[i]);       
    }
}

I have just moved the printing of the count out of the processing loop into a new loop and added an "if not undefined" check.

Fiddle: https://jsfiddle.net/cdLgaq3a/
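
An alternative to skipping the undefined slots is to avoid creating them. The sketch below is one possible variation, not the code from the fiddle: new words are appended with `push()`, so both arrays stay dense and share indices. The function name `countWithArrays` and the string parameter (in place of reading from the `txt` element) are assumptions for illustration.

```javascript
// Hypothetical rewrite of the two-array approach without gaps.
function countWithArrays(text) {
  var words = text.split(" ");
  var uniqueWords = [];
  var count = [];
  for (var i = 0; i < words.length; i++) {
    var j = uniqueWords.indexOf(words[i]);
    if (j === -1) {
      // First occurrence: append to both arrays at the same position.
      uniqueWords.push(words[i]);
      count.push(1);
    } else {
      // Seen before: bump the counter stored at the matching index.
      count[j] = count[j] + 1;
    }
  }
  return { uniqueWords: uniqueWords, count: count };
}

var result = countWithArrays("this is anil is kum the anil");
for (var k = 0; k < result.uniqueWords.length; k++) {
  console.log("count of " + result.uniqueWords[k] + " - " + result.count[k]);
}
```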

score 0 · Answer 6 · answered Sep 20 '22 at 13:21

I had a similar assignment. This is what I did:

Assignment : Clean the following text and find the most frequent word (hint, use replace and regular expressions).

const sentence = '%I $am@% a %tea@cher%, &and& I lo%#ve %te@a@ching%;. The@re $is no@th@ing; &as& mo@re rewarding as educa@ting &and& @emp%o@weri@ng peo@ple. ;I found tea@ching m%o@re interesting tha@n any ot#her %jo@bs. %Do@es thi%s mo@tiv#ate yo@u to be a tea@cher!? %Th#is 30#Days&OfJavaScript &is al@so $the $resu@lt of &love& of tea&ching'

console.log(`\n\n 03.Clean the following text and find the most frequent word (hint, use replace and regular expressions) \n\n ${sentence} \n\n`)

console.log(`Cleared sentence : ${sentence.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()@]/g, "")}`)

console.log(mostFrequentWord(sentence))


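function mostFrequentWord(sentence) {
  // Strip the noise characters, then normalize case and surrounding whitespace
  sentence = sentence.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()@]/g, "").trim().toLowerCase()
  let sentenceArray = sentence.split(/\s+/)
  // Tally whole words in a Map; matching with RegExp(word) would also count substrings
  const counts = new Map()
  let word = null
  let count = 0
  for (const current of sentenceArray) {
    const currentCount = (counts.get(current) || 0) + 1
    counts.set(current, currentCount)
    // Keep the best candidate seen so far
    if (currentCount > count) {
      count = currentCount
      word = current
    }
  }
  return `\n Count of most frequent word "${word}" is ${count}`
}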

score -1 · Answer 7 · answered Sep 12 '18 at 20:31

I'd go with Sampson's match-reduce method for slightly better efficiency. Here's a modified version of it that is more production-ready. It's not perfect, but it should cover the vast majority of scenarios (i.e., "good enough").

function calcWordFreq(s) {
  // Normalize
  s = s.toLowerCase();
  // Strip quotes and brackets
  s = s.replace(/["“”(\[{}\])]|\B['‘]([^'’]+)['’]/g, '$1');
  // Strip dashes and ellipses
  s = s.replace(/[‒–—―…]|--|\.\.\./g, ' ');
  // Strip punctuation marks
  s = s.replace(/[!?;:.,]\B/g, '');
  return s.match(/\S+/g).reduce(function(oFreq, sWord) {
    if (oFreq.hasOwnProperty(sWord)) ++oFreq[sWord];
    else oFreq[sWord] = 1;
    return oFreq;
  }, {});
}

calcWordFreq('A ‘bad’, “BAD” wolf-man...a good ol\' spook -- I\'m frightened!') returns

{
  "a": 2
  "bad": 2
  "frightened": 1
  "good": 1
  "i'm": 1
  "ol'": 1
  "spook": 1
  "wolf-man": 1
}
