20

I have a large block of text, and I would like to find the most common words used in it (excluding a few, like "the", "a", "and", etc.).

How would I go about searching this block of text for its most commonly used words?

A-Sharabiani
j.s

7 Answers

30

You should split the string into words, then loop through the words and increment a counter for each one:

var wordCounts = { };
var words = str.split(/\b/);

for(var i = 0; i < words.length; i++)
    wordCounts["_" + words[i]] = (wordCounts["_" + words[i]] || 0) + 1;

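The "_" + prefix allows it to process words like constructor, which already exist as inherited properties of the object.

You may want to write words[i].toLowerCase() to count case-insensitively. Note that splitting on /\b/ also returns the whitespace and punctuation between words as separate entries, so skip those when you read out the counts.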
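To read the counts back out and skip common words, something along these lines might work. This is only a sketch: the function name topWords, its parameters, and the stop-word list are illustrative assumptions, not part of the answer above. It keeps the same "_" prefix trick, ignores the non-word pieces that split(/\b/) produces, and sorts the results by count.

```javascript
// Hypothetical helper: topWords, stopWords and limit are illustrative names.
function topWords(str, stopWords, limit) {
    var wordCounts = {};
    var words = str.toLowerCase().split(/\b/);

    for (var i = 0; i < words.length; i++) {
        // split(/\b/) also yields spaces and punctuation; keep word tokens only
        if (!/^\w+$/.test(words[i])) continue;
        // skip the words the caller wants excluded
        if (stopWords.includes(words[i])) continue;
        wordCounts["_" + words[i]] = (wordCounts["_" + words[i]] || 0) + 1;
    }

    // Convert to [word, count] pairs, drop the "_" prefix, sort by count descending
    return Object.keys(wordCounts)
        .map(function (key) { return [key.slice(1), wordCounts[key]]; })
        .sort(function (a, b) { return b[1] - a[1]; })
        .slice(0, limit);
}

console.log(topWords("the cat saw the cat and a dog", ["the", "and", "a"], 2));
// [ [ 'cat', 2 ], [ 'saw', 1 ] ]
```

Sorting every entry is simple but does more work than strictly necessary when only a handful of words are wanted; for typical text sizes that is unlikely to matter.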

SLaks
  • Just out of curiosity -- did you have this snippet laying around somewhere, or did you come up with the solution just for this answer? Either way, it's awesome. :) – Daniel Szabo Jul 03 '11 at 20:33
  • @ajax: I created it on the spot. Thanks! – SLaks Jul 03 '11 at 20:35
  • Hey thanks a lot, I was just wondering, could you explain the /\b/ argument? That's a regular expression yes? – j.s Jul 03 '11 at 20:49
  • Yes. It matches a word boundary - the break between two words or a word and a non-word. – SLaks Jul 03 '11 at 20:57
  • i'm getting _Never: NaN not sure why i'm getting NaN ? – mcgrailm Nov 29 '11 at 04:19
  • @Aaron: The problem was that I never initialized the counts. Fixed. – SLaks Jan 31 '12 at 01:39
  • Old topic I know, but, I'm relatively amateur to JavaScript and I was wondering how can I access the counts? Like what do I do to find out what the most common word was? I've printed the log of the wordCounts to the console and see its an object, but how can I then determine the most spoken word? – David C Apr 28 '13 at 22:02
  • @DavidC799: Use a `for in` loop. – SLaks Apr 29 '13 at 13:22
4

I started with Gustavo Maloste's suggestion and added filtering for sticky words.

let str = 'Delhi is a crowded city. There are very few rich people who travel by their own vehicles. The majority of the people cannot afford to hire a taxi or a three-wheeler. They have to depend on D.T.C. buses, which are the cheapest mode of conveyance. D.T.C. buses are like blood capillaries of our body spreading all over in Delhi. One day I had to go to railway station to receive my uncle. I had to reach there by 9.30 a.m. knowing the irregularity of D.T.C. bus service; I left my home at 7.30 a.m. and reached the bus stop. There was a long queue. Everybody was waiting for the bus but the buses were passing one after another without stopping. I kept waiting for about an hour. I was feeling very restless and I was afraid that I might not be able to reach the station in time. It was 8.45. Luckily a bus stopped just in front of me. It was overcrowded but somehow I managed to get into the bus. Some passengers were hanging on the footboard, so there was no question of getting a seat. It was very uncomfortable. We were feeling suffocated. All of a sudden, an old man declared that his pocket had been picked. He accused the man standing beside him. The young man took a knife out of his pocket and waved it in the air. No body dared to catch him. I thanked God when the bus stopped at the railway station. I reached there just in time.';

let occur = nthMostCommon(str, 10);

console.log(occur);

function nthMostCommon(str, amount) {

    // Words to ignore when counting (duplicates removed)
    const stickyWords = [
        "the", "there", "by", "at", "and", "so", "if", "than", "but", "about",
        "in", "on", "was", "for", "that", "said", "a", "or", "of", "to",
        "will", "be", "what", "get", "go", "think", "just", "every", "are", "it",
        "were", "had", "i", "very",
    ];
    str = str.toLowerCase();
    var splitUp = str.split(/\s/);
    const wordsArray = splitUp.filter(function (x) {
        return !stickyWords.includes(x);
    });
    var wordOccurrences = {};
    for (var i = 0; i < wordsArray.length; i++) {
        wordOccurrences['_'+wordsArray[i]] = ( wordOccurrences['_'+wordsArray[i]] || 0 ) + 1;
    }
    var result = Object.keys(wordOccurrences).reduce(function(acc, currentKey) {
        /* you may want to include a binary search here */
        for (var i = 0; i < amount; i++) {
            if (!acc[i]) {
                acc[i] = { word: currentKey.slice(1, currentKey.length), occurences: wordOccurrences[currentKey] };
                break;
            } else if (acc[i].occurences < wordOccurrences[currentKey]) {
                acc.splice(i, 0, { word: currentKey.slice(1, currentKey.length), occurences: wordOccurrences[currentKey] });
                if (acc.length > amount)
                    acc.pop();
                break;
            }
        }
        return acc;
    }, []);
 
    return result;
}
marc_s
Daniel Lefebvre
2

Here is my approach:

  • First, extract the words from the string using a regular expression.
  • Use an object as a map from each word to its number of occurrences. (You could also use the Map data structure.)
  • Find the most repeated word in that object.

let str = 'How do you do?';
console.log(findMostRepeatedWord(str)); // Result: "do"

function findMostRepeatedWord(str) {
  // match() returns null when there are no words, so fall back to an empty array
  let words = str.match(/\w+/g) || [];
  console.log(words); // [ 'How', 'do', 'you', 'do' ]

  let occurrences = {};

  for (let word of words) {
    if (occurrences[word]) {
      occurrences[word]++;
    } else {
      occurrences[word] = 1;
    }
  }

  console.log(occurrences); // { How: 1, do: 2, you: 1 }

  let max = 0;
  let mostRepeatedWord = '';

  for (let word of words) {
    if (occurrences[word] > max) {
      max = occurrences[word];
      mostRepeatedWord = word;
    }
  }

  return mostRepeatedWord;
}
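The Map variant mentioned in the list above could look roughly like this. It is a sketch under assumptions: the name mostRepeatedWordWithMap is made up for illustration. One possible benefit of a Map is that words such as constructor, which collide with inherited properties on a plain object, are counted like any other word.

```javascript
// Sketch only: mostRepeatedWordWithMap is a hypothetical name, not from the answer.
function mostRepeatedWordWithMap(str) {
  // match() returns null when nothing matches, so fall back to an empty array
  const words = str.match(/\w+/g) || [];
  const counts = new Map();

  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }

  let max = 0;
  let mostRepeatedWord = '';
  // A Map iterates in insertion order, so the earliest word wins a tie
  for (const [word, count] of counts) {
    if (count > max) {
      max = count;
      mostRepeatedWord = word;
    }
  }

  return mostRepeatedWord;
}

console.log(mostRepeatedWordWithMap('How do you do?')); // "do"
```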
1

I'm coming from the future, where this question was asked again; I had already started on a solution when that one was marked as answered. Anyway, this complements SLaks's answer.

function nthMostCommon(string, amount) {
    var wordsArray = string.split(/\s/);
    var wordOccurrences = {}
    for (var i = 0; i < wordsArray.length; i++) {
        wordOccurrences['_'+wordsArray[i]] = ( wordOccurrences['_'+wordsArray[i]] || 0 ) + 1;
    }
    var result = Object.keys(wordOccurrences).reduce(function(acc, currentKey) {
        /* you may want to include a binary search here */
        for (var i = 0; i < amount; i++) {
            if (!acc[i]) {
                acc[i] = { word: currentKey.slice(1, currentKey.length), occurences: wordOccurrences[currentKey] };
                break;
            } else if (acc[i].occurences < wordOccurrences[currentKey]) {
                acc.splice(i, 0, { word: currentKey.slice(1, currentKey.length), occurences: wordOccurrences[currentKey] });
                if (acc.length > amount)
                    acc.pop();
                break;
            }
        }
        return acc;
    }, []);
    return result;
}
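For comparison, the same top-N list can probably be produced with less bookkeeping by counting into a Map and sorting the entries. This is a hedged alternative, not the approach used above: nthMostCommonSorted is an illustrative name, and it sorts every distinct word rather than maintaining a bounded list. The result objects keep the same property names, including the occurences spelling, so the output shape matches.

```javascript
// Hypothetical alternative; nthMostCommonSorted is an illustrative name.
function nthMostCommonSorted(string, amount) {
    const counts = new Map();

    for (const word of string.split(/\s+/)) {
        if (word === '') continue; // leading/trailing whitespace yields empty tokens
        counts.set(word, (counts.get(word) || 0) + 1);
    }

    // Property name "occurences" is spelled as in the function above
    return Array.from(counts, ([word, occurences]) => ({ word, occurences }))
        .sort((a, b) => b.occurences - a.occurences)
        .slice(0, amount);
}

console.log(nthMostCommonSorted('a b b c c c', 2));
// [ { word: 'c', occurences: 3 }, { word: 'b', occurences: 2 } ]
```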
marc_s
1

With this function you can get a list of the most frequent words. It returns an array containing every word tied for the highest count.

const findMostFrequentWords = (string) => {
    // Split on runs of whitespace and drop empty tokens
    const wordsArray = string.split(/\s+/).filter(Boolean);
    // A prototype-less object avoids clashes with inherited keys such as "constructor"
    const wordOccurrences = Object.create(null);
    for (const word of wordsArray) {
        wordOccurrences[word] = (wordOccurrences[word] || 0) + 1;
    }
    const words = Object.keys(wordOccurrences);
    if (words.length === 0) return [];
    // Find one word with the highest count
    const maximum = words.reduce((accumulated, current) =>
        wordOccurrences[current] >= wordOccurrences[accumulated] ? current : accumulated);
    // Return every word that shares that count
    return words.filter((word) => wordOccurrences[word] === wordOccurrences[maximum]);
};
Mohammad
0

Lodash 1-liner:

const mostFrequentWord = _.maxBy(Object.values(_.groupBy(str.match(/\b(\w+)\b/g))), w => w.length)[0]
ricka
0

Try this function:

function fun(str){
    let words = str.split(" ")
    // keep only the first occurrence of each word
    let uniqueWords = words.filter((word, i) => words.indexOf(word) === i)
    let resultCount = 0
    let result = ''

    for (let unique of uniqueWords){
        let count = 0
        // count how many times this word appears in the full list
        for (let word of words) if (word === unique) count++;
        if (resultCount < count){
            resultCount = count
            result = unique
        }
    }
    return result
}
Hassan Naghibi