I'm writing a function to find the 10 most common words in a string. However, when I sort the resulting array, some words appear more than once, each time with the same count.

paragraph = `I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.`;
const tenMostFrequentWords = (str) => {
    const regex = /\b[a-z]+\b/gi;
    const arr = str.match(regex);
    const set = new Set();
    for (word of arr) {
        const filteredArr = arr.filter(item => item == word);
        set.add({word: word, count: filteredArr.length});
    }
    const newArr = Array.from(set);
    newArr.sort((a,b) => b.count - a.count);
    return newArr;
}
console.log(tenMostFrequentWords(paragraph));

Why is this happening?

conradQQ
  • 2
    Set determines uniqueness by object reference, so adding new objects won't automatically dedupe – pilchard Jun 29 '22 at 21:33
  • 1
    You'll want to use a 'group by' to count frequency. see: [Counting words in javascript and push it into an object](https://stackoverflow.com/questions/40102199/counting-words-in-javascript-and-push-it-into-an-object) – pilchard Jun 29 '22 at 21:36
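
A minimal sketch of the reference-equality behaviour described in the comments above. The variable names `byReference` and `byValue` are illustrative and do not come from the original post.

```javascript
// Two object literals with identical contents are still distinct references,
// so the Set keeps both.
const byReference = new Set();
byReference.add({ word: "love", count: 6 });
byReference.add({ word: "love", count: 6 });
console.log(byReference.size); // 2

// Primitive strings are compared by value, so the duplicate collapses.
const byValue = new Set();
byValue.add("love");
byValue.add("love");
console.log(byValue.size); // 1
```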

2 Answers

2

You're adding a new object to the set in every iteration of your loop. A Set compares objects by reference (object identity), not by structural equality, so each occurrence of a word adds another entry. Instead, use a Map from word to count. Also avoid counting with filter: it rescans the whole array for every word, which makes the function quadratic.

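const tenMostFrequentWords = (str) => {
    const regex = /\b[a-z]+\b/gi;
    // match() returns null when nothing matches, so fall back to an empty array
    const words = str.match(regex) ?? [];
    const counts = new Map();
    // declare the loop variable so it does not leak as an implicit global
    for (const word of words) {
        // start from 0 the first time a word is seen, then increment
        counts.set(word, (counts.get(word) ?? 0) + 1);
    }
    // turn each [word, count] entry into a {word, count} object
    const newArr = Array.from(counts, ([word, count]) => ({word, count}));
    // highest count first
    newArr.sort((a,b) => b.count - a.count);
    return newArr.slice(0, 10);
}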
const paragraph = `I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.`;
console.log(tenMostFrequentWords(paragraph));
Bergi
  • Can you elaborate on this syntax please: "(counts.get(word) ?? 0) + 1". Not sure what's going on there. I haven't seen the "??" operator before. Thank you! – conradQQ Jun 29 '22 at 22:43
  • oh, and also this: Array.from(counts, ([word, count]) => ({word, count})); – conradQQ Jun 29 '22 at 22:51
  • 1
    @ConRadQ It's the [nullish coalescing operator](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Nullish_coalescing_operator), so when `.get()` returns `undefined` (when the word doesn't exist in the map) we fall back to `0`. – Bergi Jun 29 '22 at 22:51
  • 1
    @ConRadQ What part of the `Array.from` expression is unclear? – Bergi Jun 29 '22 at 22:52
  • Everything after "counts," haha. Not sure what purpose "([word, count]) => ({word, count})" serves – conradQQ Jun 29 '22 at 22:54
  • 1
    @ConRadQ [`Array.from`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/from) takes a callback to map the values from the iterator. It's equivalent to `Array.from(counts).map(([word, count]) => ({word, count}));`, but does it in one go without a temporary array. – Bergi Jun 29 '22 at 22:56
  • Thanks, @Bergi you've been a massive help! – conradQQ Jun 29 '22 at 22:57
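
The two pieces of syntax discussed in this comment thread can be tried in isolation. This is only a small sketch: the sample words and the names `oneStep` and `twoSteps` are made up for illustration.

```javascript
const counts = new Map();

// First sighting of "love": get() returns undefined, so `?? 0` supplies 0.
counts.set("love", (counts.get("love") ?? 0) + 1);
// Second sighting: get() returns 1, so the fallback is not used.
counts.set("love", (counts.get("love") ?? 0) + 1);
counts.set("you", (counts.get("you") ?? 0) + 1);

// Iterating a Map yields [key, value] pairs. The callback destructures each
// pair as [word, count] and builds a {word, count} object from it.
const oneStep = Array.from(counts, ([word, count]) => ({ word, count }));
// Same result, but with an intermediate array of pairs created first.
const twoSteps = Array.from(counts).map(([word, count]) => ({ word, count }));

console.log(oneStep); // two objects: love with count 2, then you with count 1
```

A Map iterates in insertion order, which is why "love" comes before "you" here.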
1

You iterate through arr, and the word "love" appears in arr 6 times, so an object for it is added to the set 6 times. Keep a second collection of the words you have already handled, and on each iteration check whether the current word is already in it before adding.
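
One possible reading of that suggestion, as a rough sketch: the helper name `countWithSeenSet` and the `seen` set are hypothetical, and the sample sentence is invented. It keeps the filter-based counting from the question, so it is still quadratic in the number of words.

```javascript
// Hypothetical helper: skips words that were already handled.
const countWithSeenSet = (str) => {
    const words = str.match(/\b[a-z]+\b/gi) ?? [];
    const seen = new Set(); // holds plain strings, which a Set compares by value
    const result = [];
    for (const word of words) {
        if (seen.has(word)) continue; // already counted this word
        seen.add(word);
        result.push({ word, count: words.filter(item => item === word).length });
    }
    // highest count first, then keep at most ten entries
    return result.sort((a, b) => b.count - a.count).slice(0, 10);
};

const sample = countWithSeenSet("I love teaching. I love Python.");
console.log(sample.length); // 4 distinct words
```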

SIAJSAJ IJSAIJSAJA