
I have several arrays of 100 strings, each containing comments that I retrieved using fetch (called comments), and another array containing 10,000 keywords (called keywords).

For each comment in the array, I want to check which of the keywords it contains. (I need to know all of the keywords it contains, but not how many times each one occurs.)

What is the fastest way to do this?

I have tried forEach loops within each other:

keywords.forEach(word => {
    comments.forEach(comment => {
        if (comment.includes(word)) {
            // call a function
        }
    })
})

as well as for loops within each other:

for (let i = 0; i < keywords.length; i++) {
    for (let j = 0; j < comments.length; j++) {
        if (comments[j].includes(keywords[i])) {
            // call a function
        }
    }
}

For both of these I have tried switching the inner and outer loops.

I have also tried building a single regular expression from my keywords and matching each comment against it using matchAll and a for...of loop.

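// Join the per-keyword patterns with | (no trailing |, which would match the empty string).
// Assumes the keywords contain no regex metacharacters; matchAll needs the g flag.
const regex = new RegExp(keywords.map(word => `(?:^|\\b)${word}(?:\\b|$)`).join('|'), 'g')
comments.forEach(comment => {
    const matches = comment.matchAll(regex)
    for (const match of matches) {
        // call a function
    }
})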

These all worked on my initial test of ~10 keywords but obviously it is going to take a lot longer with 10,000 keywords. What is the most efficient way of doing this?

This is all quite new to me so there is a chance I am missing something obvious!

Thank you

nathanael
  • Hi, welcome to SO. You're right: nested loops are bad for time performance when the inputs are large (i.e. O(N²); see Big O notation). You can avoid the nested loop and cut the time by converting the keywords into an object. See this answer, I think it will help you - https://stackoverflow.com/a/48411784/7015414 – Fatah Apr 12 '21 at 16:35
  • Thank you. Converting to object and using an index is a good idea. I will try this one too and see how it compares. – nathanael Apr 13 '21 at 09:34

1 Answer

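Try using a Set. I've made a Set out of the keywords array, so for every word in each comment I can look up whether the Set contains that word. Note that the code below splits each comment on single spaces, so it only matches whole words, and a word with punctuation attached (such as "java,") will not match.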

const keywords = ["java", "golang", "python", "ruby"];

const comments = ["I love java", "Golang is by google", "Python bit me hard"];

const hash = new Set(keywords.map((k) => k.toLowerCase()));

const test = (w) => console.log(w);

comments.forEach((c) =>
  c.split(" ").forEach((w) => hash.has(w.toLowerCase()) && test(w))
);
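
Since the question asks for all of the keywords each comment contains, the Set lookup above could be extended roughly as follows. This is only a sketch under some assumptions: `findKeywords` is a hypothetical helper name, every keyword is a single word made of letters, digits, or underscores, and matching is case-insensitive. It splits on any run of other characters so that punctuation next to a word does not prevent a match, and it returns one array of distinct keywords per comment.

```javascript
// Hypothetical helper (not from the original answer): for each comment,
// return the distinct keywords it contains, in order of first appearance.
function findKeywords(comments, keywords) {
  // Build the lookup once; checking membership in a Set is O(1) on average.
  const keywordSet = new Set(keywords.map((k) => k.toLowerCase()));

  return comments.map((comment) => {
    const found = new Set();
    // Split on runs of characters that are not letters, digits, or underscores,
    // so "java," and "python!" become "java" and "python".
    for (const word of comment.toLowerCase().split(/[^\p{L}\p{N}_]+/u)) {
      if (keywordSet.has(word)) found.add(word);
    }
    return [...found];
  });
}

const exampleKeywords = ["java", "golang", "python", "ruby"];
const exampleComments = ["I love java, and Java loves python!", "Golang is by google"];
console.log(findKeywords(exampleComments, exampleKeywords));
```

The work is proportional to the total number of words in the comments rather than comments × keywords, because each word costs a single Set lookup. Keywords that span several words or contain punctuation (for example "c++") would need a different tokenization, so treat that as a known limitation of this sketch.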
Som Shekhar Mukherjee