
I have a text file, say testFile.txt, and an array of strings to search for in the file, for example ['year', 'weather', 'USD 34235.00', 'sportsman', 'ಕನ್ನಡ']. I can break the file into tokens with the Node.js natural library and perhaps build a large array (roughly 100-200x the number of entries in the search array) out of it, then sort both arrays and start the search. Or should I just use lodash directly?

The result is Found when at least one string from the search array occurs in the text file; otherwise it should be considered NotFound.
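
For concreteness, roughly what I have in mind so far (a sketch only, assuming the natural package's WordTokenizer; how well it handles multi-word terms or non-Latin text is part of what I'm unsure about):

const fs = require('fs')
const natural = require('natural')

// Tokenize the whole file into a (potentially large) array of words
const text = fs.readFileSync('testFile.txt', 'utf8')
const tokens = new natural.WordTokenizer().tokenize(text)

const terms = ['year', 'weather', 'USD 34235.00', 'sportsman', 'ಕನ್ನಡ']

// Naive version: linear scan of the token array for each term.
// Note: a multi-word term like 'USD 34235.00' will never match a single token.
const result = terms.some(term => tokens.includes(term)) ? 'Found' : 'NotFound'
console.log(result)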

What are some of the options to implement such a search?

cogitoergosum

1 Answer


I would suggest using a Set for the large array of tokens, then iterating through the search terms array and checking whether the token set has each term. If the terms array is also large, you could consider using a Set for that as well (MDN docs for Set).

You can see a performance comparison between an array and a set with a large number of elements in this comment.
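
As a rough, self-contained illustration (not the linked benchmark itself; the sizes below are arbitrary), you can time the two lookups yourself:

const n = 1000000
const bigArray = Array.from({ length: n }, (_, i) => 'token' + i)
const bigSet = new Set(bigArray)
const probe = 'token' + (n - 1) // worst case for the linear array scan

console.time('Array#includes x1000')
for (let i = 0; i < 1000; i++) bigArray.includes(probe)
console.timeEnd('Array#includes x1000')

console.time('Set#has x1000')
for (let i = 0; i < 1000; i++) bigSet.has(probe)
console.timeEnd('Set#has x1000')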

Below is a demo snippet:

const tokens1 = ['ಕನ್ನಡ', 'asdasd', 'zxczxc', 'sadasd', 'wqeqweqwe', 'xzczxc']
const tokens2 = ['xzczcxz', 'asdqwdaxcxzc', 'asdxzcxzc', 'wqeqwe', 'zxczcxzxcasd']
const terms = ['year', 'weather', 'USD 34235.00', 'sportsman', 'ಕನ್ನಡ']

const set1 = new Set(tokens1)
const set2 = new Set(tokens2)

// Return 'Found' as soon as any search term exists in the token set
const find = (tokensSet, termsArray) => {
  for (const term of termsArray) {
    if (tokensSet.has(term)) { // Set#has is an average O(1) lookup
      return 'Found'
    }
  }
  return 'Not Found'
}

console.log(find(set1, terms))
console.log(find(set2, terms))
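
To tie this back to the question, the token set could come from the file itself, for example (a sketch assuming the natural package and the testFile.txt name from the question, reusing find() and terms from above):

const fs = require('fs')
const natural = require('natural')

// Tokenize the file once, then do constant-time lookups against the Set
const fileTokens = new natural.WordTokenizer()
  .tokenize(fs.readFileSync('testFile.txt', 'utf8'))

console.log(find(new Set(fileTokens), terms))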
hgb123