How do I optimize this synchronous "fuzzy search" function and turn it into an async function?

Question

Problem

I'm trying to implement some sort of "fuzzy search" in my Node.js based project.

Fuzzy search is a search that returns results even if the string didn't match exactly.

I found this code in another stackoverflow thread. Code is below.

It's quite good, but the problem is - it's synchronous It slows down the whole program when it searches through a large array.

Question

ES6 methods are welcomed. Needs to work only in the latest Chrome, so any JS methods will work.

Are there any new JS methods that would optimize this function?
Am I doing something wrong there that makes it even slower?
Can I turn this function into an async function that returns a promise? Will it stop freezing the app during the search then?
Are there any better "fuzzy search" implementations you know of? (I found a module called fuzzysort, can't say if it's that much better though, it won't return "folder test" if you type "test folder" (wrong order) so it's not that good)

Code

Calling search function

searchArray is an array of paths it searches through, e.g.: ["C:\\test", "C:\\file.txt"...] (0.5 - 5 million paths)

searchQuery is a string without spaces, e.g.: filetxt

search () {
  const fuzzySearch = this.fuzzySearch(this.searchQuery.toLowerCase(), this.searchArray) 
  let result = fuzzySearch.filter(element => element.relevance >= 0.3)
  // sort by relevance
  var sortedResults = result.sort((a, b) => parseFloat(b.relevance) - parseFloat(a.relevance)).map(item => item.name);
      
  this.searchResults = sortedResults
},

The fuzzy search function

fuzzySearch (searchQuery, searchArray) {
  const get_bigrams = function(string) {
  const s = string.toLowerCase();
  const v = new Array(s.length - 1);
  for (let i = 0, end = v.length; i <= end; i++) {
      v[i] = s.slice(i, i + 2);
  }
  return v;
  };

  const string_similarity = function(str1, str2) {
      if ((str1.length > 0) && (str2.length > 0)) {
          const pairs1 = get_bigrams(str1);
          const pairs2 = get_bigrams(str2);
          const union = pairs1.length + pairs2.length;
          let hit_count = 0;
          for (let x of Array.from(pairs1)) {
              for (let y of Array.from(pairs2)) {
                  if (x === y) {
                      hit_count++;
                  }
              }
          }
          if (hit_count > 0) {
              return ((2.0 * hit_count) / union);
          }
      }
      return 0.0;
  };

  let results = [];
  for (let name of searchArray) {
    // I added .match to use only the base filename (name+ext) not the whole path, and removed all characters
    let filteredPath = name.match(/[^\\\/]+$/)[0].replace(/[^A-Za-z0-9.]+/g, '')
              
    const relevance = string_similarity(searchQuery, filteredPath);
    const obj = {name, relevance};
    results.push(obj);
  }
  return results
},

@karthick sorry for the confusion, it's just an array of paths that it searches through (it's very big): `["C:\\test", "C:\\file.txt"...]` — Un1, Jun 01 '18 at 22:18
_"Node.js based project" "Needs to work only in the latest Chrome"_ Is `fuzzySearch` running in Node or Chrome? — AuxTaco, Jun 01 '18 at 23:27
@AuxTaco well, both, it's an electron project. I just mentioned Node js because it also provides lots of useful modules that can be used for implementing a search. — Un1, Jun 02 '18 at 09:43
Are those paths the actual strings you’re filtering or are you reading and filtering their content ? Also how often the array of paths update ? — user3210641, Jun 02 '18 at 15:53
@user3210641 It's reading just the paths, its only task is to return the paths that "kind of" or "exactly" match the query (hence "fuzzy search"), so if you give it query `"test"` it should only return `"C:\test"` from this array `["C:\test", C:\bob]`. It always gets a ready to use array of paths (0.5 - 5 million), so it only needs to find the ones that match — Un1, Jun 02 '18 at 16:26
A rather obvious optimisation is not to call `get_bigrams` on the `searchQuery` for every comparison again. A not so obvious but very important one is not to use nested loops to count the similiarities. Use an efficient lookup structure, e.g. a `Set`, instead. Further optimisations would depend on the `searchArray` - is it the same array used in multiple searches, are there many items that share large parts? — Bergi, Jun 02 '18 at 21:43
"*Will turning this function into an async function stop freezing the app during the search?*" - no. For that, you'd need to actually defer work, e.g. make pauses after every 1000th iteration. (`async`/`await` can simplify implementing this, though). — Bergi, Jun 02 '18 at 21:48
@Bergi thank you for the tips, mate. Yes it's the same array, though it slices off everything but the base filename before trying to match anything, and if the base filename matches the query it returns its full path. — Un1, Jun 02 '18 at 21:56

How do I optimize this synchronous "fuzzy search" function and turn it into an async function?

Problem

Question

Code

0 Answers0