2

I have an object with a key as some name to be searched and value its score, which looks like:

{

   'a': 344,
   'apple': 345354,
   'orange': 320,
   'mango': 39999,
   .
   .
   .
}

The map has around half a million keys. I need to do a fuzzy search to create a test auto suggest like functionality (which supports typos) for some text like ornge and it should match orange and also return all words where matches occur. Also, matches could occur anywhere in the string, not just at the beginning. How could I do this using regex? I was trying the following:

(?=.*\ornge\b).*

but it does not work. How could I do this?

Using regex might not be the best solution. Could you suggest an alternative method?

Dan Hick
  • 29
  • 3
  • Maybe `o.*r.*n.*g.*e`? – revo Sep 13 '18 at 14:23
  • 3
    IMHO you should not try to do this in regex. – apple apple Sep 13 '18 at 14:24
  • @appleapple Okay. Could you suggest a method? – Dan Hick Sep 13 '18 at 14:24
  • you should find a library to do this, or at least you need to define what you want to match more clearly (so one can actually implement it). I'm not pretty familiar with this, so this is all I can suggest. – apple apple Sep 13 '18 at 14:27
  • There are a variety of fuzzy search algorithms. You could introduce some sort of wildcard in your search (e.g. `*`) or you could go for valid permutations of the input. In the latter case, you would need to implement an approximate string matching algorithm. There are some suggestions [here](https://stackoverflow.com/questions/32337135/fuzzy-search-algorithm-approximate-string-matching-algorithm). – VLAZ Sep 13 '18 at 14:40

2 Answers2

1

Using regex isn't going to be all that simple. I agree with @apple apple that it would make sense to look for a library. I was intrigued about how long it would take to stub out a basic one by hand so I whipped this up, feel free to use/improve on it to optimize for working with that larger list that you have.

The basic gist is that you set a threshold for similarity in length between the input and expected output, and a minimum letter match score that you want to aim for.

For each word in your set, you run a comparison function between that word and the input. In the comparison function, you test each letter of the input against the word to see if it exists - if so, remove that letter from the word and move to the next one (to avoid inflated scores from multiple matches to the same letter: i.e. the string aaaaaa would score a 6 against apple if the a isn't removed after the first match)

If you wanted to enable multiple suggestions, you could replace the while(++i < listlen) loop with wordlist.filter, and return all words that fall above a certain score threshold.

const wordlist = ["apple", "orange", "mango", "fruit", "banana", "kiwi", "grapefruit"],
  listlen = wordlist.length,
  wordLengthDifferential = 1,
  scoreDifferential = 3

function compare(str, word){
  let score = 0
  str = str.split('')
  word = word.split('')
  while(str.length){
    let idx = word.indexOf(str.pop())
    if(idx > -1){
      ++score
      word.splice(idx, 1)
    }
  }
  return score
}

function getSuggestion(str){
  let highScore = 0, suggestion = null, i = -1
  
  while(++i < listlen){
    let word = wordlist[i]
    if(Math.abs(word.length - str.length) <= wordLengthDifferential) {
      let score = compare(str, word)
      console.log(str, word, score)
      if(score > highScore && score >= scoreDifferential){
       suggestion = word
        highScore = score
      } 
    }
  }  
  
  return suggestion || "no relevant matches"
}

document.querySelector('button').onclick = e => console.log(getSuggestion(document.querySelector('input').value))
document.addEventListener('keydown', e => {
 if(e.keyCode == 13) console.log(getSuggestion(document.querySelector('input').value))
})
<input type="text" /><button>Go</button><em>(or press enter)</em>
jmcgriz
  • 2,819
  • 1
  • 9
  • 11
0

For text input inputVal and your object obj.:

let regexString = '';
inputVal.split('').forEach(char => regexString += char + '.*');
const rgx = new RegExp(regexString, 'i');
const result = Object.keys(obj).filter(key => key.match(rgx));
inorganik
  • 24,255
  • 17
  • 90
  • 114