Basically, I want to implement a fuzzy search that disregards the input language (keyboard layout).

For example, let's say that there's an entry for "Hello World".
I want a search for any of the following to match it:

  • "hello"
  • "henlp"
  • "руддщ" (these are the Russian characters if you try to type "hello" but forget to switch to English)
  • "рутдз" (same as above but with "henlp" instead of "hello")
  • "יקמךם" (same as above but in Hebrew)

etc.

The thing that makes the most sense to me is to ignore the actual text and look at the relevant keyCodes instead, which should work regardless of the active language.

I thought about saving, for each entry, an array that represents all of its key codes, and then implementing the fuzziness on those keyCodes instead of on chars. But that feels like I'm doing something wrong, or missing something that already exists.
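To make the idea concrete, here is a minimal sketch of it in Node.js. Stored text carries no key codes, so instead of keyCodes it uses lookup tables that map each character back to the US QWERTY character on the same physical key, then runs a plain Levenshtein distance on the normalized strings. The names `LAYOUTS`, `normalize`, `levenshtein` and `fuzzyMatches` are hypothetical, and the tables assume the standard Russian (ЙЦУКЕН) and standard Hebrew layouts; other layouts would need their own tables.

```javascript
// Hypothetical layout tables (an assumption, not from any library): each pair
// lists US QWERTY characters and the characters another layout produces on the
// same physical keys, in the same order.
const LAYOUTS = [
  // Standard Russian (ЙЦУКЕН) layout, 32 keys.
  ["qwertyuiop[]asdfghjkl;'zxcvbnm,.", "йцукенгшщзхъфывапролджэячсмитьбю"],
  // Standard Hebrew layout, letter keys only, 27 keys.
  ["ertyuiopasdfghjkl;zxcvbnm,.", "קראטוןםפשדגכעיחלךףזסבהנמצתץ"],
];

// Build one lookup: foreign character -> QWERTY character on the same key.
const TO_QWERTY = new Map();
for (const [latin, other] of LAYOUTS) {
  [...other].forEach((ch, i) => TO_QWERTY.set(ch, latin[i]));
}

// Rewrite text as if it had been typed on a US QWERTY layout.
// Characters without a table entry are kept unchanged.
function normalize(text) {
  return [...text.toLowerCase()].map((ch) => TO_QWERTY.get(ch) || ch).join("");
}

// Classic Levenshtein edit distance, keeping only two rows at a time.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// True if the normalized query is within maxDistance edits of any word
// of the normalized entry. The threshold of 2 is an arbitrary choice.
function fuzzyMatches(query, entry, maxDistance = 2) {
  const q = normalize(query);
  return normalize(entry)
    .split(/\s+/)
    .some((word) => levenshtein(q, word) <= maxDistance);
}

console.log(normalize("руддщ")); // hello
console.log(fuzzyMatches("рутдз", "Hello World")); // true
```

Normalizing first keeps the fuzzy part independent of the layouts, so the hand-written `levenshtein` could probably be swapped for an existing fuzzy search library. The tables map punctuation such as "," and ";" as well, which is only safe when the query is a single word.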

From what I've gathered, there's no implementation of fuzzy search that takes this into account.
Is there maybe an algorithm (other than fuzzy search) that already handles this and that I'm missing?

I'm currently trying to implement this in Node.js, but I'm open to other languages and frameworks.

Wassap124
  • *"руддщ" (these are the Russian characters if you try to type "hello" but forget to switch to English)* depends on the keyboard layout set. You seem to be referring to [this one](https://www.pngfind.com/pngs/m/326-3264221_russian-keyboard-layout-norwegian-keyboard-layout-windows-hd.png) but with [this one](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQ2AYMx94KfCzFyyWkj0fDRPO1L7m2EsWdKT3yRHCArdRUjLS_JPg&s) you'd get "челло". Keyboards can also have different physical layouts, which can affect things. – VLAZ Apr 16 '20 at 14:09
  • Seems that [azerty keycodes](https://stackoverflow.com/questions/45624118/keycode-detection-on-azerty-vs-qwerty) are different to the qwerty ones, so you should be careful about that, as well. – VLAZ Apr 16 '20 at 14:11
  • Also, for more fun, a German keyboard has the `z` and `y` keys swapped. So somebody using a German keyboard with the English layout may mistakenly type `hez` instead of `hey`, and keycode detection will wrongly confirm it as `hez` (the keycodes are also swapped when a different layout is chosen in the OS). – VLAZ Apr 16 '20 at 14:17
  • Well then, how do you explain how Google goes about its autocompletion? It can understand "mistaken" language pretty precisely. – Wassap124 Apr 16 '20 at 14:17
  • I'm not saying it *can't be done*, I'm saying it's *hard*. If anything, Google doing something is an indication that *the task is very, very hard*. Google has terabytes of real usage data they can verify their implementation against and also use A LOT of machine learning and data mining/analysis to derive their algorithms and make predictions. So, what Google does is most likely an intelligent guess driven by, and backed up by, an enormous amount of user data to ensure the accuracy. – VLAZ Apr 16 '20 at 14:20

0 Answers