I want to understand how Google handles a missing space between two words. For example, take two words, word1 and word2. If I type 'word1word2' into the search box, it either asks "Did you mean 'word1 word2'?" or simply searches for 'word1 word2'. Is there any information on which data structures and algorithms they use? An answer to "How to split text without spaces into list of words?" suggests using a trie data structure.

Community
John
  • It would be best to ask a Google developer. – Jul 13 '12 at 13:03
  • This is not about data structures, but mainly about statistics and probability estimates. – usamec Jul 13 '12 at 13:07
  • Possible duplicate of [How google split words bunched together (without spaces)?](https://stackoverflow.com/questions/53720647/how-google-split-words-bunched-together-without-spaces) – Fifi Dec 11 '18 at 21:32

2 Answers

In the candidate generation step of a spell corrector, you allow the omission of a space as a possibility, just as you allow the omission of other characters. Perhaps look at the spelling correction lecture here: http://nlp-class.org/ [sorry, self-promotion] or Peter Norvig's intro: http://norvig.com/spell-correct.html
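
To make the idea concrete, here is a minimal sketch of "a missing space is just another edit", loosely in the style of Norvig's corrector. It is not Google's actual method. The word counts are made up for illustration, and scoring a split by the product of unigram probabilities is a simplifying assumption (the names `WORD_COUNTS`, `split_candidates` and `best_split` are hypothetical).

```python
from collections import Counter

# Toy unigram counts (invented for this example); a real corrector would
# estimate them from a large corpus or from query logs.
WORD_COUNTS = Counter({"word1": 50, "word2": 40, "new": 900, "york": 300, "newt": 5})
TOTAL = sum(WORD_COUNTS.values())


def prob(word):
    """Relative frequency of a word under the toy unigram model."""
    return WORD_COUNTS[word] / TOTAL


def split_candidates(query):
    """Every way to insert one space so that both halves are known words."""
    return [
        (query[:i], query[i:])
        for i in range(1, len(query))
        if query[:i] in WORD_COUNTS and query[i:] in WORD_COUNTS
    ]


def best_split(query):
    """Return the query unchanged if it is a known word or cannot be split,
    otherwise the most probable two-word split."""
    if query in WORD_COUNTS:
        return query
    candidates = split_candidates(query)
    if not candidates:
        return query
    left, right = max(candidates, key=lambda pair: prob(pair[0]) * prob(pair[1]))
    return f"{left} {right}"


print(best_split("word1word2"))
print(best_split("newyork"))
```

In a fuller corrector these split candidates would compete with the usual insert, delete, replace and transpose candidates, and the probability estimate decides which one is suggested.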

Christopher Manning

I assume you must have a script (using Ajax, for example: http://net.tutsplus.com/tutorials/javascript-ajax/adding-a-jquery-auto-complete-to-your-google-custom-search-engine/).

Basically, you check the words against a dictionary. The space must not be a requirement for recognizing a word, only a possibility. For example, a simple (really simple) algorithm for "severalwords" would be: check the first 3 letters. Not a word? Then check the first 4, and so on.
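
The prefix check described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary is a tiny made-up set, the names `DICTIONARY` and `segment` are hypothetical, and backtracking plus memoization are additions so that a short match such as "sever" does not block the correct split. The default `min_len=3` mirrors the "first 3 letters" starting point, which means shorter words would never be matched unless it is lowered.

```python
from functools import lru_cache

# Toy dictionary (invented for this example); a real one would be far larger.
DICTIONARY = frozenset({"sever", "several", "word", "words"})


def segment(text, min_len=3):
    """Split text into dictionary words by trying ever longer prefixes.

    Returns a list of words, or None if no full segmentation exists.
    """

    @lru_cache(maxsize=None)
    def helper(start):
        if start == len(text):
            return ()  # consumed the whole string
        # Try the shortest allowed prefix first, then grow it by one letter.
        for end in range(start + min_len, len(text) + 1):
            prefix = text[start:end]
            if prefix in DICTIONARY:
                rest = helper(end)
                if rest is not None:
                    return (prefix,) + rest
        return None  # no prefix here leads to a full segmentation

    result = helper(0)
    return list(result) if result is not None else None


print(segment("severalwords"))
```

Because the shortest matching prefix is tried first, "sever" is considered before "several"; the search only moves on to "several" after finding that "alwords" cannot be segmented.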

Here are some explanations about the Google search engine: https://developers.google.com/search-appliance/documentation/60/admin_searchexp/ce_improving_search

This may help too: http://tm.durusau.net/?cat=1106

An-droid