
Is there a function that takes a string and returns true if at least one English word could auto-complete from it, and false otherwise? For example, "bl" should return true because "blue" is a word, while "blszc" should return false.

  • There's certainly no built-in function for that. But you can write your own or use a web service; maybe there's even a library. – moritzg May 31 '17 at 05:53

1 Answer

  1. You will need a built-in dictionary or corpus of words to autocomplete against. You can use NLTK with WordNet: http://www.nltk.org/howto/wordnet.html

  2. You will need a scoring scheme to rank the suggestions. It could be lexicographical, i.e. for "bl", suggestions like "blah" and "blaze" would appear before "blue". Or it could be based on how common the word is in everyday language. As a rough starting point for that, you can use the frequency distribution of words in the Brown Corpus in NLTK (example here: https://stackoverflow.com/a/38234480/533399).

  3. You will need a fast REST service, since autocomplete fires on every keystroke (or every few keystrokes if requests are throttled and the user types very fast). For this, store your data in a data store whose retrieval is optimized for prefixes, e.g. "blu" should return "blue", "blunt", etc. My suggestion would be Solr/ElasticSearch (this would also let you add features such as spelling correction or "did you mean" later).

  4. As a simple optimization, limit your autocomplete suggestions to the first 5-10 results found for any prefix. Your ranking should be good enough that the best match appears near the top of the suggestion list.
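To make points 1, 2 and 4 concrete, here is a minimal in-memory sketch, not a production setup. It assumes a tiny hypothetical vocabulary with made-up frequency counts (`WORD_COUNTS`); in practice the words and counts would come from a real corpus such as the ones mentioned above. The names `has_completion` and `suggest` are illustrative only. It keeps the words sorted so a binary search can jump straight to the first candidate for a prefix.

```python
from bisect import bisect_left

# Hypothetical toy vocabulary with made-up frequency counts.
# A real system would load words and counts from a corpus instead.
WORD_COUNTS = {"blah": 3, "blaze": 5, "blue": 40, "blunt": 7, "red": 30}
SORTED_WORDS = sorted(WORD_COUNTS)


def has_completion(prefix):
    """Return True if at least one known word starts with `prefix`."""
    # In a sorted list, the first word >= prefix is the only candidate
    # that needs checking: if it does not start with prefix, none does.
    i = bisect_left(SORTED_WORDS, prefix)
    return i < len(SORTED_WORDS) and SORTED_WORDS[i].startswith(prefix)


def suggest(prefix, limit=5):
    """Return up to `limit` words starting with `prefix`, most frequent first."""
    matches = []
    # All words sharing the prefix sit next to each other in sorted order.
    for i in range(bisect_left(SORTED_WORDS, prefix), len(SORTED_WORDS)):
        word = SORTED_WORDS[i]
        if not word.startswith(prefix):
            break
        matches.append(word)
    # Rank by frequency (highest first), then cut to the requested size.
    matches.sort(key=lambda w: -WORD_COUNTS[w])
    return matches[:limit]
```

With this toy data, `has_completion("bl")` is true and `has_completion("blszc")` is false, matching the example in the question, and `suggest("bl", limit=2)` puts "blue" first because it has the highest count.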

DhruvPathak
  • Thank you, that is actually what I started attempting. I'm not a programmer, but would the fastest way to check whether at least one word could auto-complete from the string be: autocomp_test=any(string in s for s in words.words()), which returns true if such a word exists (where words.words() is the NLTK English word list)? –  May 31 '17 at 06:21
  • The fastest way is Solr or ElasticSearch, or a custom in-memory structure based on a prefix tree (trie). – DhruvPathak May 31 '17 at 06:28
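As a rough illustration of the prefix-tree idea from the last comment, here is a minimal sketch using nested dictionaries. The class name `PrefixTree` and its methods are hypothetical, not from any library. Note that `string in s` tests whether the string occurs anywhere inside a word, whereas a prefix check corresponds to `s.startswith(string)`; the tree below answers the prefix question in time proportional to the length of the prefix rather than the size of the word list.

```python
class PrefixTree:
    """Minimal trie built from nested dicts (illustrative sketch only)."""

    _END = None  # marker key for "a word ends here"; cannot clash with a character

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        """Add `word`, creating one nested dict per character."""
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def has_prefix(self, prefix):
        """Return True if at least one stored word starts with `prefix`."""
        node = self.root
        for ch in prefix:
            if ch not in node:
                return False
            node = node[ch]
        # Nodes only exist because some word passes through them, so any
        # node reached here is non-empty; only an empty tree's root is empty.
        return bool(node)


tree = PrefixTree(["blue", "blunt", "blaze", "red"])
```

Here `tree.has_prefix("bl")` is true and `tree.has_prefix("blszc")` is false. The word list could be swapped for a full dictionary; the tree is built once and then reused for every keystroke.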