
I am making a dictionary application using the Pearson Dictionary API. I need to generate a word so that I can query the API for its definition.

PROBLEM

I know how to generate a random word but I don't know how to generate a meaningful English word.

I tried to solve this problem by requesting a JSON response and checking the results[] array in the response (results[] holds the definitions for the word). If results[].length > 0, the word is a valid English word.

But the solution above has a serious problem of its own: suppose I want to generate a 5-letter word. There are 26^5 = 11,881,376 different combinations, whereas there are far fewer meaningful 5-letter English words. As the number of letters in the word increases, the number of combinations increases too. Thus, generating a meaningful word can take a very long time.
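
To put numbers on that growth, here is a minimal sketch (Python is used only for illustration, and the function name `combinations` is made up for this example) that prints how many letter strings exist for each length:

```python
ALPHABET_SIZE = 26  # lowercase English letters a-z


def combinations(length):
    """Number of distinct strings of `length` letters over a 26-letter alphabet."""
    return ALPHABET_SIZE ** length


# Each extra letter multiplies the search space by 26.
for n in range(3, 9):
    print(n, f"{combinations(n):,}")
```

Every one of those strings would cost one API request to verify, which is why blind generation scales so badly.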

How can I check whether the generated word is a meaningful English word? Is there any feasible programmatic way of doing this?

Or is there any other way I could solve this problem?

ray an
  • You either generate random strings of letters and see if they're words (which, as you realise, is very slow) or you store a list of "known good" words and select randomly from that list. How big that list needs to be depends on what you're trying to achieve. [According to this page](https://en.oxforddictionaries.com/explore/how-many-words-are-there-in-the-english-language) the OED has 171k main entries, but [according to this page](http://www.economist.com/blogs/johnson/2013/05/vocabulary-size) an average adult knows about 30,000 so a prudent selection of 50,000 should cover most things. – TripeHound Apr 28 '17 at 13:04
  • Wouldn't that increase the size of my app? – ray an Apr 28 '17 at 13:07
  • Yes, obviously. But 50,000 x 10 letters (which is almost certainly more than the average) is only about 1/2MB, and there would be ways of compressing that. – TripeHound Apr 28 '17 at 13:09
  • Where can I find such lists??? – ray an Apr 28 '17 at 13:18
  • For curiosity, I'd just grabbed [one from here](https://github.com/dwyl/english-words) that has 350,000+ words (including variants). This just happened to be the top result from Googling "english word list". – TripeHound Apr 28 '17 at 13:23
  • I will accept your answer (if you write one) if I don't get a better solution. Thank you for your help. :) – ray an Apr 28 '17 at 13:45

3 Answers


As far as I can see, you either generate random strings of letters and check whether they're words (which, as you realise, is a very slow, hit-or-miss approach) or you store a list of "known good" words and select randomly from that list.

How big that list needs to be depends on what you're trying to achieve.

According to [this page](https://en.oxforddictionaries.com/explore/how-many-words-are-there-in-the-english-language) the OED has around 171,476 main entries. That count does not include variants like plurals (cat, cats) or standard variants (sit, sitting), and does not count separately words that belong to multiple classes (e.g. dog can be a noun [the animal] or a verb [to follow persistently]). According to [this page](http://www.economist.com/blogs/johnson/2013/05/vocabulary-size) an average adult knows between 20,000 and 35,000 words, so a prudent selection of 50,000 should cover most general-purpose uses.

The answers to this question (now closed) provide a number of sources for word lists. Examining one of them (originally provided by infochimps.org but available as a simple text list [on GitHub](https://github.com/dwyl/english-words)) shows that the average length of its 350,000+ words is just under 10 characters. On Linux (and possibly other flavours), /usr/share/dict/words may be a useful place to start.
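
The "known good list" approach might be sketched like this. This is an illustration under assumptions, not a definitive implementation: the helper names `candidate_words` and `random_word` are made up, Python is used only for brevity, and a tiny in-memory sample stands in for a real one-word-per-line list.

```python
import random


def candidate_words(lines, length=None):
    """Return the distinct alphabetic words from `lines`, optionally of one length."""
    words = set()
    for line in lines:
        word = line.strip().lower()
        # Skip entries with apostrophes, digits, hyphens, or nothing at all.
        if word.isalpha() and (length is None or len(word) == length):
            words.add(word)
    return sorted(words)


def random_word(words, rng=random):
    """Pick one word uniformly at random from a non-empty list."""
    return rng.choice(words)


# Hypothetical sample standing in for a real word list. With a real file,
# pass the open file object instead of this list, since iterating a file
# also yields one line at a time.
SAMPLE = ["apple\n", "Bread\n", "cat\n", "o'clock\n", "house\n", "apple\n"]

five_letter = candidate_words(SAMPLE, length=5)
print(five_letter)
print(random_word(five_letter))
```

Every word drawn this way is guaranteed to be in the list, so no API call is wasted on verification.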

TripeHound

There is this beautiful text file containing all English words:

https://github.com/AlexHakman/Java-challenge/blob/master/words.txt

You can then pick 5-letter words based on what's inside this text file :)

Either filter the lines by their length, or just generate a string and compare it with the text file :)
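
The "generate and compare" option could look roughly like the sketch below. It is only an illustration: the function names are made up, Python is used for brevity, and the five-word `KNOWN` set is a hypothetical stand-in for the contents of the text file loaded into a set.

```python
import random
import string


def random_letters(length, rng):
    """Build a string of `length` random lowercase letters."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def generate_until_word(known_words, length, rng, max_attempts=100_000):
    """Generate random strings until one is in `known_words`.

    Returns (word, attempts), or (None, max_attempts) if nothing matched.
    """
    for attempt in range(1, max_attempts + 1):
        candidate = random_letters(length, rng)
        if candidate in known_words:  # set lookup, no network call
            return candidate, attempt
    return None, max_attempts


# Hypothetical stand-in for the words loaded from the text file.
KNOWN = {"cat", "dog", "sun", "map", "pen"}

rng = random.Random(0)  # seeded only so the demo is repeatable
word, attempts = generate_until_word(KNOWN, 3, rng)
print(word, attempts)
```

Checking against a local set is far cheaper than an API request, but the loop still wastes many attempts per hit, so filtering the file by line length and picking directly from the result is usually the better option.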

Alex

Instead of generating words at random, which costs you time on verification, just store a dictionary of the words you require and use it as a lookup table.

A relatively complete dictionary for English is about 2 MB compressed, like the one here: http://wordlist.aspell.net/12dicts/

Even for an Android app, unless you're targeting really underpowered devices, that shouldn't be too big.

You can use SQLite to store the data. It may take up a bit more storage, but you get SQL as your query language rather than having to make up your own.

Since you also need a bit of randomness, each row can carry some sort of randomized key that you can query on.

If you really want to limit it to 5 characters, just use a subset of the dictionary. But storing the full list lets you support arbitrary lengths, even length ranges (e.g. 2 to 10 characters).
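
One possible reading of the SQLite idea is sketched below. The table layout, column names and helper functions are assumptions made for this example, not a prescribed schema, and Python's built-in `sqlite3` module is used with an in-memory database purely for illustration. Each row gets a random key at insert time; a lookup draws a fresh random pivot and takes the first row at or above it, wrapping around to the smallest key if none is found.

```python
import random
import sqlite3


def build_db(words, rng):
    """Create an in-memory table of words with a length and a random key per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE words ("
        " word TEXT PRIMARY KEY,"
        " length INTEGER NOT NULL,"
        " rand_key REAL NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_length_key ON words (length, rand_key)")
    conn.executemany(
        "INSERT INTO words (word, length, rand_key) VALUES (?, ?, ?)",
        [(w, len(w), rng.random()) for w in words],
    )
    conn.commit()
    return conn


def pick_word(conn, min_len, max_len, rng):
    """Return a pseudo-random word whose length is in [min_len, max_len], or None."""
    pivot = rng.random()
    row = conn.execute(
        "SELECT word FROM words"
        " WHERE length BETWEEN ? AND ? AND rand_key >= ?"
        " ORDER BY rand_key LIMIT 1",
        (min_len, max_len, pivot),
    ).fetchone()
    if row is None:
        # No key at or above the pivot: wrap around to the smallest key.
        row = conn.execute(
            "SELECT word FROM words WHERE length BETWEEN ? AND ?"
            " ORDER BY rand_key LIMIT 1",
            (min_len, max_len),
        ).fetchone()
    return row[0] if row else None


# Hypothetical sample data standing in for a full dictionary.
rng = random.Random(42)
conn = build_db(["cat", "house", "apple", "bread", "dictionary"], rng)
print(pick_word(conn, 5, 5, rng))
print(pick_word(conn, 2, 10, rng))
```

The selection is not perfectly uniform, because words that follow a large gap between keys are picked more often. If that matters, `ORDER BY RANDOM() LIMIT 1` is a simpler alternative at the cost of scanning every matching row.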

Archimedes Trajano