15

What is the best way to write a function that returns a random English word (preferably a noun) without keeping a list of all possible words in a file beforehand?

dreftymac
Josh Hunt
  • This isn't a sensible question. Could you provide some additional context or clue as to what you're trying to do. Generating English words without an English dictionary is a logical contradiction. Please clarify this. – S.Lott Feb 27 '09 at 11:14
  • fetching a word from any online resource designed to provide random words looks like a good idea. :-) – Paulo Guedes Feb 27 '09 at 11:50
  • @joshhunt: What constitutes "massive"? Spellcheck dictionaries for English are about 400K. See http://aspell.net/ for a good one. – S.Lott Feb 27 '09 at 15:56

8 Answers

34

Word lists need not take up all that much space.

Here's a JSON word list with 2,465 words, all nouns. It comes in at under 50K, about the size of a medium-sized JPEG image.

I'll leave choosing a random one as an exercise for the reader.
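For anyone who wants a starting point, here is a minimal Python sketch. It assumes the list is a flat JSON array of strings. `load_words`, `random_word`, and any file path you pass in are hypothetical names for illustration, not part of the linked list.

```python
import json
import random


def load_words(path):
    """Load a JSON array of strings from a file (the path is up to you)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def random_word(words):
    """Return one word chosen uniformly at random from a non-empty sequence."""
    return random.choice(words)


# Stand-in for the real list; in practice this would come from load_words().
sample = json.loads('["apple", "bridge", "candle", "river"]')
print(random_word(sample))
```

At 50K the whole list fits in memory comfortably, so loading it once at startup and calling `random_word` repeatedly is probably the simplest arrangement.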

Community
Kenan Banks
  • 3
    This really is the best option. You could easily keep the entire list in memory and you'll have complete control over the source -- no unexpected changes, no connection issues, no security concerns, and overall should be much easier to implement. – Whatsit Feb 27 '09 at 14:44
  • And you don't even need to keep it all in memory. – Kenan Banks Feb 27 '09 at 16:35
10

You can't. There is no algorithm to generate meaningful words. You can only generate words that sound like English, but they won't have any meaning.

Alex Reitbort
4

You could have the function fetch and parse an online resource such as:

http://www.zokutou.co.uk/randomword/

Gary Willoughby
3

Another theoretical approach: you could scrape Wikipedia's random article page and return the N-th word of the article.
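A rough Python sketch of the idea, using only the standard library. It assumes `https://en.wikipedia.org/wiki/Special:Random` redirects to a random article. The User-Agent string and all function names are made up for illustration. The filter keeps only ASCII alphabetic tokens of three or more letters, which drops numbers and many non-English words but is far from a real language check.

```python
import random
import re
import urllib.request
from html.parser import HTMLParser

# Assumption: this URL redirects to a random English Wikipedia article.
RANDOM_ARTICLE_URL = "https://en.wikipedia.org/wiki/Special:Random"


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of script and style tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = _TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)


def candidate_words(text):
    """Lowercased ASCII-only alphabetic tokens of 3+ letters (no digits)."""
    return [w.lower() for w in re.findall(r"\b[A-Za-z]{3,}\b", text)]


def nth_word(text, n):
    """Return the n-th usable word, wrapping around if n is too large."""
    words = candidate_words(text)
    if not words:
        raise ValueError("no usable words in text")
    return words[n % len(words)]


def random_wikipedia_word():
    """Fetch a random article and return one of its words (needs network)."""
    req = urllib.request.Request(
        RANDOM_ARTICLE_URL,
        headers={"User-Agent": "random-word-sketch/0.1"},  # hypothetical UA
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    text = html_to_text(html)
    return nth_word(text, random.randrange(1_000_000))
```

Note that the extracted text includes navigation and footer text as well as the article body, so common interface words will show up more often than they should.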

splattne
  • It's a nice idea, but you might need to filter out dates, numbers, and non-English words. – Ben Feb 27 '09 at 12:43
  • 1
    The results wouldn't be very random -- you'd tend to get the same few words a lot, and all sorts of other problems. – Whatsit Feb 27 '09 at 14:36
  • 1
    @Whatsit I guess you're right. On the other hand: what does "random English word" really mean? If you ask somebody for a random word, you'll get a similar statistical distribution. – splattne Feb 27 '09 at 14:41
2

Just use setgetgo's random word API. It's free, it's easy, and it rocks.

http://randomword.setgetgo.com/

jujibeans
1

There's a random word generator here - it's not English but it's English-ish, i.e. the words are similar enough to language that a user can read the words and store them in short-term memory.

Source code is in C# and a bit kludged, but you could use a similar approach in Python to generate lots of words without having to store a massive list.

Alternatively, you could call the web service on the demo page directly - it's hosted on GoDaddy though, so no guarantees it will work in production!
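To give a feel for what an English-ish generator might look like in Python, here is a small sketch. This is not the algorithm from the C# source mentioned above. It is a generic syllable-gluing approach, and the letter tables and the `pseudo_word` name are assumptions chosen for illustration.

```python
import random

# Small hand-picked tables; tune these to taste.
ONSETS = ["b", "br", "c", "ch", "d", "f", "fl", "g", "gr", "h", "l",
          "m", "n", "p", "pl", "r", "s", "st", "t", "tr", "w"]
VOWELS = ["a", "e", "i", "o", "u", "ai", "ea", "oo", "ou"]
CODAS = ["", "n", "r", "s", "t", "ck", "ng", "nd", "st"]


def pseudo_word(syllables=2, rng=random):
    """Build a pronounceable nonsense word from onset + vowel + coda syllables."""
    parts = []
    for _ in range(syllables):
        parts.append(rng.choice(ONSETS) + rng.choice(VOWELS) + rng.choice(CODAS))
    return "".join(parts)


print(pseudo_word())
```

The output is readable but has no meaning, and it will occasionally hit a real word by accident. Passing a seeded `random.Random` as `rng` makes the output repeatable.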

1

You can download the "words common to SOWPODS and TWL" lists from http://www.math.toronto.edu/jjchew/scrabble/lists/ . I put all the words in those files together and the list weighed in at about 642k. Not huge by any standards. The lists do contain a whole lot of obscure words though, since they are meant for tournament Scrabble use. The good thing is that the lists form a substantial subset of the English language.
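If even 642k feels like too much to hold in memory, you can pick a word in a single pass over the file. A minimal sketch, assuming the combined list is plain text with one word per line (I have not checked the exact format of the downloadable files, so adjust the parsing as needed); `random_line` is a hypothetical helper name.

```python
import io
import random


def random_line(lines, rng=random):
    """Pick one non-blank line uniformly at random in a single pass.

    This is reservoir sampling with a reservoir of size one: the k-th
    non-blank line replaces the current choice with probability 1/k.
    """
    chosen = None
    count = 0
    for line in lines:
        word = line.strip()
        if not word:
            continue
        count += 1
        if rng.randrange(count) == 0:
            chosen = word
    if chosen is None:
        raise ValueError("word list is empty")
    return chosen


# An in-memory stand-in for an open file; a real call would pass open(path).
demo = io.StringIO("aardvark\nabacus\n\nzebra\n")
print(random_line(demo))
```

The trade-off is that every call reads the whole file, so for frequent lookups loading the list once is probably still the better choice.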

Chinmay Kanchi
0

Well, you have three options:

  • Hard-code the list of words and initialize an array with it.
  • Fetch the list from an internet location instead of a file.
  • Keep a list of possible words in a file.

The only way to avoid the above is if you're not concerned whether the word is real: you can just generate random-length strings of characters. (There's no way to programmatically generate words without a dictionary list to go from.)
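For completeness, the random-string fallback is about this much Python. The length bounds and the `random_letters` name are arbitrary choices for illustration.

```python
import random
import string


def random_letters(min_len=3, max_len=10, rng=random):
    """Return a string of random lowercase letters with a random length."""
    length = rng.randint(min_len, max_len)  # both bounds are inclusive
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


print(random_letters())
```

The result will almost never be a real word, which is the whole point of the caveat above.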

lc.