Selecting A Random Word Out Of Many

Question

How would someone take a random word out of a list of many? Words.txt is a word file with every word in the English dictionary, separated by a new line.

possible duplicate of [Picking a Random Word In Python?](http://stackoverflow.com/questions/4394145/picking-a-random-word-in-python) — tc., Mar 07 '15 at 23:01
Not sure that's a suitable dupe target - it shows how to use `random.choice` on a sequence, not a file or iterable. — Jon Clements, Mar 07 '15 at 23:14
"How would someone take a random word out of a list of many", as good as any :D — Antti Haapala -- Слава Україні, Mar 07 '15 at 23:15

Jon Clements · Answer 1 · 2015-03-08T16:10:52.140

You can take a random line from a file in a single pass, without loading the whole file, by using heapq with a random key, e.g.:

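import heapq
import random

with open('Words.txt') as fin:
    # Every line gets a random key; the line with the largest key wins.
    line, = heapq.nlargest(1, fin, key=lambda L: random.random())
word = line.strip()  # drop the trailing newline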

We use heapq.nlargest here (heapq.nsmallest would work just as well; the choice is arbitrary) because it is memory efficient: only a single candidate line has to be kept in memory at a time. On each iteration over the input, the candidate either stays the same or is replaced by a line that drew a higher random value. This is the opposite of:

from random import choice

with open('Words.txt') as fin:
    # Read the whole file; splitlines() drops the trailing newlines.
    words = fin.read().splitlines()
word = choice(words)

In this case, we load all the words into memory and then pick a random word from the list. If you are going to keep picking words and are fine with holding all of them in memory, this is the better approach: picking a random item from an in-memory list is much more efficient than linearly scanning the file each time.
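As a minimal sketch of the repeated-pick case: the word list is read once and every later pick is just a lookup into it. The `io.StringIO` object and its contents are stand-ins for `open('Words.txt')`, made up for illustration.

```python
import io
import random

# Stand-in for open('Words.txt'); the contents are made up for illustration.
fin = io.StringIO("apple\nbanana\ncherry\ndate\n")

# Read everything once; splitlines() drops the trailing newlines.
words = fin.read().splitlines()

# Each pick is now a constant-time lookup into the in-memory list.
picks = [random.choice(words) for _ in range(5)]

# random.sample returns distinct words (no repeats within one call).
distinct = random.sample(words, 3)
```

`random.choice` may return the same word more than once across calls; `random.sample` is the one to reach for if the picks must differ.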

In short: if you know you only ever want one random word (say your program just needs it at startup), use the first approach and avoid the memory overhead. If you want to keep getting more words, take the memory hit and use the second approach.
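To make the single-pass behaviour of the first approach concrete, here is a rough sketch of the same idea written as an explicit loop. The function name `random_line` and the sample list are made up for illustration, and this is only a model of the behaviour described above, not a claim about how heapq is implemented.

```python
import random


def random_line(lines):
    """Return one random item from an iterable, holding only one candidate at a time."""
    best_key, best = -1.0, None
    for line in lines:
        key = random.random()  # a float in [0.0, 1.0)
        if key > best_key:
            # This line drew a higher key, so it replaces the current candidate.
            best_key, best = key, line
    return best  # None if the iterable was empty


# Hypothetical stand-in for the lines of Words.txt.
sample = ["apple\n", "banana\n", "cherry\n"]
word = random_line(sample).strip()
```

Because every line draws its key independently from the same distribution, each line is equally likely to end up with the largest one, so the pick is uniform.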

Of course, if you know you are only ever going to need 100 (pick a number here) random words, adjust the parameters to heapq.nlargest and consume the results from an iterator. If you run out, decide what to do next.

import random, heapq

with open('Words.txt') as fin:
    words = heapq.nlargest(100, fin, key=lambda L: random.random())
    word_iter = iter(words)

Then, later on in your script, use something like:

try:
    word = next(word_iter)
except StopIteration:
    # we've exhausted all our pre-loaded random words...
    # either get more, fail, whatever...
    word = None
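One way to handle the "get more" case is to wrap the batching in a generator that rescans the source whenever a batch runs out. This is a sketch under assumptions: `random_words` and its `open_source` parameter are hypothetical names, and `io.StringIO` stands in for the real file so the example runs on its own.

```python
import heapq
import io
import itertools
import random


def random_words(open_source, batch_size=100):
    """Yield random words indefinitely, rescanning the source when a batch is used up."""
    while True:
        with open_source() as fin:
            batch = heapq.nlargest(batch_size, fin, key=lambda line: random.random())
        if not batch:
            return  # empty source: nothing to yield
        for line in batch:
            yield line.strip()


# Stand-in for lambda: open('Words.txt'); the contents are made up.
demo = random_words(lambda: io.StringIO("apple\nbanana\ncherry\n"), batch_size=2)
first_five = list(itertools.islice(demo, 5))
```

Words are distinct within a batch, but the same word can come up again in a later batch, since each rescan draws fresh random keys.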

If the downvoter wouldn't mind commenting as to how this answer could be improved/if they find anything wrong with it - it'd be appreciated. — Jon Clements, Mar 07 '15 at 23:04
