Is there a built-in method to do it? If not, how can I do this without too much overhead?
-
@Greg That's Perl, not Python – quantumSoup Aug 22 '10 at 05:29
-
@quantumSoup: The question uses Perl in its examples, but the question is language agnostic. The most useful answers use pseudocode, easily translated to your language of choice. – Greg Hewgill Aug 22 '10 at 05:32
-
Thanks, I also found this helps a lot: http://mail.python.org/pipermail/tutor/2007-July/055635.html You have to read the lines into memory, though. – Shane Aug 22 '10 at 05:35
-
@Greg That's not really applicable to file I/O, which can be very different from language to language. – quantumSoup Aug 22 '10 at 05:39
-
@quantumSoup reading files linewise is basically the same in *all* languages. – P Shved Aug 22 '10 at 06:59
-
How does one index a random line without loading the entire file into memory? – Charlie Parker Feb 06 '21 at 14:18
12 Answers
Not built-in, but Algorithm R (3.4.2) (Waterman's "Reservoir Algorithm") from Knuth's "The Art of Computer Programming" is good (in a very simplified version):
import random

def random_line(afile):
    line = next(afile)
    for num, aline in enumerate(afile, 2):
        if random.randrange(num):
            continue
        line = aline
    return line
The num, ... in enumerate(..., 2) iterator produces the sequence 2, 3, 4, ... The randrange will therefore be 0 with a probability of 1.0/num, and that's the probability with which we must replace the currently selected line (the special case of sample size 1 of the referenced algorithm; see Knuth's book for the proof of correctness, and of course we're also in the case of a small-enough "reservoir" to fit in memory ;-)). And it is exactly the probability with which we do so.
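To see the 1/num replacement rule in action, here is a small self-contained check (not part of the original answer). It runs the same random_line function many times over an in-memory file (io.StringIO is only a stand-in for a real file object here) and counts how often each line is picked. If the pick is uniform, every count should land near trials / number of lines.

```python
import io
import random
from collections import Counter


def random_line(afile):
    # Reservoir of size 1: same logic as the answer above.
    line = next(afile)
    for num, aline in enumerate(afile, 2):
        if random.randrange(num):
            continue  # keep the current pick, probability (num - 1) / num
        line = aline  # replace it, probability 1 / num
    return line


def pick_counts(text, trials=6000, seed=0):
    # Run the selection many times, each on a fresh in-memory "file".
    random.seed(seed)
    counts = Counter()
    for _ in range(trials):
        counts[random_line(io.StringIO(text))] += 1
    return counts


counts = pick_counts("alpha\nbeta\ngamma\n")
print(counts)  # the three counts should each be close to 6000 / 3 = 2000
```

Line k survives to the end with probability (1/k) times the product of (j - 1)/j for j from k + 1 to n, which telescopes to 1/n; the counts above are just an empirical sanity check of that.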

-
I've always thought that the `random.choice()` function should work on arbitrary iterators as well as sequences, implementing exactly the above algorithm. – Greg Hewgill Aug 22 '10 at 05:54
-
@Greg Hewgill, that would be nice but every tenth question would then be "where did my iterator go" – aaronasterling Aug 22 '10 at 06:08
-
@aaron, right -- same reason, e.g., there is no `len` for iterators... the "algorithm" is not hard to see, but consuming the iterator is considered a too-often-surprising effect. It's a series of hard design decisions, of course (e.g., `sum` _does_ consume the iterator -- the decision there is that the summation may well be all the user requires while the length or one random item is less likely to be so... always iffy decisions either way -- if we had a way to clearly mark a name as "having side effects", like Ruby's trailing bang, the design choices might be different). – Alex Martelli Aug 22 '10 at 14:26
-
@Henry, right - edited the A to attribute it properly, tx for the reminder. – Alex Martelli Aug 22 '10 at 14:44
-
Gives me a `StopIteration` exception on `line = next(afile)` (Python 3.8) on the second call. – Ali Tou Feb 27 '22 at 18:09
-
When you call the function a second time on the same iterator (e.g. the same open file) the iterator will have been entirely consumed by the first call, so it will now be empty and StopIteration is correct, just like for any other empty iterator. If you need to *repeatedly* get a random item from an iterator, you must first copy the whole iterator into a list of items, then random.choice on the list is simplest. – Alex Martelli Feb 28 '22 at 17:33
import random
with open('file.txt') as f:
    lines = f.read().splitlines()
myline = random.choice(lines)
print(myline)
For a very long file: seek to a random position based on the file's length and find the next two newline characters after that position (or a newline and the end of the file). If we ended up inside the last line, do it again from 100 characters earlier, or from the beginning of the file if the original seek position was < 100.
However, this is overcomplicated, as a file is an iterator. So make it a list and take random.choice (if you need many, use random.sample):
import random
print(random.choice(list(open('file.txt'))))
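For the "if you need many" case, a minimal sketch, assuming the whole file fits in memory and you want k lines from different positions (sampling without replacement). random_lines is a hypothetical helper name; using splitlines() also drops the trailing newline characters that list(open(...)) keeps.

```python
import random


def random_lines(path, k):
    # Whole-file approach: every line is held in memory at once.
    with open(path) as f:
        lines = f.read().splitlines()  # also drops the trailing newline characters
    # random.sample picks k different positions (no position is repeated) and
    # raises ValueError if k is larger than the number of lines.
    return random.sample(lines, k)
```

For example, random_lines('file.txt', 3) returns three lines taken from three different positions in the file.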

-
If the task is to read just a line, it doesn't make sense to load the full file into memory. – iankit Feb 23 '16 at 11:08
-
This solution is simple and trivial to understand. I would recommend this solution as a final answer. – Francisco Maria Calisto May 23 '19 at 14:33
-
This is a valid solution, but it wouldn't strip \r\n or EOL. You need to add .rstrip() to clean it up. – Payam May 30 '19 at 22:23
-
Are you loading the entire file into memory? It would be nice to comment on this sort of thing, as @iankit has commented about it. – Charlie Parker Feb 06 '21 at 14:03
-
I like this one since it doesn't matter that you load the whole file into memory if your PC was made after 2000 – eeeeeeeeeeeeeeeeeeeeeeeeeeeeee Jan 31 '23 at 09:20
It depends on what you mean by "too much" overhead. If storing the whole file in memory is possible, then something like
import random
random_line = random.choice(open("file").readlines())
would do the trick.

Although I am four years late, I think I have the fastest solution. Recently I wrote a Python package called linereader, which allows you to manipulate the pointers of file handles.
Here is the simple solution to getting a random line with this package:
from random import randint
from linereader import dopen
length = ...  # number of lines in the file
filename = ...  # path to the file
file = dopen(filename)
random_line = file.getline(randint(1, length))
The first time this is done is the worst, as linereader has to compile the output file in a special format. After this is done, linereader can then access any line from the file quickly, whatever size the file is.
If your file is very small (small enough to fit into about a megabyte), then you can replace dopen with copen, which makes a cached copy of the file in memory. Not only is this faster, but you also get the number of lines in the file as it is loaded into memory; it is done for you. All you need to do is generate the random line number. Here is some example code for this.
from random import randint
from linereader import copen
filename = ...  # path to the file
file = copen(filename)
lines = file.count('\n')
random_line = file.getline(randint(1, lines))
I just got really happy because I saw someone who could benefit from my package! Sorry for the dead answer, but the package could definitely be applied to many other problems.
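linereader's own on-disk format isn't described here, so the following is only an assumption about the general idea behind "compile once, then access any line quickly", sketched with the standard library instead of the package: scan the file once to record the byte offset where each line starts, then fetch any line with a single seek. build_line_index, getline_at and random_indexed_line are hypothetical names, not linereader's API.

```python
import random


def build_line_index(path):
    # One full pass: record the byte offset at which every line starts.
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets


def getline_at(path, offsets, n):
    # Jump straight to line n (0-based) using the precomputed offset.
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline()


def random_indexed_line(path, offsets):
    # Uniform choice: every line has exactly one entry in offsets.
    return getline_at(path, offsets, random.randrange(len(offsets)))
```

The index still holds one integer per line in memory; saving it between runs is what would turn the first full pass into a genuine one-time cost.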

-
I had ValueError line no. not found, but the line no. was less than the size of the file. – kakarukeys Jun 30 '17 at 07:46
-
Cool stuff! Is there a reason you index file lines beginning with 1? (getline(file, 0) returns the last line) – Jura Brazdil May 15 '19 at 12:38
If you don't want to load the whole file into RAM with f.read() or f.readlines(), you can get a random line this way:
import os
import random

def get_random_line(filepath: str) -> str:
    file_size = os.path.getsize(filepath)
    with open(filepath, 'rb') as f:
        while True:
            pos = random.randint(0, file_size)
            if not pos:  # the first line is chosen
                return f.readline().decode()  # return str
            f.seek(pos)  # seek to random position
            f.readline()  # skip possibly incomplete line
            line = f.readline()  # read next (full) line
            if line:
                return line.decode()
            # else: line is empty -> EOF -> try another position in next iteration
P.S.: yes, that was proposed by Ignacio Vazquez-Abrams in his answer, but a) there's no code in his answer and b) I came up with this implementation myself; it can return the first or last line. I hope it is useful to someone.
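However, if you care about the distribution, this code is not an option for you: a line is returned when the random position falls inside the line before it, so each line's chance of being picked is proportional to the length of the preceding line, not uniform.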
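One way to keep the cheap random seeks but still get a uniform pick is rejection sampling (a standard technique, not something proposed in this answer). Pick a random byte, take the whole line that contains it, so a line of L bytes is hit with probability L / file_size, then accept that line only with probability 1 / L. Every line then ends up equally likely, at the cost of retrying on average about "mean line length in bytes" times. A minimal sketch under those assumptions; uniform_random_line is a hypothetical name, and the byte-by-byte backward scan keeps the code short but is slow for very long lines.

```python
import os
import random


def uniform_random_line(path):
    """Return one line (as bytes) chosen uniformly at random, without reading the whole file."""
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("cannot pick a line from an empty file")
    with open(path, "rb") as f:
        while True:
            pos = random.randrange(size)  # a random byte offset in [0, size)
            # Walk backwards to the start of the line that contains byte `pos`.
            start = pos
            while start > 0:
                f.seek(start - 1)
                if f.read(1) == b"\n":
                    break
                start -= 1
            f.seek(start)
            line = f.readline()  # the whole line, including its b"\n" if present
            # This line was hit with probability len(line) / size; accepting it
            # with probability 1 / len(line) makes every line equally likely.
            if random.randrange(len(line)) == 0:
                return line
```

Because every byte (newline included) belongs to exactly one line, len(line) is never 0, so the acceptance test is always well defined.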

A slightly improved version of Alex Martelli's answer, which handles empty files (by returning a default value):
from random import randrange

def random_line(afile, default=None):
    line = default
    for i, aline in enumerate(afile, start=1):
        if randrange(i) == 0:  # random int [0..i)
            line = aline
    return line
This approach can be used to get a random item from any iterator using O(n) time and O(1) space.
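The same idea extends from one item to k items; this is the general form of the reservoir algorithm that the first answer simplifies to a sample of size 1. A minimal sketch (reservoir_sample is a hypothetical name): keep the first k items, then let item number i (1-based) replace a random slot with probability k / i. It still makes a single pass in O(n) time, now with O(k) space, and returns fewer than k items if the iterator is shorter than that.

```python
import random


def reservoir_sample(iterable, k):
    reservoir = []
    for i, item in enumerate(iterable, start=1):
        if i <= k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = random.randrange(i)  # uniform in [0, i)
            if j < k:
                reservoir[j] = item  # happens with probability k / i
    return reservoir
```

With k = 1 this reduces to the loop above: item i replaces the current pick exactly when randrange(i) == 0. For a file, reservoir_sample(f, 5) on an open file object f returns up to five lines.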

If you don't want to read over the entire file, you can seek to a random position in the file, then seek backwards for the newline, and call readline.
Here is a Python 3 script which does just this.
One disadvantage of this method is that short lines have a lower likelihood of showing up.
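import os
import random

def read_random_line(f, chunk_size=16):
    with open(f, 'rb') as f_handle:
        f_handle.seek(0, os.SEEK_END)
        size = f_handle.tell()
        if size == 0:
            return b''  # empty file: nothing to choose
        # randint(0, size - 1) keeps i on a real byte, so we never land
        # after a trailing newline and return an empty line
        i = random.randint(0, size - 1)
        while True:
            i -= chunk_size
            if i < 0:
                chunk_size += i
                i = 0
            f_handle.seek(i, os.SEEK_SET)
            chunk = f_handle.read(chunk_size)
            i_newline = chunk.rfind(b'\n')
            if i_newline != -1:
                i += i_newline + 1  # start of the line containing the random byte
                break
            if i == 0:
                break  # reached the start of the file: it's the first line
        f_handle.seek(i, os.SEEK_SET)
        return f_handle.readline()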

Seek to a random position, read a line and discard it, then read another line. The distribution of lines won't be normal, but that doesn't always matter.

-
In particular, this makes it impossible to ever select the first line (as well as picking other lines with a probability proportional to the length of each previous line). My A doesn't produce a normal distribution either (that would be weird -- what mean, what variance?!), but a uniform one, which seems somewhat more likely to meet the OP's meaning for "random". – Alex Martelli Aug 22 '10 at 05:38
-
To overcome the problem pointed out by @AlexMartelli, choose the first line in case the random seek leads you to the last line. But another issue here is that a line having relatively more words than other lines will have a higher probability of getting selected. – Ashwin Surana Jun 11 '16 at 20:36
This may be bulky, but it works I guess? (at least for txt files)
import random
with open("yourfile.txt", "r") as choicefile:
    linelist = []
    for line in choicefile:
        linelist.append(line)
choice = random.choice(linelist)
print(choice)
It reads each line of a file, and appends it to a list. It then chooses a random line from the list. If you want to remove the line once it's chosen, just do
linelist.remove(choice)
Hope this helps; at least it needs no extra modules or imports (apart from random) and is relatively lightweight.

import random
with open("file.txt", "r") as f:
    lines = f.readlines()
print(random.choice(lines))

Here is another way, a little like Philip Hughes' explanation, but with the addition of .strip(), in case the random line you collect needs extra whitespace removed or its trailing line break dropped.
Code:
import random

def random_line():
    with open("file.txt", "r") as file:
        text = []
        for line in file:
            line = line.strip()
            text.append(line)
    x = random.randrange(0, len(text))
    return text[x]  # return the line itself, not just its index
It does the following: imports the random module, reads a file, creates a list with the extracted and "cleaned" lines from that file (.txt in this case), closes the file, and selects a random item (that was a line in the .txt) from the created list.

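You can add the lines into a set(), which will change their order. Note that the order is arbitrary rather than truly random: it comes from string hashing (which Python randomizes per interpreter run by default), and the set also drops duplicate lines.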
filename=open("lines.txt",'r')
f=set(filename.readlines())
filename.close()
To find the 1st line:
print(next(iter(f)))
To find the 3rd line:
print(list(f)[2])
To list all the lines in the set:
for line in f:
    print(line)
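If what you want is the lines in a genuinely random order, a minimal alternative sketch is to shuffle a list instead of relying on set ordering. Unlike a set, this keeps duplicate lines and produces a new random order on every call. shuffled_lines is a hypothetical helper, and it assumes the file fits in memory.

```python
import random


def shuffled_lines(path):
    with open(path) as f:
        lines = f.read().splitlines()  # one entry per line, newlines removed
    random.shuffle(lines)  # shuffles the list in place
    return lines
```

Then shuffled_lines("lines.txt")[0] is a random line, and indexing [2] gives another position in the same shuffled order.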
