Could someone show me how I could read a random number of lines from a file in Python?
-
What is the range "a random number of lines"? Is the offset also random? – Gintautas Miliauskas Nov 05 '10 at 12:27
-
"A number of random lines" and "a random number of lines" are very different things. – Glenn Maynard Nov 05 '10 at 13:08
5 Answers
Your requirement is a bit vague, so here's another slightly different method (for inspiration if nothing else):
from random import random
lines = [line for line in open("/some/file") if random() >= .5]
Compared with the other solutions, the number of lines varies less (it follows a binomial distribution centred on half the total number of lines), but each line is chosen independently with 50% probability, and only one pass through the file is required.
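If you want a selection rate other than 50%, a minimal sketch along the same lines could look like this. The function name `sample_lines` and the parameter `p` are made up for illustration; writing the test as `random() < p` means `p` is directly the chance that a line is kept.

```python
import random

def sample_lines(path, p=0.5):
    """Return each line of the file independently with probability p."""
    with open(path) as f:
        # random.random() is uniform on [0, 1), so `< p` is true with probability p
        return [line for line in f if random.random() < p]
```

With `p=0.1` you would get roughly 10 percent of the lines, still in a single pass and in file order.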

-
`random()` returns a random number between 0 and 1 with a uniform distribution. `random() > .5` will be true half of the time ± a normal distribution, i.e. each line is selected with 50% probability. – SimonJ Nov 05 '10 at 12:44
-
@PulpFiction it makes sure that each line is chosen with probability one half. The net effect is to choose a uniform element of the powerset of the set of lines in the file. This will have a random amount of randomly selected lines. – aaronasterling Nov 05 '10 at 12:44
-
@SimonJ, unlike the other solutions, the number of lines chosen varies appropriately. There's only one subset with no lines in it and only one subset with all of the lines in it. There's no reason that they should be chosen as frequently as one of the `N!/(N-N/2)!(N/2)!` sets with `N/2` lines. – aaronasterling Nov 05 '10 at 12:49
-
Don't know if this is the right answer for the question, but if not, it deserves its own question! Awesome answer! :) – Powertieke Nov 05 '10 at 12:51
-
@aaronsterling: True, good point - this might be a more useful solution than the uniformly random number of lines. – SimonJ Nov 05 '10 at 12:57
-
I recommend changing the number to something other than .5, because that makes it more obvious which direction to change it in. For instance, I first changed it to 0.1 to get 10 percent, but you actually have to change it to 0.9 to get 10 percent. – So S Aug 06 '20 at 12:09
To get a number of lines at random from your file you could do something like the following:
import random
with open('file.txt') as f:
    lines = random.sample(f.readlines(), 5)
The above example returns 5 lines, but you can easily change that to the number you require. You could also replace the 5 with a call to `randint()` to get a random number of lines in addition to a number of random lines, but you'd have to make sure the sample size isn't bigger than the number of lines in the file. Depending on your input this might be trivial or a little more complex.
Note that the lines could appear in `lines` in a different order from the one in which they appear in the file.
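Here is one way the `randint()` variant could be sketched, assuming the file fits in memory. The name `random_count_sample` is hypothetical, and sampling indices and then sorting them is my own addition to keep the file order; it is not part of the answer above.

```python
import random

def random_count_sample(path):
    """Return a random number of randomly chosen lines, in file order."""
    with open(path) as f:
        all_lines = f.readlines()
    if not all_lines:
        return []
    # randint is inclusive on both ends, so k never exceeds the line count
    k = random.randint(1, len(all_lines))
    # sample indices rather than lines, then sort to restore file order
    chosen = sorted(random.sample(range(len(all_lines)), k))
    return [all_lines[i] for i in chosen]
```

If you don't care about the order, you can drop the `sorted()` call and sample the lines directly.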

import linecache
import random
import sys

# Number of lines to get.
NUM_LINES_GET = 5

# Get the number of lines in the file.
with open('file_name') as f:
    number_of_lines = len(f.readlines())

if NUM_LINES_GET > number_of_lines:
    print("are you crazy !!!!")
    sys.exit(1)

# Choose random line numbers (linecache counts from 1) and print those lines.
for i in random.sample(range(1, number_of_lines + 1), NUM_LINES_GET):
    print(linecache.getline('file_name', i))

linecache.clearcache()

-
@aaronasterling: ehh ? maybe i didn't understand the question well , but he asked for a random number of lines not a random line's numbers right ??? – mouad Nov 05 '10 at 12:39
-
you always return 5 lines, and 5 isn't very random :) But I agree, the question was vague. – SimonJ Nov 05 '10 at 12:41
-
@SimonJ: what i understand from the question is that the OP wanted __a number of random lines__ so i put number of lines to get as a var __NUM_LINES_GET__ so he can choose 3 random line of 4 or ... – mouad Nov 05 '10 at 12:47
-
true, yes - he could just change it to `NUM_LINES_GET = random.randint(1, number_of_lines)` for a truly uniformly *random number of lines*. – SimonJ Nov 05 '10 at 12:53
import os, random

def getrandfromMem(filename):
    # Small file: read every line and pick one uniformly.
    with open(filename, 'rb') as fd:
        l = fd.readlines()
    pos = random.randrange(len(l))  # randrange excludes len(l), so pos is always a valid index
    return (pos, l[pos])

def getrandomline2(filename):
    filesize = os.stat(filename).st_size
    if filesize < 4096:  # Seek may not be very useful
        return getrandfromMem(filename)
    line = b''
    with open(filename, 'rb') as fd:
        for _ in range(10):  # Try 10 times
            pos = random.randint(0, filesize)
            fd.seek(pos)
            fd.readline()  # Read and ignore the (probably partial) current line
            line = fd.readline()
            if line:
                break
    if line:
        return (pos, line)
    else:
        return getrandfromMem(filename)

getrandomline2("shaks12.txt")
Assuming the offset is always at the beginning of the file:
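import random
with open('/your/file') as f:
    lines = f.read().splitlines()
# randrange(len(lines)) yields 0 .. len(lines) - 1, so the full file is never returned
n_lines = random.randrange(len(lines))
random_lines = lines[:n_lines]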
Note that this will read the entire file into memory.
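If memory is a concern, a two-pass variant might work instead: count the lines first, then read only as many as you need. This is just a sketch under my own assumptions. The name `random_prefix` is hypothetical, and unlike the snippet above it uses `randint(0, total)`, so the whole file can be selected, and the returned lines keep their trailing newlines.

```python
import random
from itertools import islice

def random_prefix(path):
    """Return the first n lines of the file, with n chosen at random."""
    # first pass: count lines without storing them
    with open(path) as f:
        total = sum(1 for _ in f)
    # n ranges over 0..total inclusive, so the empty and full prefixes are both possible
    n = random.randint(0, total)
    # second pass: read only the first n lines
    with open(path) as f:
        return list(islice(f, n))
```

The trade-off is reading the file twice, and the `n` selected lines are still held in memory.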
