Suppose you have a very big file, and it would be too expensive, or too slow, to go through all the lines.
How would you pick a line at random (preferably from the command line, or in Python)?
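You can try this from the command line. It is not perfectly uniform: bash's $RANDOM only ranges from 0 to 32767, so in a bigger file the lines past 32768 are never picked, and the modulo adds a slight bias. Still, it is at least a beginning.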
$ lines=$(wc -l file | awk '{ print $1 }'); sed -n "$((RANDOM%lines+1))p" file
It works like this:
First, it sets a variable containing the number of lines in the file.
lines=$(wc -l file | awk '{ print $1 }')
Then it prints a random line within that range:
sed -n "$((RANDOM%lines+1))p" file
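The same two steps (count the lines, then print the chosen one) can be sketched in Python, where `random.randrange` works for any line count. This is only a minimal sketch: the function name `random_line_two_pass` is made up for illustration, and it still reads the file up to twice.

```python
import random

def random_line_two_pass(path):
    """Count the lines, then return one chosen uniformly at random."""
    # First pass: count lines without keeping them in memory.
    with open(path) as f:
        total = sum(1 for _ in f)
    if total == 0:
        return None
    # Second pass: stop reading as soon as the chosen line is reached.
    target = random.randrange(total)  # 0-based index of the chosen line
    with open(path) as f:
        for index, line in enumerate(f):
            if index == target:
                return line
```

On average the second pass stops halfway through the file, so this reads roughly one and a half times the file size.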
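As Mark Ransom pointed out, the above solution reads the entire file (twice, in fact: once to count the lines and once to print one). I have found a way to choose a random line in a single pass, without knowing the number of lines in advance. It still reads every line, but only once, and it keeps just one candidate line in memory. Both solutions use (I think) the same algorithm; here are the Perl and Python versions: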
Perl: How do I pick a random line from a file?
perl -e 'srand;' \
-e 'rand($.) < 1 && ($it = $_) while <>;' \
-e 'print $it' FILE
Python: Retrieving a Line at Random from a File of Unknown Size
import random

def randomLine(file_object):
    "Retrieve a random line from a file, reading through the file once"
    lineNum = 0
    selected_line = ''
    while 1:
        aLine = file_object.readline()
        if not aLine: break
        lineNum = lineNum + 1
        # How likely is it that this is the last line of the file?
        if random.uniform(0, lineNum) < 1:
            selected_line = aLine
    file_object.close()
    return selected_line
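Why this comes out uniform: line k replaces the current choice with probability 1/k, and it then survives each later line j with probability (j-1)/j, which multiplies out to 1/n for a file of n lines. Here is a minimal sketch of the same idea written against any iterable of lines, with a rough frequency check on an in-memory file. The names `pick_random_line` and `rng` are mine, not from the recipe.

```python
import io
import random
from collections import Counter

def pick_random_line(lines, rng=random):
    """Return one line chosen uniformly from an iterable, in a single pass."""
    chosen = None
    for count, line in enumerate(lines, start=1):
        # Replace the current choice with probability 1/count.
        if rng.randrange(count) == 0:
            chosen = line
    return chosen

# Rough check on an in-memory "file" of four lines.
sample = "alpha\nbeta\ngamma\ndelta\n"
tally = Counter(pick_random_line(io.StringIO(sample)) for _ in range(10000))
print(tally)  # each line should appear roughly 2500 times
```

Unlike the recipe, this version does not close the file for you, so it also works on `sys.stdin` or any generator of lines.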
If you want to do it in Python, here you are:
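#!/usr/bin/env python
# -*- coding: utf-8 -*-
import random
import subprocess

def test():
    filename = 'yourfile'
    # Count the lines with wc; the first field of its output is the count.
    output = subprocess.check_output(['wc', '-l', filename])
    line_count = int(output.split()[0])
    # sed numbers lines from 1, so pick from 1..line_count inclusive.
    r = random.randint(1, line_count)
    line = subprocess.check_output(['sed', '-n', '%dp' % r, filename])
    print(line.decode('utf-8'), end='')

if __name__ == '__main__':
    test()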
Maybe you can use linecache:
import linecache
linecache.getline(file_path, line_no)
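To turn that into a random pick you still need the line count, and as far as I know `linecache` loads the whole file into memory on first access, so it suits repeated lookups better than a one-off pick from a huge file. A minimal sketch, where the helper name `random_line_linecache` and the temporary demo file are my own assumptions:

```python
import linecache
import os
import random
import tempfile

def random_line_linecache(path):
    """Pick a random line via linecache; its line numbers start at 1."""
    with open(path) as f:
        total = sum(1 for _ in f)
    if total == 0:
        return ''
    line_no = random.randint(1, total)  # inclusive on both ends
    return linecache.getline(path, line_no)

# Demo on a small temporary file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('one\ntwo\nthree\n')
print(random_line_linecache(tmp.name), end='')
os.unlink(tmp.name)
```

`linecache.getline` returns an empty string rather than raising if the line number is out of range, so an off-by-one mistake fails silently.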