2

Suppose you have a very big file, and it would be too expensive or too slow to go through all the lines.

How would you pick a line at random (preferably from the command line, or in Python)?

Bob
  • 3
    It's impossible to pick a random line from a file without knowing ahead of time how many lines are in the file and where each line starts. You must read the entire file otherwise. See http://stackoverflow.com/questions/232237/whats-the-best-way-to-return-a-random-line-in-a-text-file-using-c for inspiration. – Mark Ransom Jul 23 '14 at 02:55
  • 1
    You can use _wc_ with _sed_; have a look at jim's answer. – Amr Ayman Jul 23 '14 at 03:11
  • Is the line length constrained to be constant throughout the file? – moooeeeep Jul 23 '14 at 08:47
  • @moooeeeep, if so, would the idea be to divide the file size by the average record length in order to estimate the number of lines in the file? – jimm-cl Jul 24 '14 at 03:37
  • Also have a look at this question for some suggestions that don't resort to processing the entire file at least once: http://stackoverflow.com/q/13478232/1025391 – moooeeeep Jul 24 '14 at 07:20
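
If the line length really is constant, as asked in the comments above, no scan is needed at all: the line count follows from the file size, and any line can be reached with a single seek. The following is only a sketch under that assumption; the function name `random_fixed_length_line` and the parameter `record_len` (bytes per line, newline included) are hypothetical and do not come from the thread.

```python
import os
import random


def random_fixed_length_line(path, record_len):
    """Return one uniformly chosen line from a file in which every line
    occupies exactly record_len bytes, newline included (an assumption)."""
    size = os.path.getsize(path)
    n_records = size // record_len  # line count derived from the file size
    if n_records == 0:
        raise ValueError("file holds no complete record")
    index = random.randrange(n_records)  # 0-based record index
    with open(path, "rb") as f:
        f.seek(index * record_len)  # jump straight to the chosen record
        return f.read(record_len)
```

This reads only `record_len` bytes regardless of file size, but it gives wrong results as soon as one line has a different length.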

3 Answers

1

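You can try this from the command line. It is not perfectly random: bash's `$RANDOM` only ranges from 0 to 32767, so in a file with more than 32768 lines the later lines are never selected, and the modulo adds a slight bias. Still, it is a start.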

$ lines=$(wc -l file | awk '{ print $1 }'); sed -n "$((RANDOM%lines+1))p" file  

It works as follows:

  • First, it stores the number of lines in the file in a variable.

    lines=$(wc -l file | awk '{ print $1 }')
    
  • Then, it prints a randomly chosen line within that range:

    sed -n "$((RANDOM%lines+1))p" file
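
For comparison, the same two steps (count the lines, then print a randomly chosen one) can be sketched in Python, where the random number is not limited to the range of `$RANDOM`. The function name `random_line_two_pass` is made up for this illustration, and like the shell version it still reads the whole file to count the lines.

```python
import random
from itertools import islice


def random_line_two_pass(path):
    """Count the lines, then return one chosen uniformly at random."""
    # First pass: count the lines (reads the whole file, like wc -l).
    with open(path, "rb") as f:
        total = sum(1 for _ in f)
    if total == 0:
        raise ValueError("file is empty")
    target = random.randrange(total)  # 0-based index of the chosen line
    # Second pass: read only up to the chosen line, like sed -n "Np".
    with open(path, "rb") as f:
        return next(islice(f, target, None))
```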
    

As Mark Ransom pointed out, the above solution reads the entire file. I have found a way to choose a random line without having to know the number of lines in advance, although it still reads through the file once. Using (I think) the same algorithm, commonly known as reservoir sampling, here are the links to both Perl and Python solutions:

  • Perl: How do I pick a random line from a file?

    perl -e 'srand;' \
         -e 'rand($.) < 1 && ($it = $_) while <>;' \
         -e 'print $it' FILE
    
  • Python: Retrieving a Line at Random from a File of Unknown Size

    import random
    
    def randomLine(file_object):
        "Retrieve a random line from a file, reading through the file once"
        selected_line = ''
    
        # lineNum counts the lines from 1 as they are read
        for lineNum, aLine in enumerate(file_object, 1):
            # Keep the current line with probability 1/lineNum, so that every
            # line ends up selected with equal probability overall
            if random.uniform(0, lineNum) < 1:
                selected_line = aLine
        file_object.close()
        return selected_line
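
If even a single pass is too expensive, one alternative that is not part of the linked recipes is to seek to a random byte offset and return the line that starts after it. The sketch below assumes this trade-off is acceptable: it reads only a couple of lines, but it is not uniform, because a line is chosen with probability proportional to the length of the line before it. The function name `random_line_by_seek` is hypothetical.

```python
import os
import random


def random_line_by_seek(path):
    """Return a line near a random byte offset without scanning the file.

    Not uniform: each line is picked with probability proportional to the
    length of the preceding line (the first line follows the last one).
    """
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("file is empty")
    with open(path, "rb") as f:
        f.seek(random.randrange(size))
        f.readline()         # discard the rest of the line we landed in
        line = f.readline()  # the next complete line
        if not line:         # we landed in the last line: wrap around
            f.seek(0)
            line = f.readline()
        return line
```

When all lines have similar lengths the bias is small; when line lengths vary widely, the single-pass function above is the safer choice.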
    
jimm-cl
  • 1
    `wc` will read through the entire file, and `sed` will read up to the selected line. This technically answers the question but violates the stated constraints. – Mark Ransom Jul 23 '14 at 03:22
0

If you want to do it in Python, here you are.

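#!/usr/bin/env python
# -*- coding: utf-8 -*-

import random
import subprocess

def test():
    filename = 'yourfile'
    # wc -l reads the whole file to count its lines
    out = subprocess.check_output(['wc', '-l', filename],
                                  universal_newlines=True)
    line_number = int(out.split()[0])

    # sed numbers lines from 1, so pick from 1..line_number inclusive
    r = random.randint(1, line_number)
    line = subprocess.check_output(['sed', '-n', '%dp' % r, filename],
                                   universal_newlines=True)

    print(line.rstrip('\n'))


if __name__ == '__main__':
    test()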
Stephen Lin
0

Maybe you can use linecache:

import linecache
linecache.getline(file_path, line_no)
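
The snippet above leaves open where `line_no` comes from. One way to fill that in, sketched here with a hypothetical function name `random_line_linecache`, is to count the lines first and then draw a 1-based line number. Note that `linecache` loads and caches the whole file in memory, so this does not avoid reading it.

```python
import linecache
import random


def random_line_linecache(file_path):
    """Pick a random 1-based line number and fetch it with linecache."""
    # Count the lines first; this reads the whole file.
    with open(file_path, "rb") as f:
        total = sum(1 for _ in f)
    if total == 0:
        raise ValueError("file is empty")
    line_no = random.randint(1, total)  # linecache numbers lines from 1
    # getline returns '' if the line cannot be read
    return linecache.getline(file_path, line_no)
```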
Qiang Jin