What's an easy way to read random line from a file in a shell script?
-
Is each line padded to a fixed length? – Tracker1 Jan 15 '09 at 19:03
-
no, each line has a variable number of characters – Jan 15 '09 at 19:04
-
large file: http://stackoverflow.com/questions/29102589/get-random-lines-from-large-files-in-bash – Ciro Santilli OurBigBook.com Nov 20 '15 at 09:58
13 Answers
You can use `shuf`:

shuf -n 1 $FILE

There is also a utility called `rl`. In Debian it's in the `randomize-lines` package, which does exactly what you want, though it's not available in all distros. On its home page it actually recommends the use of `shuf` instead (which didn't exist when it was created, I believe). `shuf` is part of the GNU coreutils; `rl` is not.

rl -c 1 $FILE

-
Does this `rl` have any advantages? `shuf` seems to work perfectly! – Thomas Ahle Jun 10 '11 at 15:46
-
shuf is great as a drop-in replacement for head command, good to know – Tomasz Tybulewicz Jun 10 '13 at 07:45
-
And also, `sort -R` is definitely going to make one wait **a lot** when dealing with considerably huge files -- 80kk lines -- whereas `shuf -n` acts quite instantaneously. – Rubens Jun 18 '13 at 06:56
-
You can get shuf on OS X by installing `coreutils` from Homebrew. Might be called `gshuf` instead of `shuf`. – Alyssa Ross Dec 27 '13 at 22:27
-
Similarly, you can use `randomize-lines` on OS X by `brew install randomize-lines; rl -c 1 $FILE` – Jamie Apr 09 '14 at 18:03
-
@Rubens: [the same question](http://stackoverflow.com/questions/9245638/select-random-lines-from-a-file-in-bash/9245733#comment40761869_9245733) – jfs Sep 24 '14 at 18:50
-
@J.F.Sebastian: [the same answer](http://stackoverflow.com/questions/9245638/select-random-lines-from-a-file-in-bash/9245733#comment40766527_9245733) – Rubens Sep 24 '14 at 21:13
-
@ThomasAhle, the Debian package summary for `rl` (the randomize-lines package) states *Users are recommended to use the shuf command instead which should be available by default. This package may be considered deprecated.* Therefore, `shuf` appears preferable. – Adam Katz Dec 17 '14 at 21:50
-
Note that `shuf` is part of [GNU Coreutils](https://en.wikipedia.org/wiki/GNU_Core_Utilities) and therefore won't necessarily be available (by default) on *BSD systems (or Mac?). @Tracker1's perl one-liner below is more portable (and by my tests, is slightly faster). – Adam Katz Dec 19 '14 at 21:49
-
This is a cool command! Yet another wheel I've reinvented not knowing it already exists in my flavor of Unix! Thank you! – Sol Jul 08 '16 at 14:23
-
though this is not suitable for huge files... I'm getting a 'shuf: read error: Cannot allocate memory' on a 70GB file – jimijazz Oct 07 '16 at 00:30
-
This is a great answer. I would just like to point out that in case more than 1 line is needed, ``shuf`` and ``rl`` make *permutations* of lines, not random draws. I.e. if you want to draw k random lines, you will want to run ``shuf -n 1`` k times. This will draw from N^k possibilities instead of N!/(N-k)! possibilities, where N is the total number of lines. E.g., get 7 random lines from wordlist.txt: ``for n in {1..7}; do shuf -n1 wordlist.txt; done`` – sujeet Mar 09 '17 at 04:19
-
you can use process substitution if you don't want to give `shuf` a file: `shuf -n 1 <(echo -e "heads\ntails")` will randomly pick "heads" or "tails". Or just pipe to it: `echo -e "heads\ntails" | shuf -n 1` – pmarreck Oct 10 '22 at 18:23
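An editorial aside: if you need the pick to be repeatable (say, in tests), GNU `shuf` also accepts `--random-source=FILE`; feeding it a fixed byte stream makes the selection deterministic. A minimal sketch, assuming GNU coreutils and bash process substitution:

# Same source bytes => same line every run (sketch, not from the answer above).
shuf -n 1 --random-source=<(yes 42) "$FILE"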
sort --random-sort $FILE | head -n 1
(I like the shuf approach above even better though - I didn't even know that existed and I would have never found that tool on my own)

-
+1 I like it, but you may need a very recent `sort`, didn't work on any of my systems (CentOS 5.5, Mac OS 10.7.2). Also, useless use of cat, could be reduced to `sort --random-sort < $FILE | head -n 1` – Steve Kehlet Feb 16 '12 at 19:02
-
`sort -R <<< $'1\n1\n2' | head -1` is as likely to return 1 as 2, because `sort -R` sorts duplicate lines together. The same applies to `sort -Ru`, because it removes duplicate lines. – Lri Sep 15 '12 at 11:03
-
This is relatively slow, since the whole file needs to get shuffled by `sort` before piping it to `head`. `shuf` selects random lines from the file instead, and is much faster for me. – Bengt Nov 25 '12 at 17:33
-
@SteveKehlet while we're at it, `sort --random-sort $FILE | head` would be best, as it allows it to access the file directly, possibly enabling efficient parallel sorting – WaelJ Jun 06 '14 at 18:22
-
The `--random-sort` and `-R` options are specific to GNU sort (so they won't work with BSD or Mac OS `sort`). GNU sort learned those flags in 2005 so you need GNU coreutils 6.0 or newer (eg CentOS 6). – RJHunter Apr 09 '15 at 07:09
-
from Wikipedia: "this is not a full random shuffle because it will sort identical lines together" – janosdivenyi Apr 14 '15 at 10:58
-
@Bengt: nothing is written until `shuf` reads the whole file into memory. `sort` may work even if the file does not fit in memory. – jfs Sep 26 '15 at 00:59
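A hedged variation (my sketch, not from the answer): the duplicate-grouping behaviour of `sort -R` noted above can be avoided by decorating each line with its own random key, sorting on the key, and stripping it afterwards:

# Decorate-sort-undecorate: every line gets an independent random key,
# so identical lines are no longer forced to sort together.
awk 'BEGIN { srand() } { printf "%.8f\t%s\n", rand(), $0 }' "$FILE" | sort -n | head -n 1 | cut -f2-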
Another alternative:
head -$((${RANDOM} % `wc -l < file` + 1)) file | tail -1

-
${RANDOM} only generates numbers less than 32768, so don't use this for large files (for example the English dictionary). – Ralf Mar 13 '12 at 20:16
-
This does not give you precisely the same probability for every line, due to the modulo operation. This barely matters if the file length is << 32768 (and not at all if it divides that number), but it may be worth noting. – Anaphory Mar 21 '14 at 17:58
-
You can extend this to 30-bit random numbers by using `(${RANDOM} << 15) + ${RANDOM}`. This significantly reduces the bias and allows it to work for files containing up to 1 billion lines. – nneonneo Jun 19 '15 at 05:42
-
@nneonneo: Very cool trick, though according to this link it should be OR'ing the ${RANDOM}'s instead of PLUS'ing http://stackoverflow.com/a/19602060/293064 – Jay Taylor Jul 12 '15 at 01:54
-
`+` and `|` are the same since `${RANDOM}` is 0..32767 by definition. – nneonneo Jul 12 '15 at 07:12
-
There's a heavy performance penalty to this, since it needs to count lines to be sure it's reading to the right point. – Charles Duffy Mar 19 '18 at 22:35
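Combining @nneonneo's 30-bit trick with this answer gives a version that handles files beyond 32767 lines (a sketch under that assumption, still subject to the small modulo bias @Anaphory describes):

# 30-bit random value: RANDOM is 15 bits, so shift-and-OR two draws together.
N=$(wc -l < file)
head -n $(( ((RANDOM << 15) | RANDOM) % N + 1 )) file | tail -1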
This is simple.
cat file.txt | shuf -n 1
Granted, this is just a tad slower than `shuf -n 1 file.txt` on its own.

-
Best answer. I didn't know about this command. Note that `-n 1` specifies 1 line, and you can change it to more than 1. `shuf` can be used for other things too; I just piped `ps aux` and `grep` with it to randomly kill processes partially matching a name. – sudo Jan 18 '17 at 22:53
perlfaq5: How do I select a random line from a file? Here's a reservoir-sampling algorithm from the Camel Book:
perl -e 'srand; rand($.) < 1 && ($line = $_) while <>; print $line;' file
This has a significant advantage in space over reading the whole file in. You can find a proof of this method in The Art of Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.

-
Just for the purposes of inclusion (in case the referred site goes down), here's the code that Tracker1 pointed to: "cat filename | perl -e 'while (<>) { push(@_,$_); } print @_[rand()*@_];';" – Anirvan Jan 15 '09 at 19:16
-
This is a useless use of cat. Here's a slight modification of the code found in perlfaq5 (and courtesy of the Camel book): perl -e 'srand; rand($.) < 1 && ($line = $_) while <>; print $line;' filename – Mr. Muskrat Jan 15 '09 at 21:55
-
I just benchmarked an N-lines version of this code against `shuf`. The perl code is very slightly faster (8% faster by user time, 24% faster by system time), though anecdotally I've found the perl code "seems" less random (I wrote a jukebox using it). – Adam Katz Dec 17 '14 at 21:59
-
More food for thought: [`shuf` stores the whole input file in memory](https://stackoverflow.com/questions/9245638/select-random-lines-from-a-file-in-bash/9245733#comment40771587_9245733), which is a horrible idea, while this code only stores one line, so the limit of this code is a line count of INT_MAX (2^31 or 2^63 depending on your arch), assuming any of its selected potential lines fits in memory. – Adam Katz Dec 19 '14 at 21:58
-
here's the awk equivalent. either of these answers (perl or awk) are better than the accepted for - portability, speed, and ability to manage huge files easily. `awk 'BEGIN{srand()}{rand()*NR<1&&l=$0}END{print l}' file` or `some_input | awk 'BEGIN{srand()}{rand()*NR<1&&l=$0}END{print l}'` – keithpjolley Apr 19 '20 at 17:18
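For completeness, the same reservoir-sampling idea can be written in pure bash (a sketch; bash's `$RANDOM` is only 15 bits, so the draw becomes biased once the line count approaches 32768):

# Keep line n with probability 1/n; the survivor is uniform over all lines.
selected=""
n=0
while IFS= read -r line; do
  n=$((n + 1))
  if (( RANDOM % n == 0 )); then
    selected=$line
  fi
done < "$FILE"
printf '%s\n' "$selected"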
using a bash script:
#!/bin/bash
# replace with the file to read
FILE=tmp.txt
# count the number of lines
NUM=$(wc -l < ${FILE})
# generate a random number in range 1-NUM
X=$(( ${RANDOM} % ${NUM} + 1 ))
# extract the X-th line
sed -n ${X}p ${FILE}

-
Random can be 0, sed needs 1 for the first line. sed -n 0p returns error. – asalamon74 Jan 15 '09 at 19:20
-
but even with the bug it's worth a point, as it does not need perl or python and is as efficient as you can get (it reads the file exactly twice, but not into memory, so it would work even with huge files). – blabla999 Jan 15 '09 at 19:28
-
@asalamon74: thanks. @blabla999: if we make a function out of it, ok for $1, but why not compute NUM? – Paolo Tedesco Jan 15 '09 at 19:28
-
@Hasturkun: beware - the output of wc depends on whether it reads stdin or a file name off its command line. Granted, 'wc -l < $FILE' would be OK; using 'wc -l $FILE' (no redirection) would be a bug. – Jonathan Leffler Jan 16 '09 at 08:06
-
@Hasturkun & J.Leffler: the cat was meant to avoid wc printing the file name. Fixed with the 'wc -l < $FILE' suggestion, thanks – Paolo Tedesco Jan 16 '09 at 08:26
-
The variable names should be quoted, especially `$FILE`. The curly braces are superfluous here. I recommend using lowercase or mixed-case variable names to avoid potential name collisions with shell or environment variables. – Dennis Williamson Oct 28 '11 at 14:22
-
If a file has 32769 or more lines, the last ones are never selected. `wc - l` shouldn't have a space. – Lri Sep 15 '12 at 11:12
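Incorporating Dennis Williamson's quoting and naming advice, a cleaned-up sketch of the same script might read:

#!/bin/bash
# file to read: first argument, falling back to tmp.txt
file=${1:-tmp.txt}
# count the number of lines
num=$(wc -l < "$file")
# random line number in 1..num (biased, and capped, for files over 32767 lines)
x=$(( RANDOM % num + 1 ))
# extract the x-th line
sed -n "${x}p" "$file"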
Single bash line:
sed -n $((1+$RANDOM%`wc -l test.txt | cut -f 1 -d ' '`))p test.txt
Slight problem: duplicate filename.

-
slighter problem. performing this on /usr/share/dict/words tends to favor words starting with "A". Playing with it, I'm at about 90% "A" words to 10% "B" words. None starting with numbers yet, which make up the head of the file. – bibby Sep 30 '10 at 05:01
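The skew @bibby observed has a likely cause: `$RANDOM` never exceeds 32767, so on a dictionary of 235886 lines the modulo expression can only ever select from the first 32768 lines (the "A" words and early "B" words). Widening the random value, per @nneonneo's comment above, is one fix (a sketch):

# 30 bits of randomness instead of 15, so every line is reachable.
sed -n "$(( 1 + ((RANDOM << 15) | RANDOM) % $(wc -l < test.txt) ))p" test.txt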
Here's a simple Python script that will do the job:
import random, sys
lines = open(sys.argv[1]).readlines()
print(lines[random.randrange(len(lines))])
Usage:
python randline.py file_to_get_random_line_from

-
This doesn't quite work. It stops after a single line. To make it work, I did this: `import random, sys lines = open(sys.argv[1]).readlines() ` for i in range(len(lines)): rand = random.randint(0, len(lines)-1) print lines.pop(rand), – Jed Daniels Jan 14 '11 at 20:13
-
Stupid comment system with crappy formatting. Didn't formatting in comments work once upon a time? – Jed Daniels Jan 14 '11 at 20:14
-
randint is inclusive therefore `len(lines)` may lead to IndexError. You could use `print(random.choice(list(open(sys.argv[1]))))`. There is also memory efficient [reservoir sampling algorithm](http://askubuntu.com/a/527778/3712). – jfs Sep 24 '14 at 19:08
-
@MichaelCampbell: [reservoir sampling algorithm](http://stackoverflow.com/a/32792504/4279) that I've mentioned above may work with 3TB file (if line size is limited). – jfs Sep 26 '15 at 01:02
-
Using [py](https://github.com/Russell91/pythonpy) is nice. `-l` assigns incoming lines to a list, `l`. `py` auto-imports stdlib modules. so you can do `cat $FILE | py -l "random.choice(l)"`. Try it: `python -m this | py -l "random.choice(l)"` ... erm actually just `py this | py -l "random.choice(l)"` ;) – floer32 Jan 05 '16 at 21:23
Another way using 'awk'
awk NR==$((${RANDOM} % `wc -l < file.name` + 1)) file.name

-
That uses awk and bash (`$RANDOM` is a [bashism](https://en.wikipedia.org/wiki/Bashism)). Here is a pure awk (mawk) method using the same logic as @Tracker1's cited perlfaq5 code above: `awk 'rand() * NR < 1 { line = $0 } END { print line }' file.name` (wow, it's even *shorter* than the perl code!) – Adam Katz Dec 19 '14 at 21:33
-
That code must read the file (`wc`) in order to get a line count, then must read (part of) the file again (`awk`) to get the content of the given random line number. I/O will be far more expensive than getting a random number. My code reads the file once only. The issue with awk's `rand()` is that it seeds based on seconds, so you'll get duplicates if you run it consecutively too fast. – Adam Katz Dec 19 '14 at 21:41
A solution that also works on MacOSX, and should also work on Linux(?):
N=5
awk 'NR==FNR {lineN[$1]; next}(FNR in lineN)' <(jot -r $N 1 $(wc -l < $file)) $file
Where:

- `N` is the number of random lines you want
- `NR==FNR {lineN[$1]; next}(FNR in lineN) file1 file2` --> save the line numbers written in `file1` and then print the corresponding lines in `file2`
- `jot -r $N 1 $(wc -l < $file)` --> draw `N` numbers randomly (`-r`) in range `(1, number_of_lines_in_file)` with `jot`. The process substitution `<()` will make it look like a file for the interpreter, so `file1` in the previous example.
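One caveat worth adding (my note, not the answerer's): `jot -r` draws with replacement and awk's `lineN` array deduplicates, so a run can occasionally print fewer than `N` lines. Example usage against the system dictionary:

# Pick 5 (or occasionally fewer, on duplicate draws) random dictionary words.
file=/usr/share/dict/words
N=5
awk 'NR==FNR {lineN[$1]; next} (FNR in lineN)' <(jot -r "$N" 1 "$(wc -l < "$file")") "$file"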

Using only vanilla sed and awk, and without using $RANDOM, a simple, space-efficient and reasonably fast "one-liner" for selecting a single line pseudo-randomly from a file named FILENAME is as follows:
sed -n $(awk 'END {srand(); r=rand()*NR; if (r<NR) {sub(/\..*/,"",r); r++;}; print r}' FILENAME)p FILENAME
(This works even if FILENAME is empty, in which case no line is emitted.)
One possible advantage of this approach is that it only calls rand() once.
As pointed out by @AdamKatz in the comments, another possibility would be to call rand() for each line:
awk 'rand() * NR < 1 { line = $0 } END { print line }' FILENAME
(A simple proof of correctness can be given based on induction.)
Caveat about rand()
"In most awk implementations, including gawk, rand() starts generating numbers from the same starting number, or seed, each time you run awk."
-- https://www.gnu.org/software/gawk/manual/html_node/Numeric-Functions.html

-
See [the comment I posted a year before this answer](https://stackoverflow.com/questions/448005/whats-an-easy-way-to-read-random-line-from-a-file-in-unix-command-line#comment43573811_18607080), which has a simpler awk solution that doesn't require sed. Also note my caveat about awk's random number generator, which seeds at whole seconds. – Adam Katz Mar 19 '18 at 18:40
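One way around the whole-second seeding described above (a workaround I'm sketching, not part of the original answer) is to pass awk a seed from the shell:

# Seed awk's PRNG explicitly so back-to-back runs differ within one second.
awk -v seed="$RANDOM" 'BEGIN { srand(seed) } rand() * NR < 1 { line = $0 } END { print line }' FILENAME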
#!/bin/bash
# read every line (word) of the file given as $1 into an array
IFS=$'\n' wordsArray=($(<"$1"))
numWords=${#wordsArray[@]}
# number of digits in the word count
sizeOfNumWords=${#numWords}
# build a random number digit by digit until it is a valid 0-based index
while true
do
    ranNumStr=""
    for ((i=0; i<sizeOfNumWords; i++))
    do
        ranNumStr="$ranNumStr$(( RANDOM % 10 ))"
    done
    # strip leading zeros so the number is not parsed as octal
    noLeadZeroStr=$((10#$ranNumStr))
    if [ $noLeadZeroStr -lt $numWords ]
    then
        break
    fi
done
echo "${wordsArray[$noLeadZeroStr]}"

-
Since $RANDOM generates numbers less than the number of words in /usr/share/dict/words, which has 235886 (on my Mac anyway), I just generate 6 separate random numbers between 0 and 9 and string them together. Then I make sure that number is less than 235886. Then remove leading zeros to index the words that I stored in the array. Since each word is its own line this could easily be used for any file to randomly pick a line. – Ken Roy Jun 15 '17 at 13:01
Here is what I discovered, since my Mac OS doesn't come with all the easy answers. I used the jot command to generate a number, since the $RANDOM variable solutions seemed not to be very random in my test. When testing my solution I saw a wide variance in the output.
RANDOM1=`jot -r 1 1 235886`
#range of jot ( 1 235886 ) found from earlier wc -w /usr/share/dict/web2
echo $RANDOM1
head -n $RANDOM1 /usr/share/dict/web2 | tail -n 1
The echo of the variable is to get a visual of the generated random number.
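Generalized to an arbitrary file (a sketch assuming `jot` is available, as it is on macOS):

# jot draws a line number in 1..line_count; sed prints just that line.
file=$1
r=$(jot -r 1 1 "$(wc -l < "$file")")
sed -n "${r}p" "$file"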
