
I have a huge file (millions of lines). I want to take a random sample from it. I've generated a list of unique random numbers, and now I want to extract all the lines whose line numbers match the random numbers I generated.

Sorting the random numbers is not a problem, so I was thinking I could take the difference between consecutive numbers and just jump forward by that difference with the cursor in the file.

I think I should use sed or awk.
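
For reference, a minimal sed sketch of selecting lines by number (the line numbers 5, 89, and 120 here are just placeholders, and `120q` stops reading the file after the last wanted line):

sed -n '5p;89p;120p;120q' bigfile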

  • possible duplicate of [What's an easy way to read random line from a file in Unix command line?](http://stackoverflow.com/questions/448005/whats-an-easy-way-to-read-random-line-from-a-file-in-unix-command-line) – tripleee Mar 13 '14 at 17:17

2 Answers


Why don't you directly use shuf to get random lines:

shuf -n NUMBER_OF_LINES file

Example

$ seq 100 >a   # the file "a" contains numbers 1 to 100, one per line

$ shuf -n 4 a
54
46
30
53

$ shuf -n 4 a
50
37
63
21

Update

Can I somehow store the number of lines shuf chose? – Pio

As I did in How to efficiently get 10% of random lines out of the large file in Linux?, you can do something like this:

shuf -i 1-1000 -n 5 > rand_numbers # store the list of line numbers
awk 'FNR==NR {a[$1]; next} FNR in a' rand_numbers file # print those lines from "file"
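
To illustrate the two-file idiom: while awk reads the first file, `FNR==NR` holds, so each number is stored as a key of the array `a`; while reading the second file, a line is printed whenever its line number `FNR` is among those keys. A minimal run, recreating the seq example in a file named `file` (the sampled numbers will differ on each run; here they happened to be 7, 42, and 89):

$ seq 100 > file
$ shuf -i 1-100 -n 3 > rand_numbers
$ awk 'FNR==NR {a[$1]; next} FNR in a' rand_numbers file
7
42
89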
  • Wow... I did not know of this :). Can I somehow store the number of lines shuf chose? – Pio Mar 13 '14 at 17:10
  • In a bash script, if I store `wc -l filename > max`, how do I get only the line count from `wc -l`, not the filename as well? – Pio Mar 13 '14 at 17:44

You can use awk and shuf:

shuf file.txt > shuf.txt
awk '!a[$0]++' shuf.txt > uniqed.txt

This awk one-liner is a handy idiom for removing duplicate lines.
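
It works because `a[$0]++` evaluates to the current count for that line before incrementing it, so `!a[$0]++` is true only the first time a line is seen, and awk's default action is to print the line. For example:

$ printf 'x\ny\nx\n' | awk '!a[$0]++'
x
y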
