23

I want to shuffle the lines of a file with a fixed seed so that I always get the same random order. The command I am using is as follows:

sort -R file.txt | head -200 > file.sff

What change could I make so that it sorts with a fixed random seed?

codeforester
Flethuseo

4 Answers

25

The GNU implementation of sort has a --random-source option. Passing it the name of a file with known contents produces the same output on every run.

See the Random sources documentation in the GNU coreutils manual, which contains the following sample implementation and example:

get_seeded_random()
{
  seed="$1"
  openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt \
    </dev/zero 2>/dev/null
}

shuf -i1-100 --random-source=<(get_seeded_random 42)

Since GNU sort is also part of coreutils, the relevant documentation applies there as well:

sort --random-source=<(get_seeded_random 42) -R file.txt | head -200 > file.sff
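
As a lighter-weight variant, a plain file with known contents can also serve as the random source. The sketch below is illustrative only: the seed and input files are temporary placeholders, and the 4096-byte seed size is an arbitrary assumption that is ample for this tiny input (a larger input passed to shuf consumes more bytes). It also shows one difference worth knowing: sort -R orders lines by a hash of their contents, so identical lines end up next to each other, while shuf permutes lines regardless of content.

```bash
#!/usr/bin/env bash
# Hypothetical demo files; the names are placeholders, not from the question.
workdir=$(mktemp -d)
seed_file="$workdir/seed.bin"
input_file="$workdir/input.txt"

# Known, repeatable contents. 4096 bytes is an arbitrary size chosen to be
# comfortably more than this small input needs.
yes 'fixed-seed-42' | head -c 4096 > "$seed_file"

printf '%s\n' a b a b a b > "$input_file"

# sort -R sorts by a hash of each line, so equal lines come out adjacent.
sorted=$(sort -R --random-source="$seed_file" "$input_file")

# shuf permutes lines without looking at their contents.
shuffled=$(shuf --random-source="$seed_file" "$input_file")

printf 'sort -R:\n%s\n\nshuf:\n%s\n' "$sorted" "$shuffled"
rm -rf "$workdir"
```

If the input can contain duplicate lines and a true shuffle is wanted, shuf is probably the better fit.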
Charles Duffy
  • Since `sort` and `shuf` are both part of GNU coreutils, does it matter in the slightest? If one has one, they'll have both. – Charles Duffy Jan 31 '17 at 16:36
  • 1
    If using HomeBrew on macOS, `shuf` is `gshuf` (part of `coreutils` package). – mommi84 Mar 08 '18 at 10:35
  • 2
    `--random-source` is a good start, but note that it needs a sufficiently large number of bytes; otherwise an `end of file` error is thrown; see https://stackoverflow.com/q/60266215/1506477 – Thamme Gowda May 22 '20 at 05:52
  • @ThammeGowda, I'm having trouble seeing the difference from how the accepted answer to that question is advising folks to use `openssl enc` to generate a random stream and how this answer advises one to do so. – Charles Duffy Oct 11 '22 at 20:14
8

The shuf command from GNU coreutils can take a file as a fixed source of randomness using the --random-source option:

shuf --random-source=some_file.txt file.txt | head -n200 > file.sff

If you don't want to keep a separate file around, you can generate one on the fly with process substitution:

shuf --random-source=<(yes 42) file.txt | head -n200 > file.sff
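
One way to convince yourself that the order really is fixed is to run the same pipeline twice and compare the results. This is a minimal sketch, assuming GNU shuf and bash (for process substitution); demo_file is a throwaway stand-in for file.txt, created only for the check.

```bash
#!/usr/bin/env bash
# Illustrative check only: demo_file is a temporary stand-in for file.txt.
demo_file=$(mktemp)
seq 1 100 > "$demo_file"

# The same seed stream is supplied both times, so both samples should match.
first=$(shuf --random-source=<(yes 42) "$demo_file" | head -n 5)
second=$(shuf --random-source=<(yes 42) "$demo_file" | head -n 5)

if [ "$first" = "$second" ]; then
  echo "reproducible"
else
  echo "not reproducible"
fi

rm -f "$demo_file"
```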
baraaorabi
2

You may not need to use external tools like sort, whose options and usage may vary depending on your operating system. Bash has an internal random number generator accessible through the $RANDOM variable. It's common practice to seed the generator by setting the variable, like so:

RANDOM=$$

or

RANDOM=$(date '+%s')

etc. But of course, you can also use a predictable seed in order to get predictable not-so-random results:

$ RANDOM=12345; echo $RANDOM
28207
$ RANDOM=12345; echo $RANDOM
28207

To reorder the lines of a file randomly, first read the file into an array using mapfile:

$ mapfile -t a < source.txt

Then simply rewrite the array indices:

$ n=${#a[@]}; for i in "${!a[@]}"; do a[RANDOM+n]=${a[i]}; unset "a[$i]"; done

When reading a non-associative array, bash naturally orders elements in ascending order of index value.

Note that the new index for each line has the number of array elements added to it to avoid collisions within that range. This solution is still fallible -- there's no guarantee that $RANDOM will produce unique numbers. You can mitigate that risk with extra code that checks for prior use of each index, or reduce the risk with bit-shifting:

... a[$(( (RANDOM<<15)+RANDOM+${#a[@]} ))]= ...

This turns your index values into 30-bit unsigned integers instead of 15-bit ones, which makes collisions far less likely.
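
Putting the pieces together, here is a minimal sketch of the approach with the collision check included. The function name shuffle_lines and its arguments are hypothetical, and it assumes bash 4 or later for mapfile. It writes into a second array rather than rewriting the first in place, so no offset is needed to stay clear of the original indices. The sequence that a given seed produces may differ between bash versions, so treat the order as reproducible only on the same bash.

```bash
#!/usr/bin/env bash
# Sketch of the index-rewrite shuffle with a collision check added.
# shuffle_lines is a hypothetical helper name. Requires bash 4+ (mapfile).
shuffle_lines() {
  local seed=$1 file=$2
  local -a src dst
  local i idx

  mapfile -t src < "$file"
  RANDOM=$seed                  # a fixed seed gives a fixed sequence

  for i in "${!src[@]}"; do
    # Draw a 30-bit candidate index and retry until the slot is unused.
    idx=$(( (RANDOM << 15) + RANDOM ))
    while [[ -n ${dst[idx]+set} ]]; do
      idx=$(( (RANDOM << 15) + RANDOM ))
    done
    dst[idx]=${src[i]}
  done

  # Indexed arrays expand in ascending index order, which is the new order.
  if (( ${#dst[@]} > 0 )); then
    printf '%s\n' "${dst[@]}"
  fi
}

# Example usage (file names taken from the question):
#   shuffle_lines 12345 file.txt | head -n 200 > file.sff
```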

ghoti
-6

If you're randomly shuffling lines, you're not sorting. I haven't seen a sort with a --random-source option before. It'd be interesting if it does exist. However, that's not sorting the lines in a fixed order.

I believe you'll have to write a program to do that, and I don't think Bash can quite do it.

Actually, it might. The $RANDOM shell variable yields a random number from 0 to 32767. If you assign a seed to RANDOM, the same sequence of random numbers is produced every time. You can use a card-dealing algorithm: read each line into a Bash array, then use the algorithm to pick each line.

I'm not going to write a test program -- especially in Bash, but you should get the idea.
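
For readers who want something concrete, the idea can be sketched roughly as follows. This takes "card-dealing algorithm" to mean a Fisher-Yates shuffle, which is an assumption on my part. The function name fisher_yates_shuffle is hypothetical, bash 4 or later is assumed for mapfile, and the small modulo bias in picking j is ignored.

```bash
#!/usr/bin/env bash
# Sketch of a seeded Fisher-Yates shuffle in bash.
# fisher_yates_shuffle is a hypothetical helper name. Requires bash 4+.
fisher_yates_shuffle() {
  local seed=$1 file=$2
  local -a lines
  local i j tmp

  mapfile -t lines < "$file"
  RANDOM=$seed                  # same seed, same sequence of draws

  # Walk from the last element down, swapping each with a random
  # element at or before it.
  for (( i = ${#lines[@]} - 1; i > 0; i-- )); do
    # Combine two 15-bit draws into a 30-bit value, then reduce to [0, i].
    j=$(( ((RANDOM << 15) | RANDOM) % (i + 1) ))
    tmp=${lines[i]}
    lines[i]=${lines[j]}
    lines[j]=$tmp
  done

  if (( ${#lines[@]} > 0 )); then
    printf '%s\n' "${lines[@]}"
  fi
}

# Example usage (file names taken from the question):
#   fisher_yates_shuffle 12345 file.txt | head -n 200 > file.sff
```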

David W.