
I would like to shuffle the output of find, but with a fixed seed, so that I get the same order every time I run the command.

Here's how I shuffle:

find . -name '*.wav' | shuf

The issue is that every run uses a new seed and therefore produces a new order. My attempt to fix it:

find . -name '*.wav' | shuf --random-source=<(echo 42)

That works only occasionally (i.e. for just a few inputs, but deterministically for those). Most of the time it fails with:

shuf: ‘/proc/self/fd/12’: end of file

The same error is produced by, for example:

seq 1 100 | sort -R --random-source=<(echo 42)

I have seen that command used in other places.

This works though:

printf '%s\n' a b c | shuf --random-source=<(echo 42)

Why is that? And how can I fix it? I am open to any suggestions. The output of find is part of a larger script. The solution should work in both bash and zsh.

Why my solution did not work (EDIT)

Thanks to @franzisk and @Inian I think I now understand why my initial solution did not work. I was treating --random-source as if it were a seed, while it is, well, a "random source" = source of randomness. My echo 42 simply does not provide enough entropy (it supplies only three bytes) for anything longer than a few lines. That's why it worked only in a couple of cases!

"Seeding" (as in: sending) a large number of bytes does the job, as it provides enough entropy.

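To make the point above concrete, here is a minimal sketch that counts how many bytes echo 42 actually offers to --random-source. The variable name seed_bytes is only for illustration.

```shell
# echo 42 writes the characters '4' and '2' plus a trailing newline.
# That is all the data shuf can read before it hits end of file.
seed_bytes=$(echo 42 | wc -c)

# $(( ... )) strips any padding that some wc implementations add.
echo "bytes available: $((seed_bytes))"
```

Three bytes are presumably enough to shuffle a handful of lines, which would explain why the three-line printf example works while longer inputs fail.
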
Lukasz Tracewski
  • Does this post help - [Shuffling lines of a file with a fixed seed?](https://stackoverflow.com/q/5914513/5291015) – Inian Feb 17 '20 at 16:20
  • Thanks @Inian - it does. Initially I thought these are different problems, but now I begin to see why these are really similar. I did not understand how the random numbers are generated; this helped me a great deal: https://www.gnu.org/software/coreutils/manual/html_node/Random-sources.html#Random-sources (doc linked in the answer you gave). – Lukasz Tracewski Feb 17 '20 at 18:30

1 Answer


You can create your own get_fixed_random function, using openssl to generate the random-source stream, like this:

get_fixed_random()
{
  openssl enc -aes-256-ctr -pass pass:"$1" -nosalt </dev/zero 2>/dev/null
}

Load the function into your environment:

. /file-containing/get_fixed_random

Run the find command and pipe its output to shuf, using the function to feed the --random-source option:

find . -name '*.wav' | shuf --random-source=<(get_fixed_random 55)

NB: 55 is just the seed passed as the parameter. Change it to get a different shuffle order.
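
As a quick check of the approach, the sketch below shuffles the same input twice with the same seed and compares the results. It assumes GNU shuf and openssl are installed; seq 1 100 stands in for the find output, and the variable names first and second are only for illustration.

```shell
# Deterministic byte stream keyed by the seed in $1 (same function as above).
get_fixed_random() {
  openssl enc -aes-256-ctr -pass pass:"$1" -nosalt </dev/zero 2>/dev/null
}

# Shuffle identical input twice, feeding shuf the same seeded stream each time.
first=$(seq 1 100 | shuf --random-source=<(get_fixed_random 55))
second=$(seq 1 100 | shuf --random-source=<(get_fixed_random 55))

# Report whether both runs produced the same order.
if [ "$first" = "$second" ]; then
  echo "same order"
else
  echo "different order"
fi
```

Note that the shuffle is only repeatable if the input lines arrive in the same order each time. If find may list files in a different order between runs, piping its output through sort before shuf should keep the input stable.
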

Francesco Gasparetto
  • Thanks! That answers how to fix it. The remaining question was why my approach worked only sometimes, and I think I know now. It's all about how the random numbers are generated. There's no "seed", it's really a "source of random numbers". If there's not enough entropy on the input, it won't work. Your solution works since it provides enough information. – Lukasz Tracewski Feb 17 '20 at 18:34
  • Yes, the openssl command is used to generate a continuous stream of pseudo-random bytes. Glad to help you! – Francesco Gasparetto Feb 18 '20 at 08:45