
Having:

mapfile -t words < <( head -10000 /usr/share/dict/words)
echo "${#words[@]}" #10000
r=$(( $RANDOM % ${#words[@]} ))
echo "$r ${words[$r]}"

This selects a random word from the array of 10k words.

But when the array is bigger than 32767 entries (e.g. the whole file, 200k+ words), it stops working, because $RANDOM only goes up to 32767. From man bash:

Each time this parameter is referenced, a random integer between 0 and 32767 is generated.

mapfile -t words < /usr/share/dict/words
echo "${#words[@]}" # 235886
r=$(( $RANDOM % ${#words[@]} )) #how to change this?
echo "$r ${words[$r]}"

I don't want to use some perl like perl -plE 's/.*/int(rand()*$_)/e', since not every system has perl installed. I'm looking for the simplest possible solution, and I don't care about true randomness; it isn't for cryptography. :)

slim
cajwine

3 Answers


If shuf is available on your system...

r=$(shuf -i 0-$(( ${#words[@]} - 1 )) -n 1)
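Putting this together with the question's snippet, a minimal sketch (using a small stand-in list instead of /usr/share/dict/words; note that shuf's -i range is inclusive on both ends, so the upper bound is the array length minus one):

```shell
# Stand-in word list for illustration; replace with the mapfile from the question.
words=(alpha bravo charlie delta echo)
# Pick one random index in 0..length-1 (both bounds inclusive for shuf -i).
r=$(shuf -i 0-$(( ${#words[@]} - 1 )) -n 1)
echo "$r ${words[$r]}"
```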

If not, you could use $RANDOM several times and concatenate the results to obtain a number with enough digits to cover your needs. You should concatenate, not add, as adding random numbers will not produce an even distribution (just as throwing two dice will produce a total of 7 more often than a total of 2).

For instance:

printf -v r1 %05d $RANDOM
printf -v r2 %05d $RANDOM
printf -v r3 %05d $RANDOM
r4=${r1:1}${r2:1}${r3:1}
r=$(( $r4 % ${#words[@]} ))

The printf statements are used to make sure leading zeros are kept; the -v option is a hidden gem that assigns the formatted value to a variable (which, among other things, avoids the use of eval in many useful real-life cases). The first digit of each of r1, r2 and r3 is stripped because it can only be 0, 1, 2 or 3, which would skew the distribution.
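To see the padding and stripping in isolation, here is a minimal sketch (the value 345 is just an illustration standing in for $RANDOM):

```shell
# printf -v assigns the formatted result to a variable instead of printing it.
printf -v padded %05d 345
echo "$padded"       # 00345: leading zeros are kept by the %05d format
echo "${padded:1}"   # 0345: first digit stripped, four uniform digits remain
```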

Fred

One possible solution is to do some maths with the outcome of $RANDOM:

big_random=`expr $RANDOM \* 32767 + $RANDOM`

Another is to use $RANDOM once to pick a block of the input file, then $RANDOM again to pick a line from within that block.
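A sketch of that two-stage idea (the file path and block size are illustrative; lines in a short final block come up slightly more often, which is fine given the question's relaxed randomness requirement):

```shell
file=/usr/share/dict/words          # any line-oriented word list
total=$(wc -l < "$file")
block_size=32767
blocks=$(( (total + block_size - 1) / block_size ))   # ceiling division
b=$(( RANDOM % blocks ))            # first roll: pick a block
start=$(( b * block_size + 1 ))
len=$(( total - start + 1 ))        # the last block may be shorter
if (( len > block_size )); then len=$block_size; fi
line=$(( start + RANDOM % len ))    # second roll: pick a line within the block
sed -n "${line}p" "$file"
```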

Note that $RANDOM doesn't allow you to specify a range. % gives a non-uniform result. Further discussion at: How to generate random number in Bash?

As an aside, it doesn't seem particularly wise to read the whole words file into memory. Unless you'll be doing a lot of repeat accesses to this data structure, consider doing this without slurping up the whole file at once.
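A hedged sketch of the no-slurp approach, combining wc -l with sed (the 15-bit shift trick for a bigger random number is borrowed from the comments below):

```shell
file=/usr/share/dict/words                  # illustrative path
total=$(wc -l < "$file")                    # one pass to count lines
big_random=$(( (RANDOM << 15) + RANDOM ))   # 30-bit random number
line=$(( big_random % total + 1 ))          # sed line numbers start at 1
sed -n "${line}p; ${line}q" "$file"         # print the line, then quit early
```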

slim
  • Do you think it's better to use something like `sed -n "${num}p" file`, i.e. to run an external program which reads the file anyway? mapfile is a builtin, and I could simply clear the array... or am I missing something? – cajwine Apr 07 '17 at 10:00
  • I would use `sed` as you've noted. `mapfile` will consume at least as much process memory as the size of the file (albeit temporarily). `sed` will consume one line's-worth at a time. Whether you're OK with that is your decision. – slim Apr 07 '17 at 10:04
  • ... and if you were bothered by running external programs, you wouldn't be programming in Bash, right? – slim Apr 07 '17 at 10:05
  • :) I'm not bothered; it's just that I'd first need to run `wc -l` to get the number of words, and then run `sed`, which is why I used mapfile. But I'll do some tests and see. Thanks for the idea. :) – cajwine Apr 07 '17 at 10:11
  • Why call expr if the shell is perfectly capable of doing this math: `big_random=$((32768*RANDOM+RANDOM))`. And yes, the multiplier should be 32768 (not 32767). Or, if you want to avoid the multiplication, you can use the quite faster shift: `big_random=$(((RANDOM<<15)+RANDOM))` –  Apr 08 '17 at 02:43
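The 32768-vs-32767 point from the last comment can be checked directly: with multiplier 32767, two different (high, low) pairs collide on the same value, while 32768 (or the equivalent 15-bit shift) keeps every pair distinct. A minimal arithmetic sketch:

```shell
# With multiplier 32767, (1,0) and (0,32767) map to the same number:
echo $(( 32767 * 1 + 0 ))        # 32767
echo $(( 32767 * 0 + 32767 ))    # 32767  <- collision
# With multiplier 32768, the pairs stay distinct:
echo $(( 32768 * 1 + 0 ))        # 32768
echo $(( 32768 * 0 + 32767 ))    # 32767
echo $(( (1 << 15) + 0 ))        # 32768  <- same multiplier, via shift
```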

The accepted answer will get you ten digits, but for each five-digit prefix, the last five digits may only be in the range 00000-32767.

The number 1234567890, for example, is not a possibility because 67890 > 32767.

That may be fine. Personally I find this option a bit nicer. It gives you numbers between 0 and 1073676289 (though not every value in that range is reachable as a product, and the distribution is uneven; the question says true randomness doesn't matter).

big_random=$(expr $RANDOM \* $RANDOM)
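For completeness, a sketch of wiring this into the question's word-picking snippet (stand-in array here; plain shell arithmetic gives the same product without forking expr):

```shell
words=(alpha bravo charlie delta echo)   # stand-in for the dict array
big_random=$(( RANDOM * RANDOM ))        # same product as the expr version
r=$(( big_random % ${#words[@]} ))
echo "$r ${words[$r]}"
```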
crenshaw-dev