A script for generating a mapping between two sets of words randomly

Question

For example, the inputs are two sets, A and B.

Set A is stored in the file a.txt as follows:

apple
orange
grape
...

Set B is stored in the file b.txt as follows:

tomato
potato
cucumber
...

The output, c.txt, should look like:

apple    potato
orange   tomato
grape    celery
...      ...

Note that the mapping between them is randomly generated, i.e., each run of

map.sh a.txt b.txt > c.txt

usually gives a different mapping.

Can this be implemented in shell (or awk, sed)?

score 3 · Accepted Answer · answered Apr 16 '14 at 22:59

paste <(shuf a.txt) <(shuf b.txt)

If you would like the first column to keep the original order of a.txt, pass a.txt directly as the first argument to paste:

paste a.txt <(shuf b.txt)
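The question asks for a `map.sh` that takes the two files as arguments. A minimal sketch of such a wrapper follows, assuming `shuf` is installed and both files have the same number of lines. The function name `random_map` is made up for illustration, and a pipe into `paste ... -` replaces the process substitution so that the script also runs under plain `sh`.

```shell
#!/bin/sh
# map.sh (hypothetical wrapper): pair each line of file A with a random line of file B.
# Assumes shuf is available and both files have the same number of lines.
random_map() {
    # Keep A in its original order; shuffle only B and let paste read it from stdin.
    shuf "$2" | paste "$1" -
}

# Run only when both file arguments are given, e.g.: ./map.sh a.txt b.txt > c.txt
if [ "$#" -eq 2 ]; then
    random_map "$1" "$2"
fi
```

The columns are separated by a tab, which is the default delimiter of `paste`.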

Works perfectly on Linux. Thanks, Tim. Any alternatives for a Mac environment? – JackWM Apr 16 '14 at 23:03
@JackWM: AFAIK, OS X should have the `paste` utility installed with the system. – Apr 16 '14 at 23:04
Yes. It has paste, but no `shuf` – JackWM Apr 16 '14 at 23:05
@JackWM: You could try replacing `shuf ` with `sort -R `. – Apr 16 '14 at 23:06
It seems it doesn't support `sort -R` either. But I found a solution here http://stackoverflow.com/questions/2153882/how-can-i-shuffle-the-lines-of-a-text-file-in-unix-command-line – JackWM Apr 16 '14 at 23:10
+1, no chance to use the command shuf, here is a good sample. – BMW Apr 17 '14 at 04:26
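For systems that have neither `shuf` nor `sort -R`, one common workaround is to prefix each line with a random key, sort on the key, and strip it again. The sketch below assumes only `awk`, `sort` and `cut`; the helper name `shuffle_lines` is hypothetical. Because `srand()` without an argument seeds from the time of day, two runs within the same second can produce the same order.

```shell
#!/bin/sh
# shuffle_lines (hypothetical helper): print the lines of a file in random order
# using only awk, sort and cut, for systems without shuf or sort -R.
shuffle_lines() {
    # Prefix every line with a random key, sort numerically on that key, then drop the key.
    awk 'BEGIN { srand() } { printf "%.10f\t%s\n", rand(), $0 }' "$1" |
        sort -k1,1n |
        cut -f2-
}
```

With this helper, the mapping could be produced with `shuffle_lines b.txt | paste a.txt -`.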

score 0 · Answer 2 · answered Apr 17 '14 at 05:29

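If you want to do this in awk, you can use rand(). Call srand() in the BEGIN block so that each run gets a new seed; it seeds from the current time, so two runs within the same second produce the same mapping. Note that asorti() is a gawk extension, so this needs GNU awk: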

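$ awk ' BEGIN { srand() }         # seed from the current time
    NR == FNR {                   # lines of the first file (a.txt)
        a[rand(), NR] = $1;       # key each word by a random number
        next;
    }
    1 == FNR { asorti(a, v) }     # first line of b.txt: sort the keys of a into v
    {
        i = length(v);            # take the last remaining key
        j = v[i];
        delete v[i];
        print $1, a[j];           # pair the b.txt word with a shuffled a.txt word
    }
' a.txt b.txt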

tomato orange
potato apple
cucumber grape

$ awk ' BEGIN { srand() }
    NR == FNR {
        a[rand(), NR] = $1; 
        next;
    } 
    1 == FNR { asorti(a, v) } 
    {
        i = length(v); 
        j = v[i];
        delete v[i]; 
        print $1, a[j];
    }
' a.txt b.txt

tomato apple
potato orange
cucumber grape
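Since `asorti()` is not available in other awk implementations, here is a sketch of the same idea that sticks to POSIX awk features and shuffles the second list with a Fisher-Yates shuffle. The function name `awk_map` is hypothetical, and the sketch assumes the first file is not empty. Unlike the script above, it prints the word from the first file in the first column.

```shell
#!/bin/sh
# awk_map (hypothetical helper): pair each line of the first file with a randomly
# chosen, not yet used line of the second file, without gawk-only functions.
awk_map() {
    awk '
        BEGIN { srand() }
        # First file: remember its lines in order.
        NR == FNR { a[++n] = $0; next }
        # Second file: collect its lines for shuffling.
        { b[++m] = $0 }
        END {
            # Fisher-Yates shuffle of b[1..m].
            for (i = m; i > 1; i--) {
                j = int(rand() * i) + 1
                tmp = b[i]; b[i] = b[j]; b[j] = tmp
            }
            # Print pairs up to the length of the shorter list.
            k = (n < m) ? n : m
            for (i = 1; i <= k; i++) print a[i] "\t" b[i]
        }
    ' "$1" "$2"
}
```

It would be called as `awk_map a.txt b.txt > c.txt`.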
