
I have read the posts "Select random lines from a file in bash" and "Random selection of columns using linux command"; however, they don't deal specifically with a set of lines that needs to stay in the same order. I also searched for a randomization option in the cut command, without luck.

My attempt:

I am trying to replace spaces with newlines, then sort randomly, and then use head to grab a random string for each line.

while read -r line; do echo "$line" | sed 's/ /\n/g' | sort -R | head -1; done < file1.txt

While this does get the basic job done for one random string, is there a better, more efficient way of writing this? That way, I could add an option to get 1-2 random strings per line rather than just one.

Here's file1.txt:

#Sample #Example #StackOverflow #Question
#Easy #Simple #Code #Examples #Help
#Support #Really #Helps #Everyone #Learn

Here's my desired output (random values):

#Question
#Code #Examples
#Helps

If you know a better way to implement this code, I would really appreciate your positive input and support.

DomainsFeatured

4 Answers


This is the solution:

while read -r line; do echo "$line" | grep -oP '(\S+)' | shuf -n $((RANDOM%2+1)) | paste -s -d' '; done < file1.txt
Darby_Crash
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/157303/discussion-between-domainsfeatured-and-darby-crash). – DomainsFeatured Oct 23 '17 at 18:35
  • Great solution. Exactly what I was looking for. A solution to randomly choose 1-2 strings from each line of a file. It's certainly 100x better than my solution. Thanks Darby! – DomainsFeatured Oct 23 '17 at 19:43
  • It's a pleasure ;) – Darby_Crash Oct 23 '17 at 19:49
  • Hmm, why don't you just take N first words off the shuffled line, like that: ``echo `echo "$line" | grep -oP '(\S+)' | shuf -n $((RANDOM%2+1))` `` ? AFAIK all the subsequent code does not really add anything to what `shuf` already does. – zeppelin Oct 23 '17 at 21:28

Using AWK:

awk 'BEGIN { srand() } { print $(1+int(rand()*NF)) }' data.txt

#Question
#Help
#Support

You can modify this to select 2 (or more) random words per line (with duplicates) by repeating the $(1+int(rand()*NF)) construct a corresponding number of times (or by defining a user function to do this).
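As a concrete sketch of that modification (my own illustration, assuming the same sample file saved as `data.txt`), repeating the construct twice draws two independent words per line, so duplicates are possible:

```shell
# Pick two random words per line (duplicates possible) by repeating
# the random-field construct; each $(...) draws independently.
awk 'BEGIN { srand() }
     { print $(1+int(rand()*NF)), $(1+int(rand()*NF)) }' data.txt
```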

Choosing N words from each line without duplicates (by position) is a bit more tricky:

awk '
BEGIN { N=2; srand() } 
{ 
    #Collect fields into an array (w)
    delete w;
    for(i=1;i<=NF;i++) w[i]=$i; 

    #Randomize Array (Fisher–Yates style)
    for(j=NF;j>=2;j--) { 
       r=1+int(rand()*(j));
       if(r!=j) { 
          x=w[j]; w[j]=w[r]; w[r]=x; 
       } 
    }

    #Take N first items off the randomized array 
    for(g=1;g<=(N<NF?N:NF);g++) {
       if(g>1) printf " "
       printf "%s", w[g];
    }   
    printf "\n"
}' data.txt

N is the (maximum) number of words to pick per line.

To pick a random (at most N) number of items per line, modify the code like this:

awk '
BEGIN { N=2; srand() } 
{ 
    #Collect fields into an array (w)
    delete w;
    for(i=1;i<=NF;i++) w[i]=$i; 

    #Randomize Array (Fisher–Yates style)
    for(j=NF;j>=2;j--) { 
       r=1+int(rand()*(j));
       if(r!=j) { 
          x=w[j]; w[j]=w[r]; w[r]=x; 
       } 
    }

    #Take the first L (L <= N) items off the randomized array 
    L=1+int(rand()*N);
    for(g=1;g<=(L<NF?L:NF);g++) {
       if(g>1) printf " "
       printf "%s", w[g];       
    }   
    printf "\n"
}' data.txt

This will print 1 to N (here, 1 or 2) randomly chosen words per line.

This code can still be optimized a bit (e.g. by only shuffling the first L elements of the array), yet it is already 2 or 3 orders of magnitude faster than a shell-based solution.
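One way that optimization could look (my own sketch, not the answer's code): since only the first L slots are printed, run the Fisher-Yates pass for just those positions instead of the whole array.

```shell
# Assumed sketch: partial Fisher-Yates shuffle. Each of the first L
# slots is filled with a uniformly chosen word from the not-yet-placed
# remainder, so the picks are distinct and unbiased.
awk '
BEGIN { N=2; srand() }
{
    delete w;
    for(i=1;i<=NF;i++) w[i]=$i;

    L=1+int(rand()*N); if(L>NF) L=NF;

    for(j=1;j<=L;j++) {
        r=j+int(rand()*(NF-j+1));   # r is in [j, NF]
        x=w[j]; w[j]=w[r]; w[r]=x;
    }

    for(g=1;g<=L;g++) { if(g>1) printf " "; printf "%s", w[g] }
    printf "\n"
}' data.txt
```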

zeppelin
  • This is a nice answer. Do you know how I could randomize it to return 1-2 strings rather than just 1 string? – DomainsFeatured Oct 23 '17 at 17:38
  • @Inian, it's okay, but it requires me to change the number manually. You can leave it if you would like. I'm sure it would help someone looking for a similar solution. – DomainsFeatured Oct 23 '17 at 18:33

An attempt in bash:

cat file1  | xargs -n1  -I@ bash -c "output_count=2; \
   line=\$(echo \"@\"); \
   words=\$(echo  \${line} | wc -w); \
   for i in  \$(eval echo \"{1..\${output_count}}\"); do \
      select=\$((1 + RANDOM % \${words})); \
      echo  \${line} | cut -d \" \" -f \${select} | tr '\n' ' '; \
   done;
   echo \" \" "

Assumes that the file is called file1. To change the number of randomly selected words, assign a different value to output_count.

Prints

$ cat file1  | xargs -n1  -I@ bash -c "output_count=2; \
   line=\$(echo \"@\"); \
   words=\$(echo  \${line} | wc -w); \
   for i in  \$(eval echo \"{1..\${output_count}}\"); do \
      select=\$((1 + RANDOM % \${words})); \
      echo  \${line} | cut -d \" \" -f \${select} | tr '\n' ' '; \
   done;
   echo \" \" "
#Example #Example
#Examples #Help
#Support #Learn
$ cat file1  | xargs -n1  -I@ bash -c "output_count=2; \
   line=\$(echo \"@\"); \
   words=\$(echo  \${line} | wc -w); \
   for i in  \$(eval echo \"{1..\${output_count}}\"); do \
      select=\$((1 + RANDOM % \${words})); \
      echo  \${line} | cut -d \" \" -f \${select} | tr '\n' ' '; \
   done;
   echo \" \" "
#Question #StackOverflow
#Help #Help
#Everyone #Learn
Hakan Baba
  • @DomainsFeatured if you do not want to print the same word twice, this obviously does not work. There are ways to achieve that easily. For example keep the words in an array and swap the randomly selected word with the last word and shorten the length of the array at every iteration. Let me know if you need that. – Hakan Baba Oct 23 '17 at 18:10
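The swap-and-shrink technique described in the comment above could be sketched like this (my own assumed illustration, not Hakan Baba's code): each pick overwrites the chosen slot with the last live element and shrinks the live region, so no word is selected twice.

```shell
#!/bin/bash
# Assumed sketch: sample without replacement from each line's words.
output_count=2
while read -r line; do
    read -r -a words <<< "$line"
    n=${#words[@]}
    picks=$(( output_count < n ? output_count : n ))
    out=""
    for ((i=0; i<picks; i++)); do
        sel=$(( RANDOM % n ))
        out+="${words[sel]} "
        words[sel]=${words[n-1]}   # move the last live word into the hole
        n=$(( n - 1 ))             # shrink the live region
    done
    echo "${out% }"
done < file1
```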

This might work for you (GNU sed):

sed 'y/ /\n/;s/.*/echo "&"|shuf -n$((RANDOM%2+1))/e;y/\n/ /' file

Replace the spaces in each line with newlines; then, using sed's substitution e flag, pass each set of lines to the shuf -n command; finally, turn the newlines back into spaces.

potong