
The command line below works for me

perl -F'\t' -lane'print join ",", @F[1,2]' inputfile

BUT I want to pass a variable list of columns, not necessarily columns 1 and 2 as specified in @F[1,2].

For example, based on the total number of columns of the inputfile, I would like to select a random subset "$random-columns" and pass it to @F[$random-columns].

How do I do that?

I first tried to generate a columnList of 5 random column numbers between 1 and 50:

columnList=()
for (( i = 0; i <= 5-1; ++i ))
do
    (( randCol = ($RANDOM % 50) + 1 ))
    columnList[i]=$randCol
done

Then I did the following to insert the commas:

cols_new=$(IFS=,; echo "${columnList[*]}")

and tried to pass it to the perl command line as below (didn't work):

perl -F'\t' -lane'print join ",", @F[$cols_new]' inputfile
Evan M
  • Duplicate of [How can I process options using Perl in -n or -p mode?](https://stackoverflow.com/q/53524699/589924)? – ikegami Dec 28 '18 at 04:16

4 Answers


Use rand.

Five random numbers from 0 to 49:

@randoms = map {int(rand(50))} 1..5;

In your one-liner:

perl -F'\t' -lane 'print join ",", @F[map {int(rand(50))} 1..5]' inputfile

To use the same random column indexes for each line, use a BEGIN block that only executes once at the start of the program:

perl -F'\t' -lane 'BEGIN {@rand = map {int(rand(50))} 1..5}; print join ",", @F[@rand]' inputfile
beasy
  • that will get a different set of random columns for each input line, a very different thing – ysth Dec 27 '18 at 22:32
  • `int(rand(50))` creates a random integer from 0 to 49, not 0 to 50. Also, `int()` is not needed here since the numbers are for array indices and Perl automatically "ints" those. – Kjetil S. Dec 27 '18 at 22:41
  • @ysth. If we need the same random columns for each line, just: `perl -F'\t' -lane '@i=map rand(1+$#F), 1..5 if not @i; print join ",", @F[@i]' inputfile`. Here `1+$#F` is used instead of 50 since that allows for any number of fields the file happens to have on its first line. – Kjetil S. Dec 27 '18 at 22:44
  • @KjetilS. `1+$#F` better written as `@F`. – melpomene Dec 27 '18 at 22:45
  • @melpomene ← Agree, but I think `0+@F` is even more readable. (And allowing for some future rand() to have more than one parameter, although that's very unlikely) – Kjetil S. Dec 27 '18 at 22:49
  • @KjetilS. That would change the operator precedence of `rand`, so it'll never happen. – melpomene Dec 27 '18 at 22:58
  • @ysth True. Edited to set the randos in a `BEGIN` block. – beasy Dec 28 '18 at 00:00
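Both rand(50) and the $RANDOM loop in the question can pick the same column more than once. If the subset should contain distinct columns, one option (a sketch that goes slightly beyond the comments above) is to shuffle the indexes of the first line with shuffle from the core List::Util module and keep the first five, sorted. It assumes the first line has at least five fields; sample.tsv is a hypothetical file:

```shell
# Hypothetical 2-line, 8-column TSV; each cell holds its 0-based column index
printf '0\t1\t2\t3\t4\t5\t6\t7\n0\t1\t2\t3\t4\t5\t6\t7\n' > sample.tsv

# On the first line only (while @i is still empty), shuffle 0..$#F,
# keep five distinct indexes and sort them; later lines reuse @i
out=$(perl -MList::Util=shuffle -F'\t' -lane '
  unless (@i) { my @s = shuffle 0 .. $#F; @i = sort { $a <=> $b } @s[0 .. 4] }
  print join ",", @F[@i]' sample.tsv)
echo "$out"
```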

Your perl -lane '...$cols_new...' command uses single shell quotes, so the shell does not interpolate the variable; Perl instead sees its own $cols_new, which is undefined.
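The effect is easy to see by building the same Perl code under both kinds of quotes (cols_new=1,2 is just an example value):

```shell
cols_new=1,2
single='print join ",", @F[$cols_new]'
double="print join q(,), @F[$cols_new]"
echo "$single"   # print join ",", @F[$cols_new]  (the shell left $cols_new alone)
echo "$double"   # print join q(,), @F[1,2]       (the shell substituted the value)
```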

While you can use interpolation or a command line argument to get information from the shell to a perl oneliner, often an environment variable is less troublesome:

export cols_new=1,2
perl -F'\t' -lane 'print join ",", @F[split /,/, $ENV{cols_new}]' inputfile
ysth
  • Or in some simple cases (like this one), just replace the two `'` with `"` and `","` with `','` or `q(,)`, as in `perl -F'\t' -lane "print join q(,), @F[$cols_new]" inputfile`. But I think it's better to use Perl to generate the array of random numbers, as in @beasy's answer. – Kjetil S. Dec 27 '18 at 22:36
  • @KjetilS. except that the question was "I want to pass a variable list of columns". the bit about random columns was only "For example" – ysth Dec 28 '18 at 01:00

You can just do the random number generation in Perl:

perl -F'\t' -lane 'BEGIN { @cols = map int(rand 50) + 1, 1 .. 5 } print join ",", @F[@cols]' inputfile
melpomene

Thank you all very much! I solved the problem by following your suggestions (see below):

  • Randomly selects $extractColumnCount columns from the range 2-$fileColumnCount, sorts them numerically, and places them in $cols_new_temp

cols_new_temp=$(echo $(shuf -i 2-$fileColumnCount -n $extractColumnCount | sort -n))

echo $cols_new_temp

  • Here I replace the spaces with commas to separate the column numbers and place the result in $cols_new

cols_new=$(echo $cols_new_temp | sed 's/ /,/g')

echo $cols_new

  • This Perl one-liner reads the file named in $file1 and prints the output column ($output_col) followed by the pre-selected random columns ($cols_new); the result is saved as $file2. Note that @F is zero-based, so index 1 is the second field of each line.

output_col=1

time perl -F',' -lane "print join q(,), @F[$output_col,$cols_new]" $file1 > $file2

Community