Shuffle rows and columns of a large csv file with bash or awk

Question

I want to create a script that randomly shuffles the rows and columns of a large csv file. For example, for an initial file f.csv:

a, b, c ,d
e, f, g, h
i, j, k, l

First, we shuffle the rows to obtain f1.csv:

e, f, g, h
a, b, c ,d
i, j, k, l

Then, we shuffle the columns to obtain f2.csv:

g, e, h, f
c, a, d, b
k, i, l, j

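To shuffle the rows, we can use this awk script, taken from another answer:

awk 'BEGIN { srand() }
{ lines[++d] = $0 }                      # read every row into memory
END {
    while (e < d) {                      # loop until every row has been printed
        RANDOM = int(1 + rand() * d)     # random row number in 1..d
        if (RANDOM in lines) {           # skip rows that were already printed
            print lines[RANDOM]
            delete lines[RANDOM]
            ++e
        }
    }
}' f.csv > f1.csv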

But how can I shuffle the columns?

Just do the same thing but do it on `$0` when populating lines[]. Try to code it yourself, the logic's all there for you in your script. — Ed Morton, Sep 15 '14 at 17:50
I got this error: awk: 13: unexpected character '.' What have I done wrong? — NIMISHAN, Feb 05 '16 at 19:15

score 1 · Answer 1 · answered Sep 15 '14 at 18:12

If you're open to other languages, here's a Ruby solution:

$ ruby -rcsv -e 'CSV.read(ARGV.shift).shuffle.transpose.shuffle.transpose.each {|row| puts row.to_csv}' f.csv
 j, k, l,i
 f, g, h,e
 b, c ,d,a

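Ruby has a lot of built-in functionality, including the shuffle and transpose methods on Arrays, which fit this problem exactly: shuffle reorders the rows, transpose turns columns into rows so that the second shuffle reorders the columns, and the final transpose restores the original orientation.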

jaypal singh · Accepted Answer · 2014-09-16T22:26:33.277

Here is a way to shuffle columns using awk:

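awk '
BEGIN { FS = " *, *"; srand() }
{
  # Store every field by (row, column) and record each column index once.
  for (col = 1; col <= NF; col++) {
    lines[NR, col] = $col
    columns[col]
  }
}
END {
  # Draw random column indices until each column has been picked exactly once.
  while (1) {
    if (fld == NF) { break }
    RANDOM = int(1 + rand() * NF)   # col is NF+1 after the loop above, so use NF
    if (RANDOM in columns) {
      order[++seq] = RANDOM
      delete columns[RANDOM]
      ++fld
    }
  }
  # Print every row with its fields in the shuffled column order.
  for (nr = 1; nr <= NR; nr++) {
    for (fld = 1; fld <= seq; fld++) {
      printf "%s%s", lines[nr, order[fld]], (fld == seq ? RS : ", ")
    }
  }
}' f.csv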

Output:

b, a, c, d
f, e, g, h
j, i, k, l
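
As a rough sketch that goes beyond the script above, the row shuffle and the column shuffle could be combined in one awk pass. It swaps in a Fisher-Yates shuffle for the retry loop, so each index is drawn exactly once. The names shuffle_csv, shuffle, cell, rows and cols are made up for illustration. It assumes that no field contains a quoted comma and that the whole file fits in memory, which may not hold for a very large file.

```shell
# shuffle_csv FILE: hypothetical helper that prints FILE with rows and columns shuffled.
shuffle_csv() {
  awk '
  # Fisher-Yates shuffle of arr[1..n], in place.
  function shuffle(arr, n,    i, j, tmp) {
    for (i = n; i > 1; i--) {
      j = int(1 + rand() * i)          # uniform index in 1..i
      tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp
    }
  }
  BEGIN { FS = " *, *"; srand() }
  {
    gsub(/^[ \t]+|[ \t]+$/, "")        # trim the line, which also re-splits the fields
    if (NF > ncol) ncol = NF           # remember the widest row
    for (c = 1; c <= NF; c++) cell[NR, c] = $c
  }
  END {
    for (r = 1; r <= NR; r++) rows[r] = r
    for (c = 1; c <= ncol; c++) cols[c] = c
    shuffle(rows, NR)                  # random row order
    shuffle(cols, ncol)                # one random column order, shared by all rows
    for (r = 1; r <= NR; r++)
      for (c = 1; c <= ncol; c++)
        printf "%s%s", cell[rows[r], cols[c]], (c == ncol ? "\n" : ", ")
  }' "$1"
}
```

It would be called as `shuffle_csv f.csv > f2.csv`. Because srand() without an argument seeds from the time of day, two runs within the same second are likely to produce the same order.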
