1

I have a tab delimited file which have 2 columns as:

new.txt
    1.01   yes
    2.00   no
    0.93   no
    1.2223 yes
    1.7211 no

I want to modify the contents of it as if there are two categories as:

new_categorized.txt
yes    no
1.01   2.00
1.2223 0.93
       1.7211

I have found a similar question with an answer in R (here) ,however I need to do it with bash or awk.. I would appreciate your help.

Cyrus
  • 84,225
  • 14
  • 89
  • 153
bapors
  • 887
  • 9
  • 26

3 Answers3

3
$ cat tst.awk
BEGIN { FS=OFS="\t" }
!($2 in label2colNr) {
    label2colNr[$2] = ++numCols
    colNr2label[numCols] = $2
}
{
    colNr = label2colNr[$2]
    val[++numRows[colNr],colNr] = $1
    maxRows = (numRows[colNr] > maxRows ? numRows[colNr] : maxRows)
}
END {
    for (colNr=1; colNr <= numCols; colNr++) {
        printf "%s%s", colNr2label[colNr], (colNr<numCols ? OFS : ORS)
    }

    for (rowNr=1; rowNr <= maxRows; rowNr++) {
        for (colNr=1; colNr <= numCols; colNr++) {
            printf "%s%s", val[rowNr,colNr], (colNr<numCols ? OFS : ORS)
        }
    }
}

$ awk -f tst.awk file
yes     no
1.01    2.00
1.2223  0.93
        1.7211

The above will work with any awk in any shell on any UNIX system no matter how many categories you have in the 2nd field and no matter what their values are.

Ed Morton
  • 188,023
  • 17
  • 78
  • 185
2

With bash, GNU grep and paste:

echo -e "yes\tno"
paste <(grep -Po '^\t\K.*(?=\tyes)' new.txt) <(grep -Po '^\t\K.*(?=\tno)' new.txt)

Output:

yes     no
1.01    2.00
1.2223  0.93
        1.7211
Cyrus
  • 84,225
  • 14
  • 89
  • 153
2

GNU awk solution:

awk '{ a[$2][($2=="yes"? ++y : ++n)]=$1 }
     END{ 
         max=(y > n? y:n); 
         print "yes","no";
         for(i=1; i<=max; i++) print a["yes"][i], a["no"][i] 
     }' OFS='\t' file | column -tn

The output:

yes     no
1.01    2.00
1.2223  0.93
        1.7211
RomanPerekhrest
  • 88,541
  • 4
  • 65
  • 105