0

I have a huge data set, and I want to remove the columns which have the same value of -9. I have 20 rows and million columns.

I want to remove column 1,9,11 (which are all same with the value of -9) this is just a small portion of my data set. I cannot count which columns have all -9 with my full data set.

Appreciate you help,

Thanks!

sample1 -9 3 -9 0 -9 2 -9 -9 -9 1 -9

sample2 -9 -9 0 -9 0 2 -9 -9 -9 -9 -9

sample3 -9 -9 -9 -9 -9 2 3 1 -9 -9 -9

sample4 -9 -9 -9 -9 -9 2 -9 1 -9 -9 -9

Output;

sample1 3 -9 0 -9 2 -9 -9 1

sample2 -9 0 -9 0 2 -9 -9 -9

sample3 -9 -9 -9 -9 2 3 1 -9

sample4 -9 -9 -9 -9 2 -9 1 -9

I tried to remove it with perl.

mgg
  • 1
  • 1
  • possible duplicate of [delete a column with awk or sed](http://stackoverflow.com/questions/15361632/delete-a-column-with-awk-or-sed) – fedorqui Feb 10 '15 at 11:07
  • I don't want to delete specific columns, I want to remove the columns which are all -9, and I have a lot of columns. – mgg Feb 10 '15 at 11:23
  • Then clarify your question by: 1) using the tools to print in a proper format 2) showing your desired output 3) indicating what you tried so far. See [How do I ask a good question](http://stackoverflow.com/help/how-to-ask) for further references. – fedorqui Feb 10 '15 at 11:26
  • see http://stackoverflow.com/questions/28446556/how-to-print-columns-containing-value and adpat to your specific need – NeronLeVelu Feb 12 '15 at 07:38

1 Answers1

0

You can try below awk script.

cat a.txt | awk '{
                  for (Col = 1; Col <= NF; Col++) { a[NR, Col] = $Col } 
                 }         
                 END {    
                   for (Col = 1; Col <= NF ; Col++) { 
                      j=0;  
                      for( Row = 1 ; Row<= NR; Row++) 
                      { 
                         if (a[Row,Col] != -9 ) 
                          { j=1 } 
                      } 
                      if( j == 0 ) { continue;} 
                      for(Row=1; Row<= NR; Row++){ 
                         printf a[Row,Col] (Row==NR ? RS : FS)  
                      } 
                    } 
                }' | awk '{
                  for (Col = 1; Col <= NF; Col++) { a[NR, Col] = $Col } 
                 }         
                 END {    
                   for (Col = 1; Col <= NF ; Col++) { 
                      for(Row=1; Row<= NR; Row++){ 
                         printf a[Row,Col] (Row==NR ? RS : FS)  
                      } 
                    } 
                }'


Test:

  • INPUT:
    -9 1 2 2 -9 0
    -9 -9 2 5 -9 3
    -9 2 3 3 -9 -9
    -9 7 4 6 -9 4
  • OUTPUT:
    1 2 2 0
    -9 2 5 3
    2 3 3 -9
    7 4 6 4


Steps:

  1. Transpose matrix (then each column will become row)
  2. Then exclude row which has got all -9 values
  3. Again transpose matrix

Reference : link
Community
  • 1
  • 1
Shubhangi
  • 2,229
  • 2
  • 14
  • 14
  • Great thanks. It works for the small data, but for my full data set it's not working!! – mgg Feb 11 '15 at 11:22
  • My file is too big, so I cannot use cat. What can I use instead of cat? – mgg Feb 11 '15 at 11:34
  • I agree, above approach is not efficient for very large file. I think we need to scan file twice in script/program for handling big data. – Shubhangi Feb 12 '15 at 14:57