Here is a file in which I need to change some values in the first column:

if first column is 2 => 1

if first column is 8 => 2

if first column is 16 => 3

CHR          SNP         BP   A1       TEST    NMISS       BETA         STAT            P 
   2   rs10173732      31404    A        ADD     2607   -0.02162       -1.552       0.1207
   2   rs10173732      31404    A       COV1     2607     0.2659        24.15   1.849e-116
   2   rs11684864    2547285    G        ADD     2596  -0.009581      -0.6387       0.5231
   2   rs11684864    2547285    G       COV1     2596     0.2672        24.18   1.212e-116
   2   rs11684864    2547285    G       COV2     2596   0.004941        9.564    2.548e-21
   8    rs3826201   88651817    T       COV3     2576    -0.0186        -15.7    4.335e-53
  16    rs8047319   88684276    C        ADD     2538    0.01115        1.271        0.204
  16    rs8047319   88684276    C       COV1     2538     0.2632        23.73   1.402e-112
  16    rs8047319   88684276    C       COV2     2538   0.005039        9.715    6.276e-22
  16    rs8047319   88684276    C       COV3     2538   -0.01891        -15.9    2.583e-54

However, this command is not convenient: it changes the indentation, and the line starting with 8 is not converted:

awk '{ if ( $1 == 2 ) { $1 = 1 } else if ( $1 == 8 ) { $1 == 2 } else if ( $1 == 16 ) { $1 = 3 }; print}' TEST > TESTnew

Output:

 CHR          SNP         BP   A1       TEST    NMISS       BETA         STAT            P 
1 rs10173732 31404 A ADD 2607 -0.02162 -1.552 0.1207
1 rs10173732 31404 A COV1 2607 0.2659 24.15 1.849e-116
1 rs11684864 2547285 G ADD 2596 -0.009581 -0.6387 0.5231
1 rs11684864 2547285 G COV1 2596 0.2672 24.18 1.212e-116
1 rs11684864 2547285 G COV2 2596 0.004941 9.564 2.548e-21
       8    rs3826201   88651817    T       COV3     2576    -0.0186        -15.7    4.335e-53
3 rs8047319 88684276 C ADD 2538 0.01115 1.271 0.204
3 rs8047319 88684276 C COV1 2538 0.2632 23.73 1.402e-112
3 rs8047319 88684276 C COV2 2538 0.005039 9.715 6.276e-22
3 rs8047319 88684276 C COV3 2538 -0.01891 -15.9 2.583e-54

How would you write something more universal (meaning it works even if the indentation of a particular line is different) that doesn't change the indentation of the original file?

Barmar
Natha
  • I edited out the first line. It is either nonsense or you are admitting to wasting our time. Either way, it's inappropriate. – Mad Physicist Dec 13 '16 at 19:52
  • Aside from that, good question, and are you open to using `sed`? – Mad Physicist Dec 13 '16 at 19:54
  • Ok thanks ! Sure I'm open to using `sed`. It's just that I'm not familiar when using it with different conditions, but any good idea is welcome. – Natha Dec 13 '16 at 19:56
  • I just found how to do this here: http://stackoverflow.com/a/26568996/2988730. Apparently it's just semicolons – Mad Physicist Dec 13 '16 at 19:57
  • Instead of printing the line with `print`, you could use `printf()` so you can specify the column widths. – Barmar Dec 13 '16 at 20:07
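
For context, the two symptoms in the question can be reproduced in isolation. This is only a small sketch using one-line inputs made up from the sample data: assigning to a field makes awk rebuild the whole record with single spaces, and `$1 == 2` is a comparison rather than an assignment, so the line with 8 passes through untouched.

```shell
# Assigning to any field makes awk rebuild $0 from its fields joined by OFS
# (a single space by default), so the original padding is lost.
printf '   2   rs10173732\n' | awk '{ $1 = 1; print }'
# prints: 1 rs10173732

# "==" only compares; used as a statement it changes nothing,
# so the record is printed exactly as it was read.
printf '   8    rs3826201\n' | awk '{ $1 == 2; print }'
# prints:    8    rs3826201
```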

2 Answers

With GNU awk for the 3rd arg to match():

$ awk 'BEGIN { m[2]=1; m[8]=2; m[16]=3 }
    $1 in m { match($0,/(\s*\S+)(.*)/,a); $0=sprintf("%*s",length(a[1]),m[$1]) a[2] }
  1' file
CHR          SNP         BP   A1       TEST    NMISS       BETA         STAT            P
   1   rs10173732      31404    A        ADD     2607   -0.02162       -1.552       0.1207
   1   rs10173732      31404    A       COV1     2607     0.2659        24.15   1.849e-116
   1   rs11684864    2547285    G        ADD     2596  -0.009581      -0.6387       0.5231
   1   rs11684864    2547285    G       COV1     2596     0.2672        24.18   1.212e-116
   1   rs11684864    2547285    G       COV2     2596   0.004941        9.564    2.548e-21
   2    rs3826201   88651817    T       COV3     2576    -0.0186        -15.7    4.335e-53
   3    rs8047319   88684276    C        ADD     2538    0.01115        1.271        0.204
   3    rs8047319   88684276    C       COV1     2538     0.2632        23.73   1.402e-112
   3    rs8047319   88684276    C       COV2     2538   0.005039        9.715    6.276e-22
   3    rs8047319   88684276    C       COV3     2538   -0.01891        -15.9    2.583e-54

or if the 1, 2, and 3 values are just an incremental index:

$ awk 'BEGIN{split("2 8 16",t); for (i in t) m[t[i]]=i} $1 in m{match($0,/(\s*\S+)(.*)/,a); $0=sprintf("%*s",length(a[1]),m[$1]) a[2]} 1' file
CHR          SNP         BP   A1       TEST    NMISS       BETA         STAT            P
   1   rs10173732      31404    A        ADD     2607   -0.02162       -1.552       0.1207
   1   rs10173732      31404    A       COV1     2607     0.2659        24.15   1.849e-116
   1   rs11684864    2547285    G        ADD     2596  -0.009581      -0.6387       0.5231
   1   rs11684864    2547285    G       COV1     2596     0.2672        24.18   1.212e-116
   1   rs11684864    2547285    G       COV2     2596   0.004941        9.564    2.548e-21
   2    rs3826201   88651817    T       COV3     2576    -0.0186        -15.7    4.335e-53
   3    rs8047319   88684276    C        ADD     2538    0.01115        1.271        0.204
   3    rs8047319   88684276    C       COV1     2538     0.2632        23.73   1.402e-112
   3    rs8047319   88684276    C       COV2     2538   0.005039        9.715    6.276e-22
   3    rs8047319   88684276    C       COV3     2538   -0.01891        -15.9    2.583e-54

or if you do not need to specify the existing $1 values at all:

$ awk 'NR>1{ if ($1!=p) {c++; p=$1} match($0,/(\s*\S+)(.*)/,a); $0=sprintf("%*s",length(a[1]),c) a[2]} 1' file
CHR          SNP         BP   A1       TEST    NMISS       BETA         STAT            P
   1   rs10173732      31404    A        ADD     2607   -0.02162       -1.552       0.1207
   1   rs10173732      31404    A       COV1     2607     0.2659        24.15   1.849e-116
   1   rs11684864    2547285    G        ADD     2596  -0.009581      -0.6387       0.5231
   1   rs11684864    2547285    G       COV1     2596     0.2672        24.18   1.212e-116
   1   rs11684864    2547285    G       COV2     2596   0.004941        9.564    2.548e-21
   2    rs3826201   88651817    T       COV3     2576    -0.0186        -15.7    4.335e-53
   3    rs8047319   88684276    C        ADD     2538    0.01115        1.271        0.204
   3    rs8047319   88684276    C       COV1     2538     0.2632        23.73   1.402e-112
   3    rs8047319   88684276    C       COV2     2538   0.005039        9.715    6.276e-22
   3    rs8047319   88684276    C       COV3     2538   -0.01891        -15.9    2.583e-54
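
All three commands rely on the same idea: measure the width of the leading blanks plus the first field, then print the new value right-aligned in that width, so the rest of the line is never touched. The 3-argument match() and the \s and \S shorthands are GNU awk extensions. As a rough sketch of a portable variant (not part of the commands above; the function name `remap_chr` is hypothetical, and the 2/8/16 mapping is taken from the question), the same effect might be achieved with the standard 2-argument match() and RLENGTH:

```shell
# remap_chr: hypothetical helper that remaps the first column (2->1, 8->2, 16->3)
# while keeping every line's original layout. Reads the named files or stdin.
remap_chr() {
  awk '
    BEGIN { m[2] = 1; m[8] = 2; m[16] = 3 }
    # match() sets RLENGTH to the width of the leading blanks plus the first field
    ($1 in m) && match($0, /^[ \t]*[^ \t]+/) {
      # Build a format such as "%4s" so the new value is right-aligned
      # in the same width, then append the untouched rest of the line.
      $0 = sprintf("%" RLENGTH "s", m[$1]) substr($0, RLENGTH + 1)
    }
    { print }
  ' "$@"
}
```

It could then be used like the original command, for example `remap_chr TEST > TESTnew`. The sketch assumes the new value is never wider than the field it replaces.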
Ed Morton
You could do the replacement in sed:

sed 's/^\( *\)2/\11/ ; s/^\( *\)8/\12/ ; s/^\( *\)16/\1 3/'

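This does all three replacements in one script. ^\( *\) captures all the spaces at the beginning of the line, before the number, and \1 puts them back to preserve indentation. 16 is replaced with a space followed by 3 for the same reason: the new value is one character shorter, so the extra space keeps the column aligned.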
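
One caveat: as written, the patterns only look at the leading digits, so a first column such as 22 would also match the rule for 2. A possible hardening, shown here only as a sketch, is to require the space that follows the number. The function name `remap_chr_sed` is hypothetical, and the sketch assumes the columns are separated by spaces, not tabs.

```shell
# remap_chr_sed: hypothetical helper, same 2->1, 8->2, 16->3 mapping as above.
# The trailing space in each pattern stops "2" from matching values like 22.
# Rule order matters: 2->1 runs before 8->2, so a freshly written 2 is not remapped again.
remap_chr_sed() {
  sed -e 's/^\( *\)2 /\11 /' \
      -e 's/^\( *\)8 /\12 /' \
      -e 's/^\( *\)16 /\1 3 /' "$@"
}
```

For example, `remap_chr_sed TEST > TESTnew` would write the converted file.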

Mad Physicist