
I have a file containing many lines whose fields are separated by a single space, for example:

ATOM 9803 C2' 5AD 303 72.790 69.600 43.700 +0.140 0.00 PROT C 0
ATOM 9804 H2'' 5AD 303 73.450 68.960 44.300 +0.090 0.00 PROT H 0
ATOM 15 HE1 MET 1 110.230 70.830 15.490 +0.090 0.00 PROT H 0 
ATOM 16 HE2 MET 1 111.230 72.300 15.480 +0.090 0.00 PROT H 0
ATOM 17 HE3 MET 1 112.070 70.760 15.610 +0.090 0.00 PROT H 0
ATOM 24 CB GLU 2 107.460 66.320 18.200 -0.180 0.00 PROT C 0 
ATOM 251 HB1 GLU 2 106.550 65.940 17.700 +0.090 0.00 PROT H 0

I would like to arrange the columns according to a reference line that defines the spacing of each column:

AAAA  9999  AAAA  AAA 999      99.999   99.999  99.999 +9.999 9.99      AAAA A 9

So the final output should be:

ATOM  9803  C2'   5AD 303      72.790   69.600  43.700 +0.140 0.00      PROT C 0 
ATOM  9804  H2''  5AD 303      73.450   68.960  44.300 +0.090 0.00      PROT H 0 
ATOM  15    HE1   MET 1        110.230  70.830  15.490 +0.090 0.00      PROT H 0    

So far I've been trying to do this with awk, but it doesn't work: it replaces every line with the reference line.

reference_line="AAAA  9999  AAAA  AAA 999      99.999   99.999  99.999 +9.999 9.99      AAAA A 9"

awk -v ref_line="$reference_line" '
    BEGIN { OFS = "" }
    NR == 1 { print; next }
    {
        for (i = 1; i <= NF; i++) {
            if (i == 2) {
                $i = substr(ref_line, 11, 5)
            } else if (i == 3) {
                $i = substr(ref_line, 17, 4)
            } else if (i == 4) {
                $i = substr(ref_line, 22, 4)
            } else if (i == 6) {
                $i = substr(ref_line, 32, 5)
            } else if (i == 7) {
                $i = substr(ref_line, 39, 7)
            } else if (i == 8) {
                $i = substr(ref_line, 47, 8)
            } else if (i == 9) {
                $i = substr(ref_line, 56, 8)
            } else if (i == 10) {
                $i = substr(ref_line, 65, 8)
            } else if (i == 11) {
                $i = substr(ref_line, 74, 8)
            } else if (i == 12) {
                $i = substr(ref_line, 83)
            }
        }
        print
    }' out.txt > rearranged_out.txt
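For reference, the column layout that the substr() offsets above are trying to capture can be measured directly from the reference line, reading each column as a token plus the blanks that follow it. A minimal Python sketch; the helper `column_spans` and the constant `REF` are illustrative names, not part of the question:

```python
import re

# Reference line from the question.
REF = "AAAA  9999  AAAA  AAA 999      99.999   99.999  99.999 +9.999 9.99      AAAA A 9"


def column_spans(ref):
    """Return a (start, width) pair per column; start is a 0-based offset."""
    # \S+\s* matches one token plus the blanks after it, so each match covers
    # exactly one column; the last column has no trailing blanks.
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"\S+\s*", ref)]


spans = column_spans(REF)
print("widths:", [w for _, w in spans])
print("1-based starts:", [s + 1 for s, _ in spans])  # the form awk's substr() uses
```

The 1-based starts come out as 1, 7, 13, 19, ... Note, though, that copying text out of the reference line with substr() and assigning it to $i replaces each data field with reference text, which is why the output looks like the reference line. The reference line should only supply widths; the data fields then need to be padded to those widths.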
Gilli
  • Are the column widths known ahead of time, or determined by that reference line? If the former, as your attempt seems to imply, why even have a reference line? – Shawn Aug 02 '23 at 18:58
  • And where does the sorting mentioned in the title come in? And why doesn't the number of lines in the sample input and output match? – Shawn Aug 02 '23 at 19:04

2 Answers


Here's an answer that extracts the column widths from the reference line:

ref='AAAA  9999  AAAA  AAA 999      99.999   99.999  99.999 +9.999 9.99      AAAA A 9'

awk -v ref="$ref" '
    BEGIN {
        while (match(ref, /^[^[:blank:]]+[[:blank:]]*/)) {
            wid[++i] = RLENGTH
            ref = substr(ref, RLENGTH+1)
        }
    }
    {
        for (i=1; i<=NF; i++) printf "%-*s", wid[i], $i
        printf "\n"
    }
' file

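A side effect of the match(str, pattern) function is that it sets the RSTART and RLENGTH variables. The loop stores each RLENGTH (one token plus its trailing blanks) as a column width, then chops that many characters off the front of ref.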
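For readers who prefer Python, the same approach (widths measured from the reference line rather than hardcoded) ports almost line for line. This is a sketch assuming no field contains blanks; the helper names `ref_widths` and `realign` are illustrative, not taken from the answer:

```python
import re

# Reference line from the question; each column = a token plus its trailing blanks.
REF = "AAAA  9999  AAAA  AAA 999      99.999   99.999  99.999 +9.999 9.99      AAAA A 9"


def ref_widths(ref):
    """Column widths from the reference line, mirroring the awk match()/RLENGTH loop."""
    widths = []
    while True:
        # re.match is anchored at the start of the string, like /^.../ in awk;
        # [ \t] mirrors awk's [[:blank:]] class.
        m = re.match(r"[^ \t]+[ \t]*", ref)
        if not m:
            break
        widths.append(m.end())  # the match starts at 0, so end() == RLENGTH
        ref = ref[m.end():]     # same as substr(ref, RLENGTH+1)
    return widths


def realign(line, widths):
    """Left-justify each field in its column, like awk's printf "%-*s"."""
    out = []
    for i, field in enumerate(line.split()):
        # Fields beyond the reference columns get width 0 (no padding),
        # which is effectively what an unset wid[i] gives in the awk version.
        width = widths[i] if i < len(widths) else 0
        out.append(field.ljust(width))
    return "".join(out)


lines = [
    "ATOM 9803 C2' 5AD 303 72.790 69.600 43.700 +0.140 0.00 PROT C 0",
    "ATOM 15 HE1 MET 1 110.230 70.830 15.490 +0.090 0.00 PROT H 0 ",  # trailing blank
]
widths = ref_widths(REF)
for line in lines:
    print(realign(line, widths))
```

Because the widths come from the reference line, changing the layout only means editing REF. As with printf "%-*s", ljust() never truncates, so a value wider than its column pushes the rest of the line to the right.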

glenn jackman
text = """ATOM 9803 C2' 5AD 303 72.790 69.600 43.700 +0.140 0.00 PROT C 0
ATOM 9804 H2'' 5AD 303 73.450 68.960 44.300 +0.090 0.00 PROT H 0
ATOM 15 HE1 MET 1 110.230 70.830 15.490 +0.090 0.00 PROT H 0 
ATOM 16 HE2 MET 1 111.230 72.300 15.480 +0.090 0.00 PROT H 0
ATOM 17 HE3 MET 1 112.070 70.760 15.610 +0.090 0.00 PROT H 0
ATOM 24 CB GLU 2 107.460 66.320 18.200 -0.180 0.00 PROT C 0 
ATOM 251 HB1 GLU 2 106.550 65.940 17.700 +0.090 0.00 PROT H 0"""

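for line in text.splitlines():
    # split() with no argument also copes with trailing or repeated spaces
    words = line.split()
    # pad each field to the width of the matching column in the reference line
    print("{:<6}{:<6}{:<6}{:<4}{:<9}{:<9}{:<8}{:<7}{:<7}{:<10}{:<5}{:<2}{}".format(*words))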

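Each {:<N} left-aligns the corresponding field in a column N characters wide; the widths are those of the columns in the reference line. You can check out this question for more on the format syntax :)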
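The answer relies on Python's format-spec mini-language. A short sketch of the pieces involved, using only str.format and the widths of the reference line (the names `widths` and `fmt` are illustrative):

```python
# "{:<N}" left-aligns a value in a field N characters wide, padding on the
# right with spaces; values longer than N are printed in full, not truncated.
padded = "{:<6}".format("ATOM")        # 'ATOM  '
too_long = "{:<6}".format("110.230")   # '110.230'

# The width may itself be a replacement field, filled from the arguments.
nested = "{:<{}}".format("ATOM", 6)    # 'ATOM  '

# So the long format string can be generated from a list of column widths.
# The answer's final "{}" has no width; since the last field is a single
# character, width 1 gives the same result.
widths = [6, 6, 6, 4, 9, 9, 8, 7, 7, 10, 5, 2, 1]
fmt = "".join(f"{{:<{w}}}" for w in widths)

fields = "ATOM 9803 C2' 5AD 303 72.790 69.600 43.700 +0.140 0.00 PROT C 0".split()
print(fmt.format(*fields))
```

Generating the format string this way keeps the widths in one list, so they can be checked against the reference line at a glance.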

nate-thegrate
  • suggesting the reader jump to a linked Q&A (w/ 8 answers) isn't an explanation (e.g., which question and/or answer and/or comment at that link serves as an 'explanation'?); this looks like a (relatively) simple answer so consider updating this answer to replace the link with an actual/textual explanation – markp-fuso Aug 02 '23 at 19:16
  • I was on the fence about whether to type out an answer or to mark the question as a duplicate; you have a good point though. – nate-thegrate Aug 02 '23 at 19:24