
I have the following awk command within a "for" loop in bash:

awk -v pdb="$pdb" 'BEGIN {file = 1; filename = pdb"_" file ".pdb"}
 /ENDMDL/ {getline; file ++; filename = pdb"_" file ".pdb"}
 {print $0 > filename}' < ${pdb}.pdb 

This reads a series of files named $pdb.pdb and splits each of them into files called $pdb_1.pdb, $pdb_2.pdb, ..., $pdb_21.pdb, etc. However, I would like to produce files with names like $pdb_01.pdb, $pdb_02.pdb, ..., $pdb_21.pdb, i.e., to pad the "file" variable with leading zeros.

I have tried using printf in several ways, without success. Any help would be much appreciated.

mirix

5 Answers


Here's how to create leading zeros with awk:

# echo 1 | awk '{ printf("%02d\n", $1) }'
01
# echo 21 | awk '{ printf("%02d\n", $1) }'
21

Replace the 2 in %02d with the total number of digits you need (including the leading zeros).
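Because the format is an ordinary awk string, the width can also come from a variable instead of being hard-coded. A minimal sketch, wrapped in a shell function whose name, zpad, is made up for this example:

```sh
# zpad WIDTH: hypothetical helper that zero-pads the first field of each
# input line to WIDTH digits. The awk format string is assembled at run
# time, so "%0" w "d" becomes "%03d" when WIDTH is 3.
zpad() {
    awk -v w="$1" '{ printf("%0" w "d\n", $1) }'
}

printf '1\n21\n7\n' | zpad 3    # prints 001, 021 and 007, one per line
```

The width in %0Nd is a minimum, so values that already have more digits than requested are printed in full rather than truncated.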

JJ.
    Note that this only works if you're directly printing the formatted numbers to the output. If you're looking to use the formatted number in an awk variable or function, you'll likely need to use `sprintf`, as mentioned in the other answer. – R.M. Nov 07 '16 at 16:37
  • `awk '{ printf "%0" $2 "d\n", $1 }'` works fine here. – tripleee Feb 04 '22 at 09:57

Where the filename is built, replace file with `sprintf("%02d", file)`.

Or even replace the whole assignment with `filename = sprintf("%s_%02d.pdb", pdb, file);`.
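Applied to the command from the question, that could look like the sketch below. The base name demo and the tiny three-model input file are made up so the example runs on its own; substitute your own $pdb. The close(filename) call is not part of the answer above: it is an optional addition so that awk does not keep every output file open at once, which may matter with many models on awk implementations that limit the number of open files.

```sh
# Hypothetical demo input: three models separated by ENDMDL records.
pdb=demo
printf 'MODEL 1\nATOM 1\nENDMDL\nMODEL 2\nATOM 2\nENDMDL\nMODEL 3\nATOM 3\n' > "${pdb}.pdb"

awk -v pdb="$pdb" '
BEGIN    { file = 1; filename = sprintf("%s_%02d.pdb", pdb, file) }
/ENDMDL/ { getline                  # $0 becomes the next line, so ENDMDL is not written
           close(filename)          # optional: finished with this model, free its descriptor
           file++
           filename = sprintf("%s_%02d.pdb", pdb, file) }
         { print $0 > filename }
' < "${pdb}.pdb"

ls "${pdb}"_*.pdb
```

With the demo input this creates demo_01.pdb, demo_02.pdb and demo_03.pdb.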

glglgl

This does it without resorting to printf, on the premise that printf is expensive. The first parameter is the string to pad; the second is the total length after padding.

echo 722 8 | awk '{ s = ""; for(c = 0; c < $2; c++) s = s"0"; s = s$1; print substr(s, 1 + length(s) - $2); }'

If you know in advance the length of the result string, you can use a simplified version (say 8 is your limit):

echo 722 | awk '{ s = "00000000"$1; print substr(s, 1 + length(s) - 8); }'

The result in both cases is 00000722.
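One caveat: if the input is longer than the target length, the substr trick cuts digits off the front (with a limit of 8, 123456789 comes out as 23456789). A sketch of the same approach wrapped in a hypothetical lpad shell function, with a length check so longer values pass through unchanged and the run of zeros built only once:

```sh
# lpad WIDTH: hypothetical helper that left-pads each input line with zeros
# using the concatenate-then-substr trick, without printf.
lpad() {
    awk -v w="$1" '
    BEGIN { while (length(z) < w) z = z "0" }   # a run of w zeros, built once
    {
        s = $0
        if (length(s) < w)                      # longer values are left as-is
            s = substr(z s, length(s) + 1)      # keep the last w characters
        print s
    }'
}

printf '722\n5\n123456789\n' | lpad 8    # 00000722, 00000005, 123456789
```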

ThomasMcLeod
    Dunno by what criteria this would be faster. In a quick test, 10,000 iterations of this script took 42 seconds whilst the obviously much simpler variation with `printf` took 35. – tripleee Feb 04 '22 at 09:56

Here is a function that left- or right-pads values with zeros, depending on the parameters: zeropad(value, count, direction)

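function zeropad(s,c,d) {
    if(d!="r")
        d="l"                # l is the default and fallback value
    # left:  format "%0<c>d"             -> s zero-padded on the left to c digits
    # right: format "%d%0<c-length(s)>d" -> s followed by c-length(s) zeros
    #        (the second conversion consumes the empty-string argument, i.e. 0)
    return sprintf("%" (d=="l"? "0" c:"") "d" (d=="r"?"%0" c-length(s) "d":""), s,"")
}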
{                            # test main
    print zeropad($1,$2,$3)
}

Some tests:

$ cat test
2 3 l
2 4 r
2 5
a 6 r

The test:

$ awk -f program.awk test
002
2000
00002
000000

It isn't fully battle-tested, so unusual parameters may yield strange results.

James Brown

Here's a VERY unconventional way of leveraging OFS to pad with zeros:

jot 10 1 - 12333337 | 

mawk '(___ = __ - length($_)) <= _ || $++___ = $_ ($_=_)' OFS=0 __=23

00000000000000000000001
00000000000000012333338
00000000000000024666675
00000000000000037000012
00000000000000049333349
00000000000000061666686
00000000000000074000023
00000000000000086333360
00000000000000098666697
00000000000000111000034
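The one-liner relies on a POSIX awk rule: assigning to a field beyond NF creates the intervening empty fields and rebuilds $0 with OFS between every pair of fields. Clearing the record and then assigning the value to field pad + 1 therefore yields pad copies of OFS followed by the value. A more readable sketch of the same idea, in a hypothetical ofspad shell function (the optional second argument is the padding character, defaulting to 0):

```sh
# ofspad WIDTH [PADCHAR]: hypothetical helper showing the OFS trick in
# readable form. Assigning to a field past NF creates the empty fields in
# between, and $0 is rebuilt with OFS between every pair of fields.
ofspad() {
    awk -v w="$1" -v OFS="${2:-0}" '
    {
        v = $0
        pad = w - length(v)
        if (pad > 0) {
            $0 = ""            # empty record: NF becomes 0
            $(pad + 1) = v     # fields 1..pad are empty, so $0 = pad OFS copies + v
        }
        print
    }'
}

printf '1\n12333338\n' | ofspad 23
```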

They don't have to be zeros, either. The same approach works just as well for padding with emoji:

jot 10 1 - 12333337 | 

mawk2 '  (___ = __-length($_)) <=_ || 
         $++___ = $_ ($_ = _)' OFS='\360\237\246\201' __=17 |

gawk -e '$++NF = length($1)'

1 17
12333338 17
24666675 17
37000012 17
49333349 17
61666686 17
74000023 17
86333360 17
98666697 17
111000034 17
RARE Kpop Manifesto