
I have a simple text file containing a random number on each line. I wish to add leading zeros to the numbers with fewer digits, so they all have the same width. Is there a way to do this on the command line (UNIX)?

Input file:

235
25
1
963258
45
1356924

Output file:

0000235
0000025
0000001
0963258
0000045
1356924
– data-bite (edited by mklement0)

3 Answers


Using awk's printf:

$ cat testfile
235
25
1
963258
45
1356924
$ awk '{printf("%07d\n", $1)}' testfile  # %07d to pad 0s.
0000235
0000025
0000001
0963258
0000045
1356924
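If the required width is not known in advance, a two-pass awk variant along the same lines (a sketch, reading the file twice and building the format string at print time) can compute it:

```shell
# First pass (NR==FNR) finds the widest number; second pass pads to that width.
# Assumes the data is in testfile, as above.
awk 'NR==FNR { if (length($1) > w) w = length($1); next }
     { printf("%0" w "d\n", $1) }' testfile testfile
```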
– falsetru

The largest number in the shown example has 7 digits. With the input file nums.txt:

perl -ne 'printf("%07d\n", $_)' nums.txt > padded_nums.txt

$ cat padded_nums.txt

0000235
0000025
0000001
0963258
0000045
1356924

If the number of digits of the largest number is not known in advance, a bit more is needed:

perl -ne 'push @l, $_; }{ $lm = length( (sort {$b <=> $a} @l)[0] ); printf "%0${lm}d\n", $_ for @l' nums.txt

Prints the same.

As noted by glenn jackman in comments, using the List::Util module gives us the maximum in O(n), as opposed to sort's O(n log n). This does load a module, though a very light one.

perl -MList::Util=max -ne ' ... $lm = length(max @l); ...'

Per request in a comment, here is how to pad numbers in shell, copied from this post

for i in $(seq -f "%05g" 10 15)
do
  echo $i
done

Also in the same answer, a single printf invocation is offered:

$ i=99
$ printf "%05d\n" $i
00099

Note the difference above: %05g with seq vs. %05d with printf.

Here is a way to process the whole file with bash, adapted from this post:

while read -r p; do
  printf "%07d\n" "$p"
done < nums.txt

Finally, use size=${#var} to find the length of a string in bash, if the length of the largest number is not known. This requires two passes over the data: one to find the maximum length and one to print with it.
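A sketch of that two-pass approach (reading nums.txt twice, measuring each line's length with ${#p}):

```shell
# Pass 1: find the length of the longest number.
maxlen=0
while read -r p; do
  (( ${#p} > maxlen )) && maxlen=${#p}
done < nums.txt

# Pass 2: pad every number to that width.
while read -r p; do
  printf "%0${maxlen}d\n" "$p"
done < nums.txt
```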

These examples are for bash, but tcsh has all of these facilities as well, albeit with different syntax.

– zdim
  • `use List::Util qw(max); $lm = length(max @l)` should be more efficient than `sort`. – glenn jackman Mar 17 '16 at 13:43
  • @glennjackman Yes, for me it is, and I considered it. But I thought better keep it as standard as possible, and parenthesis may be clearer outside of Perl? The second example can in fact use the string comparison as well, so it's just `(sort @l)[-1]`, but I didn't want to go there. – zdim Mar 17 '16 at 18:10
  • max does *not* need to sort, that's the gain: max -> O(n) ; sort -> O(n log n) – glenn jackman Mar 17 '16 at 18:17
  • @glennjackman Right, but of course, it doesn't need to produce a sorted list. Thank you, adding to the answer. That does still leave the loading of another module. It'd be interesting for me to see how that compares in the end, with very small data sets. – zdim Mar 17 '16 at 18:23
  • I did a quick benchmark with a file containing one million random numbers: reading the data, using List::Util and finding the max took about 0.4 sec; reading the data and sorting took about 1.6 sec. For one hundred random numbers: max->0.013s, sort->0.006s – glenn jackman Mar 17 '16 at 19:01
  • @glennjackman Yup, interesting! One has to pay even with such a tight module. Given the complexity difference, I guess the break-even point would be somewhere around the 100k range or so. – zdim Mar 17 '16 at 19:09

Using bash:

# read the numbers from the file into an array
mapfile -t nums < file

# find the maximum length
maxlen=0
for n in "${nums[@]}"; do len=${#n}; (( len > maxlen )) && maxlen=$len; done

# print
for n in "${nums[@]}"; do printf "%0*d\n" $maxlen $n; done
0000235
0000025
0000001
0963258
0000045
1356924
– glenn jackman