I need to zero-pad the numeric part of each string so that it has at least four digits, for example:

1 -> 0001
44 -> 0044
555 -> 0555
1a -> 0001a
44b -> 0044b
565c -> 0565c
7890 -> 7890

I have a bash script that takes the file containing those values as its parameter.

#!/bin/bash

FILE=$1
if [ ! -f $FILE ]; then
    exit 1
fi

sed -i 's/\<[0-9]\>/0&/g' $FILE
sed -i 's/\<[0-9][0-9]\>/0&/g' $FILE
sed -i 's/\<[0-9][0-9][0-9]\>/0&/g' $FILE

The script does not work on 1a, 44b, or 565c. I don't know how to make it ignore the letters.

rene
Mr. Kevin
  • `\>` is matching the word boundary: `1a` is considered a single word – Andrea Corbellini Sep 27 '17 at 22:06
  • Possible duplicate of [Padding zeros in a string](https://stackoverflow.com/questions/1117134/padding-zeros-in-a-string) – funky-future Sep 27 '17 at 22:23
  • I disagree with @funky-future's dupe flag: that question was about using printf, while this one is about sed. In both cases the asker already knows how to do the padding and is having trouble with the implementation – Will Barnwell Sep 27 '17 at 22:38
  • Can you give a typical input line? Can there be several such numbers per line? – xhienne Sep 28 '17 at 00:22
  • Welcome to Stack Overflow! Please do not vandalize your posts. By posting on the Stack Exchange network, you've granted a non-revocable right for SE to distribute that content (under the [CC BY-SA 3.0 license](https://creativecommons.org/licenses/by-sa/3.0/)). By SE policy, any vandalism will be reverted. If you would like to disassociate this post from your account, see [What is the proper route for a disassociation request?](https://meta.stackoverflow.com/q/323395) – Suraj Rao Sep 28 '17 at 06:50

4 Answers

GNU awk would be a better tool here:

awk -i inplace 'match($1,/([0-9]*)(.*)/,arr){$1=sprintf("%04d%s",arr[1],arr[2])}1' input.txt

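which zero-pads the leading digits of $1 to four digits. Both the three-argument form of match() and -i inplace are GNU awk extensions.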

Testing:

$ cat input.txt
1
44
555
1a
44b
565c
7890

$ awk 'match($1,/([0-9]*)(.*)/,arr){$1=sprintf("%04d%s",arr[1],arr[2])}1' input.txt
0001
0044
0555
0001a
0044b
0565c
7890

If the data is laid out as in @xhienne's answer, with several values per line, loop over the fields instead:

$ cat input.txt
1 44 555 1a 44b 565c 7890 77777

$ cat tst.awk
{ for (i=1;i<=NF;i++)
    if (match($i,/([0-9]*)(.*)/,arr))
      $i=sprintf("%04d%s",arr[1],arr[2])
}1

$ awk -f tst.awk input.txt
0001 0044 0555 0001a 0044b 0565c 7890 77777
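
For awk implementations without the GNU extensions, the same idea can be sketched with the two-argument match(), which is POSIX and sets RSTART and RLENGTH. This is only a sketch under assumptions: `pad_fields` is a hypothetical helper name, and unlike the commands above it leaves fields that do not start with a digit unchanged.

```shell
# pad_fields (hypothetical name): zero-pad the leading digits of every
# whitespace-separated field to four digits, using POSIX awk features only.
# Reads the files given as arguments, or stdin when there are none.
pad_fields() {
  awk '{
    for (i = 1; i <= NF; i++) {
      # Two-argument match() sets RLENGTH to the length of the digit run.
      if (match($i, /^[0-9]+/)) {
        num  = substr($i, 1, RLENGTH)     # leading digits
        rest = substr($i, RLENGTH + 1)    # whatever follows them
        $i = sprintf("%04d%s", num, rest)
      }
    }
    print
  }' "$@"
}

printf '%s\n' '1 44 555 1a 44b 565c 7890 77777' | pad_fields
```

On the sample line this prints `0001 0044 0555 0001a 0044b 0565c 7890 77777`. Because fields are reassigned, awk rejoins them with single spaces.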
Marc Lambrichs

Prefix each sequence of digits with 000, then strip the surplus leading zeros so that at least four digits remain:

sed -i '
    s/[0-9]\{1,\}/000&/g
    s/0*\([0-9]\{4\}\)/\1/g
' "$FILE"

Or, using GNU sed's -r option for extended regular expressions:

sed -i -r '
    s/[0-9]+/000&/g
    s/0*([0-9]{4})/\1/g
' "$FILE"

Example:

Sample line : 1 44 555 1a 44b 565c 7890 77777

Yields:

Sample line : 0001 0044 0555 0001a 0044b 0565c 7890 77777
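
To try the two substitutions without touching a file, they can be wrapped in a small filter that reads stdin or named files and writes to stdout. This is a sketch only: `pad4_sed` is a hypothetical name, and the expressions are the same basic regular expressions as in the first command above.

```shell
# pad4_sed (hypothetical name): apply the two substitutions as a filter.
# Input files are left untouched; the padded text goes to stdout.
pad4_sed() {
  # 1st expression: prefix every run of digits with 000.
  # 2nd expression: drop the leading zeros in front of the first
  #                 four remaining digits of each run.
  sed -e 's/[0-9]\{1,\}/000&/g' \
      -e 's/0*\([0-9]\{4\}\)/\1/g' "$@"
}

printf '%s\n' 'Sample line : 1 44 555 1a 44b 565c 7890 77777' | pad4_sed
```

One way to update the file itself without -i is to redirect to a temporary file and move it over the original, for instance `pad4_sed "$FILE" > "$FILE.tmp" && mv "$FILE.tmp" "$FILE"` (the .tmp name is only an illustration).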
xhienne
  • I'd recommend against including the `-i` option here, despite the fact that it's in the question, as its use differs between sed implementations. A POSIX solution that modifies the original file would not be possible with sed alone. Also, for the sake of simplicity, it might be useful to mention that this can be done without newlines. `sed -e 's/[0-9]/000&/' -e 's/[0-9]*\([0-9]\{4\}\)/\1/g'` for example. But despite that ... nicely done. :) – ghoti Sep 28 '17 at 00:25
  • This will cut down numbers of more than 4 digits. "7777777" will be transformed to "7777". – Marc Lambrichs Sep 28 '17 at 00:28
  • @MarcLambrichs, yes -- none of the OP's sample data is more than 4 digits. Behaviour in cases with >4 digits is unspecified in the question. – ghoti Sep 28 '17 at 00:30
  • @ghoti Thanks. The newlines were intentionally added for the sake of readability. – xhienne Sep 28 '17 at 00:36
  • @ghoti Yup. Just like the number of columns in the input is unspecified. – Marc Lambrichs Sep 28 '17 at 00:40
  • @MarcLambrichs There are probably no numbers exceeding 4 digits, but answer corrected anyway. As for the number of columns, the 'g' substitution flag indicates there may be more than one, and there is no anchor that would indicate that the number is in the first column – xhienne Sep 28 '17 at 00:42

In a regular expression, * matches zero or more repetitions of the preceding item, and [^0-9] matches any single character that is not a digit.

So adapting each of your regexes to include [^0-9]* after the digits, before the closing \>, should allow the match to include those letters.
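
As a minimal sketch of that adaptation, assuming GNU sed (for \< and \>) and input with one value per line as in the question, the three substitutions from the script could look like this. `pad4_words` is a hypothetical name, and the commands are shown as a stdout filter rather than with -i.

```shell
# pad4_words (hypothetical name): the question's three substitutions with
# [^0-9]* added after the digits so trailing letters are part of the match.
# Assumes GNU sed, which supports the \< and \> word-boundary anchors.
pad4_words() {
  sed -e 's/\<[0-9][^0-9]*\>/0&/g' \
      -e 's/\<[0-9][0-9][^0-9]*\>/0&/g' \
      -e 's/\<[0-9][0-9][0-9][^0-9]*\>/0&/g' "$@"
}

printf '%s\n' 1 44 555 1a 44b 565c 7890 | pad4_words
```

Each expression adds one zero to words that start with exactly one, two, or three digits, so a value such as 1a passes through 01a and 001a before ending up as 0001a.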

Will Barnwell

Here is one more approach with awk; let me know if this helps.

awk '{val=$0;gsub(/[0-9]+/,"",val);printf("%04d%s\n",$0,val)}'  Input_file

Output will be as follows.

0001
0044
0555
0001a
0044b
0565c
7890

Explanation: here is the same solution in multi-line form, with comments.

awk '{
val=$0;                     ## Store the current line in the variable val.
gsub(/[0-9]+/,"",val);      ## Remove every digit from val, so only the non-digit characters remain.
printf("%04d%s\n",$0,val);  ## %04d converts $0 to a number (awk uses its leading digits) and zero-pads it to 4 digits; %s then prints the non-digit text saved in val.
}
' Input_file                ## Name of the input file.
RavinderSingh13