Ignore possible non-digits on the end of numbers in sed substitutions

Question

I need to pad numbers with leading zeros until they are four digits long, for example:

1 -> 0001
44 -> 0044
555 -> 0555
1a -> 0001a
44b -> 0044b
565c -> 0565c
7890 -> 7890

I have a bash script that takes the file containing those numbers as its argument.

#!/bin/bash

FILE=$1
if [ ! -f "$FILE" ]; then
    exit 1
fi

sed -i 's/\<[0-9]\>/0&/g' "$FILE"
sed -i 's/\<[0-9][0-9]\>/0&/g' "$FILE"
sed -i 's/\<[0-9][0-9][0-9]\>/0&/g' "$FILE"

The script does not work on 1a, 44b and 565c. I don't know how to ignore the letters.

`\>` is matching the word boundary: `1a` is considered a single word — Andrea Corbellini, Sep 27 '17 at 22:06
Possible duplicate of [Padding zeros in a string](https://stackoverflow.com/questions/1117134/padding-zeros-in-a-string) — funky-future, Sep 27 '17 at 22:23
I disagree with @funky-future 's dupe flag, that question was about using printf, this is about sed, in both cases the question asker already knows how to do the padding and is having trouble with the implementation — Will Barnwell, Sep 27 '17 at 22:38
Can you give a typical input line? Can there be several such numbers per line? — xhienne, Sep 28 '17 at 00:22
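A quick way to see what the first comment describes, assuming GNU sed (where `\<` and `\>` are word-boundary extensions), is to bracket whatever the asker's first pattern matches. The helper name `show_boundary` is made up for this illustration.

```shell
#!/bin/bash
# Assumes GNU sed: \< and \> match the start and end of a word.
# A word is a run of letters, digits and underscores, so "1a" is one word
# and there is no word boundary between the 1 and the a.

# show_boundary is a hypothetical helper, not part of the original script.
show_boundary() {
    # Wrap every standalone single digit in brackets to make matches visible.
    printf '%s\n' "$1" | sed 's/\<[0-9]\>/[&]/g'
}

show_boundary '1 1a 44'
# prints: [1] 1a 44
```

Only the lone `1` is matched. `1a` is skipped because `\>` cannot match between `1` and `a`.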

Marc Lambrichs · Answer 1 · 2017-09-28T01:15:30.940

2

GNU awk would be a better tool here:

awk -i inplace 'match($1,/([0-9]*)(.*)/,arr){$1=sprintf("%04d%s",arr[1],arr[2])}1' input.txt

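This zero-pads the leading number in $1 to four digits and keeps whatever follows it. Both the three-argument match() and -i inplace are GNU awk extensions.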

Testing:

$ cat input.txt
1
44
555
1a
44b
565c
7890

$ awk 'match($1,/([0-9]*)(.*)/,arr){$1=sprintf("%04d%s",arr[1],arr[2])}1' input.txt
0001
0044
0555
0001a
0044b
0565c
7890

If the data is laid out as in @xhienne's answer, with several numbers per line, we loop over the fields instead:

$ cat input.txt
1 44 555 1a 44b 565c 7890 77777

$ cat tst.awk
{ for (i=1;i<=NF;i++)
    if (match($i,/([0-9]*)(.*)/,arr))
      $i=sprintf("%04d%s",arr[1],arr[2])
}1

$ awk -f tst.awk input.txt
0001 0044 0555 0001a 0044b 0565c 7890 77777
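For an awk without the GNU three-argument match(), the same field loop can be sketched with the standard match(), which sets RSTART and RLENGTH. This is only a sketch under that assumption, not the answerer's code. The function name `pad_fields` is hypothetical, and unlike the version above it leaves fields that do not start with a digit untouched.

```shell
#!/bin/bash
# pad_fields is a hypothetical wrapper name used for this illustration.
# It relies only on POSIX awk features: match(), RLENGTH, substr(), sprintf().
pad_fields() {
    awk '{
        for (i = 1; i <= NF; i++) {
            # match() returns 0 when the field has no leading digits.
            if (match($i, /^[0-9]+/)) {
                n = RLENGTH
                # Pad the leading digits to width 4 and re-append the rest.
                $i = sprintf("%04d%s", substr($i, 1, n), substr($i, n + 1))
            }
        }
        print
    }' "$@"
}

printf '1 44 555 1a 44b 565c 7890 77777\n' | pad_fields
# prints: 0001 0044 0555 0001a 0044b 0565c 7890 77777
```

Because `%04d` converts the digits to a number, very large values may lose precision. That is not a concern for four-digit data.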

edited Sep 28 '17 at 01:15

answered Sep 27 '17 at 22:41

Marc Lambrichs


Not sure there is only one number at the beginning of the line since the poster used the `g` flag for their substitutions. – xhienne Sep 28 '17 at 00:20
What makes this better than xhienne's sed solution? – ghoti Sep 28 '17 at 00:27
xhienne's solution is wrong. It will turn the number 777777 into 7777. – Marc Lambrichs Sep 28 '17 at 00:29
Let me rephrase. Xhienne's solution handles numbers with more than 4 digits incorrectly. There are some things that the OP needs to specify, so there's no right or wrong. – Marc Lambrichs Sep 28 '17 at 00:42

xhienne · Answer 2 · 2017-09-28T00:49:07.080

1

Prefix each sequence of digits with 000, then strip the leading zeros again while always keeping at least four digits:

sed -i '
    s/[0-9]\{1,\}/000&/g
    s/0*\([0-9]\{4\}\)/\1/g
' "$FILE"

Or with GNU sed:

sed -i -r '
    s/[0-9]+/000&/g
    s/0*([0-9]{4})/\1/g
' "$FILE"

Example:

Sample line : 1 44 555 1a 44b 565c 7890 77777

Yields:

Sample line : 0001 0044 0555 0001a 0044b 0565c 7890 77777
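To make the two steps easier to follow, here is a sketch that runs each substitution separately on the sample data and prints the intermediate result. It uses only the POSIX form of the expressions from this answer. The variable names `line`, `step1` and `step2` are made up for the illustration.

```shell
#!/bin/bash
# Hypothetical variable names; the sed expressions are the ones from the answer.
line='1 44 555 1a 44b 565c 7890 77777'

# Step 1: prefix every run of digits with three zeros.
step1=$(printf '%s\n' "$line" | sed 's/[0-9]\{1,\}/000&/g')
printf '%s\n' "$step1"
# prints: 0001 00044 000555 0001a 00044b 000565c 0007890 00077777

# Step 2: drop leading zeros, but always keep at least four digits.
step2=$(printf '%s\n' "$step1" | sed 's/0*\([0-9]\{4\}\)/\1/g')
printf '%s\n' "$step2"
# prints: 0001 0044 0555 0001a 0044b 0565c 7890 77777
```

After step 1 every number has at least four digits, so step 2 only has to decide how many of the leading zeros to discard.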

edited Sep 28 '17 at 00:49

answered Sep 28 '17 at 00:12

xhienne


I'd recommend against including the `-i` option here, despite the fact that it's in the question, as its use differs between sed implementations. A POSIX solution that modifies the original file would not be possible with sed alone. Also, for the sake of simplicity, it might be useful to mention that this can be done without newlines. `sed -e 's/[0-9]/000&/' -e 's/[0-9]*\([0-9]\{4\}\)/\1/g'` for example. But despite that ... nicely done. :) – ghoti Sep 28 '17 at 00:25
This will cut down numbers of more than 4 digits. "7777777" will be transformed to "7777". – Marc Lambrichs Sep 28 '17 at 00:28
@MarcLambrichs, yes -- none of the OP's sample data is more than 4 digits. Behaviour in cases with >4 digits is unspecified in the question. – ghoti Sep 28 '17 at 00:30
@ghoti Thanks. The newlines were intentionally added for the sake of readability. – xhienne Sep 28 '17 at 00:36
@ghoti Yup. Just like the number of columns in the input is unspecified. – Marc Lambrichs Sep 28 '17 at 00:40
@MarcLambrichs There are probably no numbers exceeding 4 digits, but answer corrected anyway. As for the number of columns, the 'g' substitution flag indicates there may be more than one, and there is no anchor indicating that the number is in the first column – xhienne Sep 28 '17 at 00:42

score 0 · Answer 3 · answered Sep 27 '17 at 22:47

0

To match zero or more of the preceding item we can use *, and to match any non-digit we can use [^0-9].

So adapting your regex to include [^0-9]* after the digit matches and before the closing \> should allow it to match those letters.
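One possible reading of that suggestion, applied to the three substitutions from the question, is sketched below. It assumes GNU sed for `\<` and `\>`, and the function name `pad_with_suffix` is hypothetical. The three expressions run in order on each line, so a one-digit number gains one zero per pass.

```shell
#!/bin/bash
# pad_with_suffix is a hypothetical name; assumes GNU sed word boundaries.
# Each expression allows optional non-digits between the digits and \>.
pad_with_suffix() {
    sed -e 's/\<[0-9][^0-9]*\>/0&/g' \
        -e 's/\<[0-9][0-9][^0-9]*\>/0&/g' \
        -e 's/\<[0-9][0-9][0-9][^0-9]*\>/0&/g' "$@"
}

printf '1 44 555 1a 44b 565c 7890\n' | pad_with_suffix
# prints: 0001 0044 0555 0001a 0044b 0565c 7890
```

Note that `[^0-9]*` also matches spaces and punctuation. That is harmless here because the replacement only prepends a zero, but a narrower class such as `[[:alpha:]]*` may express the intent more clearly.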

answered Sep 27 '17 at 22:47

Will Barnwell


RavinderSingh13 · Answer 4 · 2017-09-28T01:05:33.230

Could you please try one more approach with awk and let me know if this helps you.

awk '{val=$0;gsub(/[0-9]+/,"",val);printf("%04d%s\n",$0,val)}'  Input_file

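With one value per line, as in the question's sample, this zero-pads the leading number to four digits and keeps any trailing letters.

Explanation: here is the same solution in expanded form, with comments.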

awk '{
val=$0;                     ## Copy the current line into val.
gsub(/[0-9]+/,"",val);      ## Delete every run of digits from val, leaving only the non-digit characters.
printf("%04d%s\n",$0,val);  ## %04d converts $0 to a number (its leading digits) and zero-pads it to 4 digits; %s appends the non-digit characters saved in val.
}
' Input_file                ## Input file name.
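As a sanity check, the one-liner can be run on the sample values from the question. This is a small sketch that wraps it in a function, hypothetically named `pad_lines`, so it can read from standard input instead of Input_file.

```shell
#!/bin/bash
# pad_lines is a hypothetical wrapper around the awk one-liner from this answer.
pad_lines() {
    awk '{val=$0; gsub(/[0-9]+/,"",val); printf("%04d%s\n",$0,val)}' "$@"
}

printf '%s\n' 1 44 555 1a 44b 565c 7890 | pad_lines
# prints, one per line: 0001 0044 0555 0001a 0044b 0565c 7890
```

Since the whole line is converted to a single number, this approach assumes exactly one value per line, with the digits at the start.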
