Strip leading zeros but leave a single 0

Question

So let me start off by saying that I'm new to bash, so I would appreciate simple explanations in your answers.

I've got the following block of code:

name="Chapter 0000 (sub s2).cbz "

s=$(echo $name | grep -Eo '[0-9]+([.][0-9]+)?' | tr '\n' ' ' | sed 's/^0*//')

echo $s

readarray -d " " -t myarr <<< "$s"

if [[ $(echo "${myarr[0]} < 100 && ${myarr[0]} >= 10" | bc) -ne 0 ]]; then
    myarr[0]="0${myarr[0]}"
elif [[ $(echo "${myarr[0]} < 10" | bc) -ne 0 ]]; then
    myarr[0]="00${myarr[0]}"
fi

newName="Chapter ${myarr[0]}.cbz"

echo $newName

which (in this case) would end up spitting out:

 2
(standard_in) 1: syntax error
(standard_in) 1: syntax error
Chapter .cbz

(I'm fairly certain that the syntax errors are because ${myarr[0]} is null when doing the comparisons)

This is not the output I want. I want the code to strip leading 0s but leave a single 0 if the number is all 0s.

So the part that really needs to change is sed 's/^0*//', but I'm not sure how to change it.

(expected outputs:

   in                              out
1) chapter 8.cbz              ---> Chapter 008.cbz
2) chapter 1.3.cbz            ---> Chapter 001.3.cbz
3) _23 (sec 2).cbz            ---> Chapter 023.cbz
4) chapter 00009.cbz          ---> Chapter 009.cbz
5) chap 0000112.5.cbz         ---> Chapter 112.5.cbz
6) Chapter 0000 (sub s2).cbz  ---> Chapter 000.cbz

so far the code I have works for 1-3 but not for the leading-0 cases (4-6))

I'm aware of [this](https://stackoverflow.com/questions/5678455/remove-leading-zeroes-but-not-all-zeroes) but that code seems to not work for some reason — Exodos, Feb 13 '22 at 23:11
you say you want to **strip** leading zeros but your code appears to add them — jhnc, Feb 14 '22 at 00:48
please update the table (at the end of the question) to include the string you've got at the top of the code block: `"Chapter 0000.cbz (sub s2)"` ... what's the expected output for this one? — markp-fuso, Feb 14 '22 at 01:31
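For the narrower question of how to change sed 's/^0*//' itself, one option (a sketch, not taken from any of the answers below) is to require that one digit survive the substitution: ^0+([0-9]) matches the run of leading zeros plus the digit right after it, and only that digit is put back, so an all-zero string keeps its last 0. strip_zeros is a hypothetical helper name; the expression uses sed -E (extended regex), which both GNU and BSD sed accept.

```shell
#!/bin/bash
# Sketch: strip leading zeros from the start of a string, but always keep
# at least one digit, so "0000" becomes "0" instead of "".
# strip_zeros is a hypothetical helper used only for this illustration.
strip_zeros() {
    # ^0+     : the run of leading zeros
    # ([0-9]) : one digit after them (the last 0 for an all-zero run), kept via \1
    printf '%s\n' "$1" | sed -E 's/^0+([0-9])/\1/'
}

strip_zeros 0000        # prints 0
strip_zeros 00009       # prints 9
strip_zeros 0000112.5   # prints 112.5
strip_zeros 8           # prints 8 (nothing to strip)
```

This only covers the stripping step; the answers below also pad the number back to three digits.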

Fravadona · Accepted Answer · score 2 · 2022-02-14T20:56:07.697

In pure bash:

#!/bin/bash

for name in \
    'chapter 8.cbz' \
    'chapter 1.3.cbz' \
    '_23 (sec 2).cbz' \
    'chapter 00009.cbz' \
    'chap 0000112.5.cbz' \
    'Chapter 0000 (sub s2).cbz' \
    '_23.2 (sec 2).cbz'
do
    ##### The relevant part #####

    [[ $name =~ ^[^0-9]*([0-9]+)([0-9.]*)[^.]*(\..*)$ ]]

    chapter=$(( 10#${BASH_REMATCH[1]} ))
    suffix="${BASH_REMATCH[2]}${BASH_REMATCH[3]}"

    newName=$(printf 'Chapter %03d%s' "$chapter" "$suffix")

    #############################

    echo "$newName"
done

Chapter 008.cbz
Chapter 001.3.cbz
Chapter 023.cbz
Chapter 009.cbz
Chapter 112.5.cbz
Chapter 000.cbz
Chapter 023.2.cbz

notes:

[[ =~ ]] is the way to use an ERE regex in bash. The one that I wrote has three capture groups: one for the first sequence of digits (which should be the chapter number), one for any dots and digits that directly follow it (like the .5 in 112.5), and one for everything from the next dot onward (the extension). Whatever lies in between, such as " (sec 2)", is matched by [^.]* and discarded.
$(( 10#... )) forces the zero-padded number to be read as decimal. This is needed because bash arithmetic treats a number that starts with 0 as octal, so something like 0009 would be rejected (9 is not an octal digit).
printf '%03d' formats a number as a decimal of at least 3 digits, padding it on the left with zeros when it is shorter.
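To see what each piece does on concrete input, here is a small sketch that applies the same regex and prints every captured group next to the padded chapter number. show_parts is a hypothetical helper for illustration only, not part of the answer.

```shell
#!/bin/bash
# Sketch: print the BASH_REMATCH groups and the padded number for one name.
# show_parts is a hypothetical helper used only for this illustration.
show_parts() {
    # Same pattern as the answer, stored in a variable (left unquoted below
    # so bash treats it as a regex rather than a literal string).
    local re='^[^0-9]*([0-9]+)([0-9.]*)[^.]*(\..*)$'
    if [[ $1 =~ $re ]]; then
        # 10# forces base 10, so "0000112" is read as 112, not as octal
        printf 'digits=%s rest=%s ext=%s padded=%03d\n' \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" \
            "$(( 10#${BASH_REMATCH[1]} ))"
    else
        echo "no match: $1" >&2
        return 1
    fi
}

show_parts 'chap 0000112.5.cbz'
# digits=0000112 rest=.5 ext=.cbz padded=112
show_parts 'Chapter 0000 (sub s2).cbz'
# digits=0000 rest= ext=.cbz padded=000
```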

edited Feb 14 '22 at 20:56

answered Feb 14 '22 at 01:30

Fravadona


Hey, thanks for the explanation, your answer is good but fails for input 6 (the all-0 case). Can you update your answer to make it work? – Exodos Feb 14 '22 at 15:28
@Exodos What do you mean? I just added your 6th test case and it gives the expected result – Fravadona Feb 14 '22 at 15:51
Thanks for the quick fix. It fails for an input that I didn't specify (like 3 but with a decimal in the number, think "_23.2 (sec 2).cbz"), but other than that this is a nice answer – Exodos Feb 14 '22 at 15:54
My bad, the first comment was an input mistake on my end, but the second one still stands – Exodos Feb 14 '22 at 15:58
What would be the expected output for that example? `Chapter 023.2.cbz`? – Fravadona Feb 14 '22 at 16:17
Yeah. I know for sure that the first number is the one I need pretty much all the time; the second number is just unwanted baggage. – Exodos Feb 14 '22 at 16:32
I'm not sure I understand, but I fixed the regex so that it also captures the dots and digits that may appear after the chapter number – Fravadona Feb 14 '22 at 16:40
Nice. It works flawlessly now. Thanks for the answer. – Exodos Feb 14 '22 at 16:56

HatLess · Answer 2 · score 2 · 2022-02-14T04:51:54.823

Using sed

$ sed 's/[^0-9]*0\+\?\([0-9]\{1,\}\)[^.]*\(\..*\)/Chapter 00\1\2/;s/0\+\([0-9]\{3,\}\)/\1/' file
Chapter 008.cbz
Chapter 001.3.cbz
Chapter 023.cbz
Chapter 009.cbz
Chapter 112.5.cbz

s/[^0-9]*0\+\?\([0-9]\{1,\}\)[^.]*\(\..*\)/Chapter 00\1\2/ - Strip everything up to the first non-zero digit (keeping at least one digit), drop anything between the number and the next period, then add Chapter at the beginning followed by two zeros.

s/0\+\([0-9]\{3,\}\)/\1/ - Then strip the excess zeros again, so that only three digits remain before the period (more if the chapter number itself is longer).

edited Feb 14 '22 at 04:51

answered Feb 14 '22 at 04:37

HatLess


not working as desired – Dudi Boy Feb 14 '22 at 04:42
Like Dudi said, the code seems to work on all cases except input 3. Pretty much a one-line solution otherwise – Exodos Feb 14 '22 at 15:35
It works in all my labs. Does `sed -E 's/[^0-9]*0+?([0-9]{1,})[^.]*(\..*)/Chapter 00\1\2/;s/0+([0-9]{3,})/\1/' file` work? – HatLess Feb 14 '22 at 16:26

Dudi Boy · Answer 3 · score 1 · 2022-02-14T04:40:06.760

Here is a GNU awk (gawk) script that does the trick (gensub() and RT are gawk extensions):

script.awk

{
  str = "000" gensub("(^[[:digit:]]+\\.?[[:digit:]]*)( \\([^)]+\\))?(\\.cbz)", "\\1", "g", RT);
  str = gensub("(^[[:digit:]]+)([[:digit:]]{3})(.*$)", "\\2\\3", "g", str);
  printf("Chapter %s.cbz\n", str);
}

Test input.1.txt

1) chapter 8.cbz   
2) chapter 1.3.cbz 
3) _23 (sec 2).cbz 
4) chapter 00009.cbz
5) chap 0000112.5.cbz

Output:

awk -f script.awk RS='[[:digit:]]+[\\.]?[[:digit:]]*( \\([^)]+\\))?\\.cbz' input.1.txt
Chapter 008.cbz
Chapter 001.3.cbz
Chapter 023.cbz
Chapter 009.cbz
Chapter 112.5.cbz

edited Feb 14 '22 at 04:40

answered Feb 14 '22 at 00:49

Dudi Boy


`[0]+` is identical to `0+` – Bohemian Feb 14 '22 at 02:18

jhnc · Answer 4 · score 1 · 2022-02-14T12:47:45.430

I think you could implement the table of results with sed alone:

sed '
    s/^[^0-9]*/000/
    s/[^0-9.].*$//
    s/\.*$/.cbz/
    s/^0*\([0-9]\{3\}\)/Chapter \1/
' <<'EOD'
chapter 8.cbz
chapter 1.3.cbz
_23 (sec 2).cbz
chapter 00009.cbz
chap 0000112.5.cbz
chap 04567.cbz
EOD

The first command strips everything before the first number and prepends zeros to ensure there are at least three digits.
The second command strips off everything after the number. (This may leave a trailing period that is not part of the number as the code defines a number to be any sequence of digits and periods).
The third command deletes any trailing periods and adds the desired suffix.
The final command removes the longest run of leading zeroes that leaves (at least) three digits (I added an extra test case to demonstrate) and adds the desired prefix.

Result of running this would be:

Chapter 008.cbz
Chapter 001.3.cbz
Chapter 023.cbz
Chapter 009.cbz
Chapter 112.5.cbz
Chapter 4567.cbz

edited Feb 14 '22 at 12:47

answered Feb 14 '22 at 01:38

jhnc


Your code seems to work the best so far, but I implemented it like this `s=$(echo $name | sed 's/^[^0-9]*/000/' | sed 's/[^0-9.].*$//' | sed 's/\.*$/.cbz/' | sed 's/^0*\([0-9]\{3\}\)/Chapter \1/')`. Is the EOD part required? I get a warning: `warning: here-document at line 6 delimited by end-of-file (wanted 'EOD')` when I try to use it. – Exodos Feb 14 '22 at 15:41
It is inefficient to call sed four times for every line of input. If your input is line-delimited, you can just pipe it all into sed once and then split it afterwards. My use of `<<'EOD'`...`EOD` is just a way to pipe input into sed for this demonstration. (See: [here documents](https://en.wikipedia.org/wiki/Here_document#Unix_shells)). – jhnc Feb 14 '22 at 16:26
Ah, no, my input is not line-delimited. The segment of code I gave will be inside a for loop, so `$name` will come from that; the name itself comes from the filenames in a directory. That's why I implemented your code like I mentioned in my first comment. – Exodos Feb 14 '22 at 16:49
It's still inefficient to call sed four times instead of just once per iteration. The bash method you selected is probably better for that case. – jhnc Feb 14 '22 at 18:15
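Following the point about calling sed once per iteration, here is a sketch that passes this answer's four commands as separate -e expressions to a single sed process per filename. chapter_name is a hypothetical helper name; in a real script you would call it from your own loop over the directory (e.g. for f in *.cbz).

```shell
#!/bin/bash
# Sketch: the four sed commands from this answer, run as ONE sed process
# per filename instead of four piped sed calls.
# chapter_name is a hypothetical helper used only for this illustration.
chapter_name() {
    printf '%s\n' "$1" | sed \
        -e 's/^[^0-9]*/000/' \
        -e 's/[^0-9.].*$//' \
        -e 's/\.*$/.cbz/' \
        -e 's/^0*\([0-9]\{3\}\)/Chapter \1/'
}

chapter_name 'chapter 8.cbz'               # Chapter 008.cbz
chapter_name 'Chapter 0000 (sub s2).cbz'   # Chapter 000.cbz
chapter_name '_23.2 (sec 2).cbz'           # Chapter 023.2.cbz
```

The function only prints the new name; pairing it with mv is left to your loop.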

score 1 · Answer 5 · answered Feb 14 '22 at 12:08


Another one-liner sed command:

Testing file input.1.txt

1) chapter 8.cbz   
2) chapter 1.3.cbz 
3) _23 (sec 2).cbz 
4) chapter 00009.cbz
5) chap 0000112.5.cbz

sed command

sed -E '{s/(^[^ ]*)([^[:digit:]]+)([[:digit:]]+[\. ]?[[:digit:]]*)([\. ].*$)/000\3/;s/([[:digit:]]+)([[:digit:]]{3})(.*$)/Chapter \2\3.cbz/}' input.1.txt

output

Chapter 008.cbz
Chapter 001.3.cbz
Chapter 023.cbz
Chapter 009.cbz
Chapter 112.5.cbz

answered Feb 14 '22 at 12:08

Dudi Boy


Your code works perfectly. – Exodos Feb 14 '22 at 15:49

score 0 · Answer 6 · answered Feb 14 '22 at 12:35


This might work for you (GNU sed):

sed -E 's/\b0+(0\.)?/\1/' file

Remove leading zeroes, but leave a single zero when it is followed by a decimal point (the optional (0\.) group).

answered Feb 14 '22 at 12:35

potong
