Remove BASH in-line comments, but only after arrays and only the first 11 occurrences

Question

What I have

I over-documented my meta statements in my scripts. There are many. They all start with something like this:

#!/bin/bash

# Title and other notes information
## I may have other lines here, or not

ruler=( monarch ) # Type of leader
kingdom=( "Island" ) # Type of territory
zipcode=( 90210 ) # Standard, 3-12 digits, hyphens allowed
datatype=( 0-9- ) # Datatype
favoritepie=( "Cherry" ) # A happy memory
aoptions=( "Home address" "Work address" "Mobile" ) # List custom options
boptions=(  ) # List secondary options
aopttypes=( string string phonenum ) # Corresponding datatypes for options
bopttypes=(  ) # Corresponding datatypes for secondary options
sourced=(  ) # Sourced text in this script, such as settings
subscripts=( installusr ) # Valid BASH scripts that this script may call

...

# Script continues
outsidethewire=( "key 971" ) # Leave this comment here
somearray=( "sliced apples" "pie dough" ) # Mother's secret recipe

...I need it to become this...

#!/bin/bash

# Title and other notes information
## I may have other lines here, or not

ruler=( monarch )
kingdom=( "Island" )
zipcode=( 90210 )
datatype=( 0-9- )
favoritepie=( "Cherry" )
aoptions=( "Home address" "Work address" "Mobile" )
boptions=(  )
aopttypes=( string string phonenum )
bopttypes=(  )
sourced=(  )
subscripts=( installusr )

...

# Script continues
outsidethewire=( "key 971" ) # Leave this comment here
somearray=( "sliced apples" "pie dough" ) # Mother's secret recipe

There are exactly 11 such lines, and they are the first lines in each script that match this pattern. Any such comments later in the scripts must be left alone.
There might be comments before these lines, and the number of lines before the 11 arrays varies from script to script.
Every comment to be removed follows the consistent pattern ) #...

What I need

I need to remove the comments after these array statements.

What I have

I can run this...

sed 's/ ) # .*/ )/' *

But, I want to limit that to only the first 11 occurrences per file.

From this answer I get a pattern to match the first single match, giving me this...

sed '0,/ ) # .*/s// )/' *

...but that only works for the first occurrence.

I could put it into a loop:

#!/bin/bash

counter=1

while [ "$counter" -le "11" ]; do

  sed '0,/ ) # .*/s// )/' *
  counter=$(expr $counter + 1)

done

Is that 'proper'?

This loop assumes that every file has the same number of matches, and it runs the same 11 passes over all files regardless. If possible, I'd like the limit to apply to each file separately rather than to all files through one shared counter, but I'm not sure how to do that.

Is that the best way to do this? Or, is there a more "proper", fail-safe way using other Linux tools?

score 2 · Accepted Answer · answered Jun 15 '22 at 07:48

If I understand this correctly, you want to match comments after a closing parenthesis ) and delete the first 11 occurrences in the file, regardless of whether similar matches occur later in the file.

Assume the contents of your file are:

$ cat input_file
#!/bin/bash

# Title and other notes information
## I may have other lines here, or not

ruler=( monarch ) # Type of leader
kingdom=( "Island" ) # Type of territory
zipcode=( 90210 ) # Standard, 3-12 digits, hyphens allowed
datatype=( 0-9- ) # Datatype
favoritepie=( "Cherry" ) # A happy memory
aoptions=( "Home address" "Work address" "Mobile" ) # List custom options
boptions=(  ) # List secondary options
aopttypes=( string string phonenum ) # Corresponding datatypes for options
bopttypes=(  ) # Corresponding datatypes for secondary options
sourced=(  ) # Sourced text in this script, such as settings
subscripts=( installusr ) # Valid BASH scripts that this script may call

# Title and other notes information
## I may have other lines here, or not

ruler=( monarch ) # Type of leader
kingdom=( "Island" ) # Type of territory
zipcode=( 90210 ) # Standard, 3-12 digits, hyphens allowed
datatype=( 0-9- ) # Datatype
favoritepie=( "Cherry" ) # A happy memory
aoptions=( "Home address" "Work address" "Mobile" ) # List custom options
boptions=(  ) # List secondary options
aopttypes=( string string phonenum ) # Corresponding datatypes for options
bopttypes=(  ) # Corresponding datatypes for secondary options
sourced=(  ) # Sourced text in this script, such as settings
subscripts=( installusr ) # Valid BASH scripts that this script may call

...

# Script continues

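Using GNU sed (the addr1,+N address form is a GNU extension), match the lines of interest, then carry out the substitution on the first matching line and the 10 lines after it. This covers the 11 array lines as long as they are consecutive, as they are here.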

$ sed '/) #/{1,+10s/ #.*//}' input_file
#!/bin/bash

# Title and other notes information
## I may have other lines here, or not

ruler=( monarch )
kingdom=( "Island" )
zipcode=( 90210 )
datatype=( 0-9- )
favoritepie=( "Cherry" )
aoptions=( "Home address" "Work address" "Mobile" )
boptions=(  )
aopttypes=( string string phonenum )
bopttypes=(  )
sourced=(  )
subscripts=( installusr )

# Title and other notes information
## I may have other lines here, or not

ruler=( monarch ) # Type of leader
kingdom=( "Island" ) # Type of territory
zipcode=( 90210 ) # Standard, 3-12 digits, hyphens allowed
datatype=( 0-9- ) # Datatype
favoritepie=( "Cherry" ) # A happy memory
aoptions=( "Home address" "Work address" "Mobile" ) # List custom options
boptions=(  ) # List secondary options
aopttypes=( string string phonenum ) # Corresponding datatypes for options
bopttypes=(  ) # Corresponding datatypes for secondary options
sourced=(  ) # Sourced text in this script, such as settings
subscripts=( installusr ) # Valid BASH scripts that this script may call

...

# Script continues
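
The question asked for the limit to apply to each file separately. A minimal sketch of one way to do that with the same sed expression, assuming GNU sed (for both `addr1,+N` and the `-i.bak` syntax) and assuming the 11 array lines are consecutive as in the sample. The function name `strip_meta_comments` and the `.bak` suffix are placeholders chosen for illustration, not part of the answer:

```shell
#!/bin/bash
# Sketch only: assumes GNU sed and 11 consecutive array lines per file.
# strip_meta_comments is a hypothetical helper name.
strip_meta_comments() {
    local file
    for file in "$@"; do
        [ -f "$file" ] || continue    # skip directories and missing names
        # One sed process per file, so the 11-line window starts afresh
        # for each file. The original is kept as "$file.bak".
        # The leading space in " #.*" avoids leaving trailing whitespace.
        sed -i.bak '/) #/{1,+10s/ #.*//}' -- "$file"
    done
}
```

It could be called as `strip_meta_comments ./*.sh`; checking the result against the `.bak` copies with `diff` before deleting them seems prudent.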

Interesting - I presume if the regexp is invoked for a line number greater than the starting range, the range match is invoked and a subsequent match of the regexp then notes that line number and matches until the RHS of the range surpasses the noted line number and the desired lines following it. — potong, Jun 15 '22 at 13:06
@potong That is correct, a greater starting line can be specified and the replacement will start from that line instead, covering that line plus the following 10 in the process. — HatLess, Jun 15 '22 at 13:14
The same seems to apply to the RHS of addr1,+N range e.g `seq 10|sed -n '1~2{4,+1p}'` returns lines 5 and 7 so does `seq 10|sed -n '1~2{4,+2p}'` whereas `seq 10|sed -n '1~2{4,+3p}'` returns lines 5,7 and 9. — potong, Jun 15 '22 at 14:26
@potong It would seem you are executing something different from the above command. Consider this `seq 10|sed -n '1,5{3,+1p}'`. This is a range match, whereas with tilde, you are matching from the starting range and matching every other N line thereafter. — HatLess, Jun 15 '22 at 14:32
Yes, I was going above and beyond your solution to show the nitty-gritty of how mixing a regexp with an addr1,+N range operates. These details are not, AFAIK, documented. — potong, Jun 15 '22 at 18:11

tshiono · Answer 2 · 2022-06-15T04:23:29.500

If I'm understanding your requirements correctly, the following should work:

#!/bin/bash

for file in *; do
    temp=$(mktemp tmp.XXXXXX)
    awk '
        /\)[[:space:]]*#/ && c++ < 11 {sub(/[[:space:]]*#.*/, "")}
        1
    ' "$file" > "$temp"
    mv -f -- "$file" "$file".O          # backup file
    mv -f -- "$temp" "$file"
done

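If GNU awk is available, its -i inplace option can overwrite the file directly instead of going through a temp file. Setting inplace::suffix keeps a backup of the original under that suffix (this spelling needs gawk 5; older releases that ship the inplace extension use INPLACE_SUFFIX instead):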

#!/bin/bash

for file in *; do
    gawk -i inplace -v inplace::suffix=.O '
        /\)[[:space:]]*#/ && c++ < 11 {sub(/[[:space:]]*#.*/, "")}
        1
    ' "$file"
done

Or simply:

#!/bin/bash

gawk -i inplace -v inplace::suffix=.O '
    FNR == 1 {c = 0}    # one gawk process reads every file, so restart the count per file
    /\)[[:space:]]*#/ && c++ < 11 {sub(/[[:space:]]*#.*/, "")}
    1
' *
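
One thing to watch when a single awk process reads several files: awk variables such as c keep their value from one file to the next, so a per-file limit needs the counter reset whenever FNR returns to 1. A minimal sketch that previews the result on standard output without touching any file; the function name `preview_strip` is a placeholder, and the limit of 11 comes from the question:

```shell
#!/bin/bash
# Sketch: print what the files would look like, without modifying them.
# preview_strip is a hypothetical helper name.
preview_strip() {
    awk '
        FNR == 1 { c = 0 }    # first line of a new input file: restart the counter
        /\)[[:space:]]*#/ && c++ < 11 { sub(/[[:space:]]*#.*/, "") }
        1                     # print every line, changed or not
    ' "$@"
}
```

Running `preview_strip script1 script2 | less` first gives a chance to confirm the pattern before any in-place edit.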

Could you show the complete `awk`-GNU code also, then? — Jesse, Jun 15 '22 at 04:14

score 1 · Answer 3 · answered Jun 15 '22 at 22:16

This might work for you (GNU sed):

sed -E 'x;/x{11}/{x;b};x;/\) #.*/{s//)/;x;s/^/x/;x}' file

In essence, keep a counter in the hold space and if 11 comments of the required type have been removed, no further processing of a line is necessary.

Check the hold space counter and if it is 11, bail out.

Otherwise, if the line matches the required criteria, remove the comment and increment the counter in the hold space.

In all other cases, no processing is carried out.
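
The steps above can be easier to follow with the one-liner spread over several lines. This is a sketch of the same script with comments added, assuming GNU sed; the wrapper name `strip_first_comments` is a placeholder for illustration. It reads the named files, or standard input when none are given, and writes to standard output:

```shell
#!/bin/bash
# Sketch: the hold-space counter written out step by step (GNU sed assumed).
# strip_first_comments is a hypothetical helper name.
strip_first_comments() {
    sed -E '
        # Swap: the counter (a run of x characters kept in the hold space)
        # becomes the pattern space, and the current line is parked.
        x
        # Counter already holds 11 x characters: swap back and end this cycle.
        /x{11}/ {
            x
            b
        }
        # Counter below 11: swap the current line back in and test it.
        x
        /\) #.*/ {
            # Drop the comment but keep the closing parenthesis.
            s//)/
            # Swap, add one x to the counter, swap back.
            x
            s/^/x/
            x
        }
    ' "$@"
}
```

Since every removed comment adds one character to the hold space, the counter is just the length of that string, and `/x{11}/` is the check for "11 done".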
