This is my file:

...
</script>

<!--START: Google Analytics --->
<script type="text/javascript"
src="../src/goog/ga_body.js"></script>
<!--END: Google Analytics --->
</body>
</html>
...

How do I delete everything from <!--START: Google Analytics ---> to <!--END: Google Analytics --->, inclusive? Effectively, this:

<!--START: Google Analytics --->
<script type="text/javascript"
src="../src/goog/ga_body.js"></script>
<!--END: Google Analytics --->

will be gone, i.e. the 4 lines will be replaced with nothing, leaving this:

</script>

    <nothing here 4 lines deleted>

    </body>
    </html>

I am looking at doing this in bash, so sed or awk might be my best bet, although Python might be better.



EDIT1

This is something I wrote before. It is probably poorly coded, but I will work from this script, find2PatternsAndDeleteTextInBetween.sh:

# Here I want to find 2 patterns and delete what's in between.
# This example works.


# These are the 2 patterns I want to find: Start and End.
# I have to use some escape characters here for this to show properly,
# and \n for it to appear in this format:
#<!-- Start of StatCounter Code for DoYourOwnSite -->
#  text would go here 
#<!-- End of StatCounter Code for DoYourOwnSite -->

#b="<!-- Start of StatCounter Code for DoYourOwnSite -->"

#b2="<!-- End of StatCounter Code for DoYourOwnSite -->"

#p1="PATTERN-1"
#p2="PATTERN-2"
p1="<!-- Start of StatCounter Code for DoYourOwnSite -->"
p2="<!-- End of StatCounter Code for DoYourOwnSite -->"
fname="*.html"
num_of_files_pattern1=$(grep -l -- "$p1" $fname | wc -l) # count the files that contain pattern 1


echo "fname(s) to apply the sed to:"
echo $fname
echo "num_of_files_pattern1 is:"
echo $num_of_files_pattern1

echo "Pattern1 is equal to:"
echo $p1

echo "Pattern2 is equal to:"
echo $p2

#this is current dir where the script is
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
echo "DIR is equal to:"
echo $DIR

#cd to the dir where I want to copy the files to:
cd "$DIR"

# this will find the pattern </head> in all the .html files and insert "This should appear before the closing head tag" before it
# it will also make a backup with .bak extension
#sed -i.bak '/<\/head>/i\This should appear before the closing head tag' *.html

echo "sed on the file"
# this does the head part
#sed '/PATTERN-1/,/PATTERN-2/d' *.txt # this works
#sed "/$p1/,/$p2/d" *.txt # this works
#sed "/$p1/,/$p2/d" $fname # this works 
sed -i.bak "/$p1/,/$p2/d" $fname # this works 


EDIT2

This is what I ended up with, but there is a more robust answer below:

# ------------------------------------------------------------------
# [author] find2PatternsAndDeleteTextInBetween.sh
#           Description
#           Here I want to find 2 patterns and delete what's in between 
#           this example works 
#
# EXAMPLE:
# these are the 2 patterns I want to find: Start and End
# <!-- Start of StatCounter Code for DoYourOwnSite -->
#   text would go here 
# <!-- End of StatCounter Code for DoYourOwnSite -->
#
# ------------------------------------------------------------------
p1="<!--START: Google Analytics --->"
p2="<!--END: Google Analytics --->"
fname=".html"
echo "fname(s) to apply the sed to:"
echo *"$fname"
echo -e "\n"
echo "Pattern1 is equal to:"
echo -e "$p1\n"
echo "Pattern2 is equal to:"
echo -e "$p2\n"
echo -e "PWD is: $PWD\n"
echo "sed on the file"
#sed '/PATTERN-1/,/PATTERN-2/d' *.txt # this works
#sed "/$p1/,/$p2/d" *.txt # this works
#sed "/$p1/,/$p2/d" $fname # this works 
sed -i.bak "/$p1/,/$p2/d" *"$fname" # this works 
HattrickNZ

3 Answers

sed is the right tool for this task:

$ sed -i'.bak' '/<!--START/,/<!--END/d' file

If other lines in the file contain similar tags, include more of the marker text in each pattern so that only the intended lines match.

For multiple files, for example file1 through file4:

$ for f in file{1..4}; do sed -i'.bak' '/<!--START/,/<!--END/d' "$f"; done 
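
Before editing files in place, it can help to try the range address as a dry run. The following is only a sketch under stated assumptions: the temporary file and its contents are made up to mirror the question, and `-i` is left out so that sed prints to stdout and leaves the file untouched.

```shell
# Build a throwaway sample file (hypothetical contents mirroring the question).
sample=$(mktemp)
cat > "$sample" <<'EOF'
</script>

<!--START: Google Analytics --->
<script type="text/javascript"
src="../src/goog/ga_body.js"></script>
<!--END: Google Analytics --->
</body>
</html>
EOF

# Delete from the first line matching START through the next line matching END.
# Without -i, the result goes to stdout and the file itself is not modified.
result=$(sed '/<!--START/,/<!--END/d' "$sample")
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f -- "$sample"
```

Once the printed result looks right, adding `-i'.bak'` applies the same deletion in place while keeping a backup copy.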
karakfa

Something to consider:

$ awk '/<!--(START|END): Google Analytics --->/{f=!f;next} !f' file
...
</script>

</body>
</html>
...
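
The one-liner above works by flipping a flag on every marker line. Spelled out with comments, it might look like the sketch below. The variable names `input`, `filtered`, and `inside` are illustrative, and the sample text is an assumption that mirrors the question.

```shell
# Hypothetical sample input, held in a shell variable for the demo.
input='...
</script>

<!--START: Google Analytics --->
<script type="text/javascript"
src="../src/goog/ga_body.js"></script>
<!--END: Google Analytics --->
</body>
</html>
...'

filtered=$(printf '%s\n' "$input" | awk '
  # On a START or END marker line: flip the flag, then skip the marker itself.
  /<!--(START|END): Google Analytics --->/ { inside = !inside; next }
  # A bare pattern prints the line; this one is true only outside the block.
  !inside
')
printf '%s\n' "$filtered"
```

Because the same rule handles both markers, each marker line simply toggles between printing and skipping.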
Ed Morton

Judging by the script in your question it sounds like you already know how to use sed to remove the range of interest from a single file (sed -i.bak "/$p1/,/$p2/d" $fname), but are looking for a robust way to process multiple files in a script (assumes bash):

#!/usr/bin/env bash

# cd to the dir. in which this script is located.
# CAVEAT: Assumes that the script wasn't invoked through a *symlink*
#         located in a different dir.
cd -- "$(dirname -- "$BASH_SOURCE")" || exit

fpattern='*.html'     # specify source-file globbing pattern
shopt -s nullglob     # make sure that globbing expands to nothing if nothing matches
fnames=( $fpattern )  # expand to matching files and store in array 
num_of_files_matching_pattern=${#fnames[@]} # count matching files
(( num_of_files_matching_pattern > 0 )) || exit # abort, if no files match

printf '%s\n%s\n' "Running from:" "$PWD"
printf '%s\n%s\n' "Pattern matching the files to process:" "$fpattern"
printf '%s\n%s\n' "# of matching files:" "$num_of_files_matching_pattern"

# Determine the range-endpoint-identifier-line regular expressions.
# CAVEAT: Make sure you escape any regular-expression metacharacters you want
#         to be treated as *literals*.
p1='^<!--START: Google Analytics --->$'
p2='^<!--END: Google Analytics --->$'

# Remove the range identified by its endpoints from all matching input files
# and save the original files with extension '.bak'
sed -i'.bak' "/$p1/,/$p2/d" "${fnames[@]}" || exit
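
The caveat about metacharacters matters once a marker contains characters such as `.`, `*`, or `/`. One possible approach, sketched here with a hypothetical helper named `escape_bre` and a made-up marker string, is to backslash-escape the characters that are special in a basic regular expression (plus the `/` delimiter) before interpolating the text into a sed address.

```shell
# escape_bre (hypothetical helper): backslash-escape BRE metacharacters and
# the "/" delimiter so that the string matches literally inside /.../ in sed.
# "|" is used as the s-command delimiter so "/" can sit in the bracket list.
escape_bre() {
  printf '%s\n' "$1" | sed 's|[]\/$*.^[]|\\&|g'
}

marker='<!-- v1.0 */ stats -->'   # made-up marker text for illustration
escaped=$(escape_bre "$marker")
printf '%s\n' "$escaped"

# Use the escaped text as a literal, whole-line address and delete that line.
kept=$(printf '%s\n' 'keep' "$marker" 'keep too' | sed "/^$escaped\$/d")
printf '%s\n' "$kept"
```

The anchors `^` and `$` are added outside the escaped text on purpose, so they keep their special meaning while everything between them is matched literally.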

As an aside: I suggest not using suffix .sh in your script filename:

  • The shebang line inside the file is sufficient to tell the system what shell/interpreter to pass the script to.

  • Not specifying a suffix leaves you free to change the implementation later (e.g., to Python), without breaking existing programs that rely on your scripts.

  • In the case at hand, assuming that use of bash is actually acceptable, .sh would be misleading, because it suggests a sh-features-only script.


Determining the running script's true directory, even when the script is invoked via a symlink located in a different directory:

  • If you can assume a Linux platform (or at least GNU readlink), use:

    dirname -- "$(readlink -e -- "$BASH_SOURCE")"
    
  • Otherwise, a more elaborate solution with a helper function is required - see this answer of mine.
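
To see the GNU readlink variant in action, the following sketch builds a small sandbox and runs a script through a symlink. Everything in it is an assumption for illustration: the sandbox layout, the script name `whereami`, and the availability of bash and GNU readlink on the machine.

```shell
# Sandbox: a real script in one directory, a symlink to it in another.
# All paths are temporary and purely illustrative.
sandbox=$(mktemp -d)
mkdir -- "$sandbox/real" "$sandbox/links"

cat > "$sandbox/real/whereami" <<'EOF'
#!/usr/bin/env bash
# Resolve symlinks first, then take the directory part (needs GNU readlink).
dirname -- "$(readlink -e -- "$BASH_SOURCE")"
EOF
ln -s -- "$sandbox/real/whereami" "$sandbox/links/whereami"

# Run the script through the symlink; it should still report the "real" dir.
reported=$(bash "$sandbox/links/whereami")
expected=$(cd -- "$sandbox/real" && pwd -P)
printf 'reported: %s\nexpected: %s\n' "$reported" "$expected"

# Clean up the sandbox.
rm -rf -- "$sandbox"
```

With plain `dirname -- "$BASH_SOURCE"`, the same invocation would report the `links` directory instead.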

mklement0
  • Thanks, I like the robustness compared with mine in EDIT2 above. Many takeaways: symlinks, escaping any regular-expression metacharacters you want to be treated as *literals*, not using the suffix `.sh`, ++ – HattrickNZ Nov 01 '16 at 19:14
  • Could you expand on the `symlink` part, as it might relate to what I want? I currently have to put my script in the dir with all the files and then run ./script.sh. It is probably a whole other question, but I would like to be able to run it from anywhere. – HattrickNZ Nov 01 '16 at 19:17