Possible Duplicate:
Extract data from HTML table with BASH script
I have an HTML file with the content shown at the end of this post. I want to use sed to remove everything between the patterns < script ..... > and </script>, even when a block spans multiple lines, and leave the rest as it is. I also want to remove the tags themselves.
Any help would be appreciated, thanks! I tried both of the following, with no luck:
cat test.html | tr -d '\n' | sed 's/< script.*<\/script>//g' > output.txt
and
sed '/< script/,/<\/script>/d' test.html > output.txt
Sample input (test.html):
don't touch this.
this is not to be removed < script bla bla> this is to be
removed. < /script> this is going to
stay < script bla bla bla bla bla> remove this
and this
and this < /script> and this stays as is.
this too.
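For reference, here is one way this could be done with sed alone. Treat it as a sketch under stated assumptions, not a definitive answer. The first attempt above fails because .* is greedy, so once the newlines are gone it swallows everything from the first opening tag to the last closing tag. The second fails because a /start/,/end/d range deletes whole lines, including the text before the opening tag and after the closing tag. In addition, neither pattern allows for the space in "< /script>" as it appears in the sample. The function name strip_script_blocks is made up for illustration, and the sketch assumes the byte 0x01 never occurs in the input, the tags are lowercase, and the optional spaces are exactly as in the sample.

```shell
#!/bin/sh
# Hypothetical helper: delete every "< script ...> ... < /script>" block,
# tags included, even when a block spans several lines.
strip_script_blocks() {
    # Assumption: the byte 0x01 never appears in the input, so it can be
    # used as a one-character stand-in for a closing tag.
    marker=$(printf '\001')

    # 1. Loop with N until the whole input sits in the pattern space.
    # 2. Replace each closing tag with the marker.
    # 3. Delete from an opening tag up to the nearest marker; the negated
    #    bracket expression emulates a non-greedy match.
    # 4. Turn any leftover marker (a closing tag with no opening tag)
    #    back into a closing tag, written without the optional spaces.
    sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' \
        -e "s/< *\/script *>/${marker}/g" \
        -e "s/< *script[^>]*>[^${marker}]*${marker}//g" \
        -e "s/${marker}/<\/script>/g" \
        "$@"
}
```

It would be called as strip_script_blocks test.html > output.txt. The text around each removed block is joined as-is, so "removed < script ...> ... < /script> this" leaves two spaces behind. For real-world HTML, where script bodies may themselves contain tag-like text, an actual HTML parser is likely the safer choice.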