To match across lines, you must instruct sed to read the whole file at once.
With GNU sed (Linux) v4.2.2+, the simplest way to do that is to use -z (whose purpose is to read NUL-separated records; in the absence of embedded NULs, the entire file is read as a single record).
Also, given your unescaped use of ( and ) as metacharacters, you must activate support for extended regular expressions via the -r option. That said, you don't strictly need the group, because (.|\n*) (which is equivalent to .*) must be replaced with [^<]* in order to match multiple <style> elements individually: because sed regexes are greedy, .* would match everything up to the last </style> tag in the file, so with multiple elements it would also delete whatever lies between them.
sed -z -r -i 's#<style type="text/css">[^<]*</style>\n?##g' 1.htm
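To make the greediness issue concrete, here is a minimal sketch that runs both patterns over a made-up sample containing two <style> elements. It assumes GNU sed 4.2.2+ (for -z); the variable names sample, greedy and bounded are purely illustrative.

```shell
# Hypothetical sample: two multi-line <style> elements with content in between.
sample='<style type="text/css">
a {}
</style><p>keep me</p><style type="text/css">
b {}
</style>'

# Greedy .* runs from the first opening tag to the LAST closing tag,
# so the <p> element in between is deleted as well.
greedy=$(printf '%s' "$sample" | sed -z -r 's#<style type="text/css">.*</style>##g')

# [^<]* cannot cross a "<", so each element is matched on its own.
bounded=$(printf '%s' "$sample" | sed -z -r 's#<style type="text/css">[^<]*</style>##g')

printf 'greedy:  [%s]\n' "$greedy"
printf 'bounded: [%s]\n' "$bounded"
```

With the greedy pattern nothing is left, whereas the bounded pattern keeps the <p> element. Note that [^<]* only works as long as the element content itself contains no < character.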
Note that I've appended \n? to the regex to ensure that no empty line is left behind by the replacement. Use of unescaped ? also requires -r.
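The effect of the optional trailing newline can be seen with a small sketch, again assuming GNU sed 4.2.2+ and a made-up three-line input; the variable names are illustrative only.

```shell
# Hypothetical input: a <style> element on its own line between two paragraphs.
input='<p>one</p>
<style type="text/css">x{}</style>
<p>two</p>
'

# Without \n? the newline that followed the element stays behind as an empty line.
without_nl=$(printf '%s' "$input" | sed -z -r 's#<style type="text/css">[^<]*</style>##g')

# With \n? the element's trailing newline is removed together with the element.
with_nl=$(printf '%s' "$input" | sed -z -r 's#<style type="text/css">[^<]*</style>\n?##g')

printf '%s\n' '--- without \n?:' "$without_nl" '--- with \n?:' "$with_nl"
```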
Since you've chosen # as the s delimiter, you needn't \-escape / chars. in the regex.
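As a quick illustration of the delimiter choice, the following sketch removes a closing tag from a made-up string twice, once with # and once with / as the delimiter; the two commands are meant to be equivalent.

```shell
# With "#" as the delimiter, "/" in the regex needs no escaping.
hash_delim=$(printf '%s\n' 'a</b>c' | sed 's#</b>##')

# With the default "/" delimiter, every "/" in the regex must be written as "\/".
slash_delim=$(printf '%s\n' 'a</b>c' | sed 's/<\/b>//')

printf '%s\n' "$hash_delim" "$slash_delim"
```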
With older GNU sed versions, you can use a loop (:a;$!{N;ba}) to read the entire file at once:
sed -r -i ':a;$!{N;ba}; s#<style type="text/css">[^<]*</style>\n?##g' 1.htm
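To see what the loop does in isolation, here is a minimal sketch on three made-up input lines. It assumes GNU sed, which accepts the one-line form of the loop; the variable name joined is illustrative.

```shell
# :a        defines a label named "a"
# $!{N;ba}  on every line except the last: append the next input line to the
#           pattern space (N), then jump back to label "a" (ba)
# Once the last line has been read, the pattern space holds the whole input,
# so the final s command can see (and replace) the embedded newlines.
joined=$(printf 'a\nb\nc\n' | sed ':a;$!{N;ba}; s/\n/,/g')
printf '%s\n' "$joined"
```

This prints a,b,c, which shows that the s command operated on all three lines at once.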
Generally, for a more robust solution, use an HTML/XML-aware tool such as xsltproc (see below).
Robust solution using XSLT via xsltproc:
xsltproc is a third-party utility that comes with macOS and some Linux distributions (e.g., Fedora), and can easily be installed on others (e.g., on Ubuntu, with sudo apt-get install xsltproc).
With the --html option, it is capable of applying XSLT-based transformations to HTML documents too, not just to XML documents.
Here's a sample bash-based solution that demonstrates creating a copy of an HTML document with all <style> elements removed, gratefully adapted from this answer:
# Create a simple sample HTML document with 2 <style> elements at different
# levels of the DOM and save it as "file.html"
cat > file.html <<'EOF'
<html>
<head></head>
<body>
<style type="text/css">
* {
border: 1 solid black;
}
</style>
<p foo='bar'>
abc def
<style type="text/css">
* {
border: 2 dashed blue;
}
</style>
</p>
</body>
</html>
EOF
xsltproc can then apply an XSLT template to the HTML file. Normally, such a template is provided as a file as well, but given its brevity, I'm constructing it in memory and providing it like a file via a bash process substitution (<(...)):
# Define the XSLT template that copies all nodes in the document except those
# named "style".
# For an explanation, see https://stackoverflow.com/a/322079/45375
template='<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="style"/>
</xsl:stylesheet>'
# Invoke xsltproc with the template and the input file.
# --html tells xsltproc to process the file as HTML, both on input and on output.
xsltproc --html <(echo "$template") file.html
The above yields (note how both <style> elements were removed):
<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head>
<body>
<p foo="bar">
abc def
</p>
</body>
</html>
To replace the input file with the modified copy (to emulate sed -i), use something like:
xsltproc --html <(echo "$template") file.html > /tmp/file.$$ && mv /tmp/file.$$ file.html
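If you need this pattern repeatedly, the replace-on-success idea can be wrapped in a small function. The following is only a sketch: rewrite_in_place is a hypothetical helper name, mktemp is used instead of a fixed /tmp name, and a plain sed filter stands in for xsltproc in the demo so that the snippet runs without it.

```shell
# Hypothetical helper: run COMMAND [ARGS...] with FILE appended as its last
# argument, and replace FILE with the command's output only on success.
# Usage: rewrite_in_place FILE COMMAND [ARGS...]
rewrite_in_place() {
  local file=$1 tmp
  shift
  # Create the temporary file next to the target so that mv is a simple rename.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  if "$@" "$file" > "$tmp"; then
    mv -- "$tmp" "$file"
  else
    # The command failed: discard the partial output, leave FILE untouched.
    rm -f -- "$tmp"
    return 1
  fi
}

# Demo with a throwaway file and a sed filter standing in for xsltproc.
demo_dir=$(mktemp -d)
printf 'hello world\n' > "$demo_dir/demo.txt"
rewrite_in_place "$demo_dir/demo.txt" sed 's/world/there/'
cat "$demo_dir/demo.txt"
```

With the XSLT template saved to a file (say, a hypothetical template.xsl), the call would look like rewrite_in_place file.html xsltproc --html template.xsl. Be aware that mktemp creates the temporary file with restrictive permissions, so the replaced file may end up with a different mode than the original.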