I am looking for a way to replace an HTML tag with another one while keeping its text.

I have a big HTML file, which contains:

<span class="desc e-font-family-cond">fork</span>

I want to replace the <span> tag with a <strong> tag and drop its attributes:

<strong>fork</strong>

The tool doesn't really matter, but I am looking for a CLI way to do it.

I am not looking for an HTML processor, because the input is a text file with some HTML code in it (not clean or valid HTML), and I work with the output manually (copy, modify, use later in its final place). I just want to save some time on the replacement.

4 Answers


Consider using Python and a tool like BeautifulSoup to handle HTML. Trying to parse HTML with other tools like sed or awk can lead to terrible places.

As an example:

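from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup('<li><span class="desc e-font-family-cond">fork</span>', 'html.parser')
for span in soup.find_all('span'):
    span.name = 'strong'  # rename the tag; its text content is kept
    span.attrs = {}       # drop the class attribute, as the question asks
print(str(soup))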

That's lightweight and pretty simple, and the HTML is handled by a library that is built specifically to parse it.

JNevill

Don't use AWK for processing HTML files. If you can turn your HTML file into an XHTML file, you can use xsltproc for an XML transformation as follows:

trans.xsl file:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes" encoding="utf-8"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="span[@class='desc e-font-family-cond']">
    <strong><xsl:apply-templates/></strong>
  </xsl:template>

</xsl:stylesheet>

Command for invoking xsltproc, which has to be installed:

xsltproc trans.xsl file.html

The command writes the transformed document to standard output, with each matching span replaced by a strong element.

Pierre François

Using sed:

sed 's,<\(\/\)\?span\(\s\)\?,<\1strong\2,g'

$ echo '<span class="desc e-font-family-cond">fork</span>' | sed 's,<\(\/\)\?span\(\s\)\?,<\1strong\2,g'
<strong class="desc e-font-family-cond">fork</strong>
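
For comparison, the same rename can be sketched in Python with the standard re module. This is an illustration only, not part of the sed command above, and the function name rename_span_keep_attrs is made up for the example. Like the sed version it keeps the attributes; unlike it, it adds a word boundary so that a longer tag name such as <spanner> is left alone.

```python
import re


def rename_span_keep_attrs(text: str) -> str:
    # Capture an optional "/" so one pattern handles opening and closing tags.
    # \b stops the match from firing on longer names such as <spanner>.
    return re.sub(r"<(/?)span\b", r"<\1strong", text)


print(rename_span_keep_attrs('<span class="desc e-font-family-cond">fork</span>'))
# prints: <strong class="desc e-font-family-cond">fork</strong>
```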

nntrn

I would use GNU sed for this task as follows. Let the content of file.txt be

<span class="desc e-font-family-cond">fork</span>

then

sed -e 's/<span[^>]*>/<strong>/g' -e 's/<\/span>/<\/strong>/g' file.txt

output

<strong>fork</strong>

Explanation: the first expression replaces every opening span tag, attributes included, with <strong>; the second replaces every closing </span> with </strong>.
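
If Python is more convenient than sed, the two steps above could be sketched roughly like this with the standard re module. The function name span_to_strong and the sample string are assumptions made for the example, and the \b after span is an addition that the sed expression does not have.

```python
import re


def span_to_strong(text: str) -> str:
    # Step 1: replace each opening <span ...> tag, attributes included, with <strong>.
    text = re.sub(r"<span\b[^>]*>", "<strong>", text)
    # Step 2: replace each closing </span> tag with </strong>.
    return re.sub(r"</span>", "</strong>", text)


sample = '<span class="desc e-font-family-cond">fork</span>'
print(span_to_strong(sample))
# prints: <strong>fork</strong>
```

Because [^>]* stops at the first ">", an attribute value that itself contains ">" would be cut short, which is the usual limit of regex-based tag rewriting.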

Daweo