
I have an XML/HTML (EPUB) document whose content uses < > instead of " " for citations. Is it possible to replace only the < > in the content and leave the <tags> untouched, using a regular expression?

Zombo
idotter

1 Answer


You should not use regex to parse XML.

Your question is not entirely clear, but it sounds like your XML has some text values with < and > in them that you want to change to quotes. This can be done fairly easily with an XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Copy every element and attribute through unchanged. -->
  <xsl:template match="@* | *">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- In text nodes only, turn each < or > character into a double quote. -->
  <xsl:template match="text()">
    <xsl:value-of select="translate(., '&lt;&gt;', '&quot;&quot;')"/>
  </xsl:template>
</xsl:stylesheet>

When run on this input:

<root>
  <item>And he said &lt;hello!&gt;.</item>
  <item>&lt;hello!&gt;, he said</item>
  <section>
    <content>&lt;What's up&gt;</content>
  </section>
</root>

it produces:

<root>
  <item>And he said "hello!".</item>
  <item>"hello!", he said</item>
  <section>
    <content>"What's up"</content>
  </section>
</root>
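
If no XSLT processor is at hand, the same idea (let a parser separate markup from text, then translate only the text) can be sketched in Python with the standard library. This is only a minimal illustration and not a replacement for the stylesheet above: the function name fix_quotes and the sample string are made up for this example, and ElementTree has caveats of its own for real EPUB files (it drops comments and the DOCTYPE, and rewrites namespace prefixes unless they are registered).

```python
import xml.etree.ElementTree as ET

# Map the literal characters < and > to a straight double quote.
QUOTES = str.maketrans({"<": '"', ">": '"'})


def fix_quotes(xml_source: str) -> str:
    """Replace < and > with quotes in text content only (hypothetical helper)."""
    root = ET.fromstring(xml_source)
    for element in root.iter():
        # .text is the text before the first child element.
        if element.text:
            element.text = element.text.translate(QUOTES)
        # .tail is the text that follows the element's closing tag.
        if element.tail:
            element.tail = element.tail.translate(QUOTES)
    return ET.tostring(root, encoding="unicode")


sample = (
    "<root>"
    "<item>And he said &lt;hello!&gt;.</item>"
    "<item>&lt;hello!&gt;, he <b>said</b> &lt;again&gt;</item>"
    "</root>"
)
print(fix_quotes(sample))
```

Because the parser has already decoded &lt; and &gt; into plain characters inside the text nodes, the tags themselves are never touched. Both .text and .tail have to be handled, otherwise text that follows an inline element such as <b> would be skipped.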

Is there any risk that the text in your document could contain <s and >s that you don't want to convert into quotes?

Community
JLRishe