I have an XML/HTML (EPUB) document whose content uses < > instead of " " for quotations. Is it possible to replace only the < > in the content and leave the <tags> untouched with a regular expression?
Can you show an example? – Brad Feb 07 '13 at 19:02
Just to be sure, do they look like `«`? – Blender Feb 07 '13 at 19:03
This question is not clear. Please specify with an example. – jurgenreza Feb 07 '13 at 19:29
1 Answer
You should not use regex to parse XML.
Your question is not entirely clear, but it sounds like your XML has some text values with < and > in them that you want to change to quotes. This can be done fairly easily with an XSLT stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Copy elements and attributes unchanged -->
  <xsl:template match="@* | *">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- In text nodes only, turn each < and > into a double quote -->
  <xsl:template match="text()">
    <xsl:value-of select="translate(., '&lt;&gt;', '&quot;&quot;')"/>
  </xsl:template>
</xsl:stylesheet>
When run on this input:
<root>
  <item>And he said &lt;hello!&gt;.</item>
  <item>&lt;hello!&gt;, he said</item>
  <section>
    <content>&lt;What's up&gt;</content>
  </section>
</root>
it produces:
<root>
  <item>And he said "hello!".</item>
  <item>"hello!", he said</item>
  <section>
    <content>"What's up"</content>
  </section>
</root>
Is there any risk that the text in your document could contain <s and >s that you don't want to convert into quotes?
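If that risk exists, one option is to narrow the text template so it only fires where quotations are expected. The following is a minimal sketch, not a tested solution for your document: it assumes, purely for illustration, that quotations occur only in text directly inside the `item` elements of the sample input above, and that text elsewhere should keep its < and > characters.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy every node and attribute unchanged by default. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: only text directly inside item elements holds quotations.
       This more specific pattern takes precedence over node() above. -->
  <xsl:template match="item/text()">
    <xsl:value-of select="translate(., '&lt;&gt;', '&quot;&quot;')"/>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input, the two `item` elements would be converted as before, while the text inside `content` would be left alone. In a real EPUB the elements are normally in the XHTML namespace, so the match pattern would need a namespace prefix declared on the stylesheet (for example `h:p/text()`, where `h` is a hypothetical prefix bound to the XHTML namespace).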