I have an XML file as follows:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="test.xslt"?>
<results>
<test name="sentence1">
<description href="#ömr">
ömr1, ämr1, ümr1 and pär1
</description>
</test>
<test name="sentence2" href="#pär2">
<description>
ömr2, ämr2, ümr2 and pär2
</description>
</test>
<test name="sentence3" href="#pär3">
<description>
ömr3, ämr3, ümr3 and pär3
</description>
</test>
</results>
Here is the XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:b="http://www.froglogic.com/XML2"
xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<xsl:output method="html" version="5.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="Summary/test">
<html>
<body>
<xsl:for-each select="//test">
<xsl:variable name="linkMe" select="@name"/>
<xsl:value-of select="description"/>
<a href="#{$linkMe}" >
<xsl:value-of select="$linkMe" />
</a>
<xsl:value-of select="description"/>
</xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
I want to convert the XML to an HTML file using Perl, but the output is not what I expect, even though I have told Perl that I want UTF-8 output.
The Perl code is:
use strict;
use warnings;
use XML::LibXML;
use XML::Writer;
use XML::LibXSLT;
use XML::Parser;
use Encode qw( is_utf8 encode decode );
my $XML_File = "test2.xml";
my $XSLT_File = "test2.xslt";
my $HTML_File = "test2.html";
sub XML2HTML {
my $xml_parser = XML::LibXML->new('1.0', 'UTF-8');
my $xslt_parser = XML::LibXSLT->new('1.0', 'UTF-8');
my $xml = $xml_parser->parse_file($XML_File);
$xml->setEncoding('UTF-8');
my $xsl = $xml_parser->parse_file($XSLT_File);
my $stylesheet = $xslt_parser->parse_stylesheet($xsl);
my $results = $stylesheet->transform($xml);
my $output = $stylesheet->output_string($results);
$stylesheet->output_file($results, $HTML_File);
}
&XML2HTML($XML_File, $XSLT_File, $HTML_File);
Another question: how can I write the output file as UTF-8 with a BOM? I searched the internet and could not find an exact answer; everything I found covers plain UTF-8 rather than UTF-8 with a BOM.
The HTML output does not look right:
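To make the BOM part of the question concrete, here is a minimal sketch of what "UTF-8 with a BOM" means at the byte level: the file starts with the three bytes EF BB BF, followed by ordinary UTF-8 data. The helper name `write_utf8_with_bom`, the file name `bom_demo.html`, and the sample string are made up for illustration; the sample string only stands in for the real transformation result.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: writes already-encoded UTF-8 bytes to $path,
# preceded by the three-byte UTF-8 byte order mark (EF BB BF).
sub write_utf8_with_bom {
    my ($path, $bytes) = @_;
    # ':raw' keeps Perl from applying any encoding or newline layer
    open(my $fh, '>:raw', $path) or die "Cannot open $path: $!";
    print {$fh} "\xEF\xBB\xBF", $bytes;
    close($fh) or die "Cannot close $path: $!";
    return;
}

# Stand-in for the transformation result (assumption, not the real output)
my $html_bytes = encode('UTF-8', "<p>\x{F6}mr1, \x{E4}mr1, \x{FC}mr1 and p\x{E4}r1</p>\n");
write_utf8_with_bom('bom_demo.html', $html_bytes);
```

In the script above, the bytes would presumably come from the stylesheet object instead, for example via `output_as_bytes` if the installed XML::LibXSLT version provides it; I have not verified that this is the intended way.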
ömr1, ämr1, ümr1 and pär1 ömr2, ämr2, ümr2 and pär2 ömr3, ämr3, ümr3 and pär3
The encoding reported for the HTML file is Codepage 1252 (Western), which is strange!
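Since the reported code page may only be a guess by the viewer, one way to check what is actually on disk is to test whether the bytes decode as strict UTF-8. This is a rough sketch using only the core Encode module; the helper name `is_valid_utf8` and the two sample byte strings are illustrative assumptions, not taken from the real output file.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps $bytes intact
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

my $utf8_bytes   = "\xC3\xB6mr1";  # "ömr1" encoded as UTF-8
my $cp1252_bytes = "\xF6mr1";      # "ömr1" encoded as Windows-1252

print is_valid_utf8($utf8_bytes)   ? "UTF-8\n" : "not UTF-8\n";
print is_valid_utf8($cp1252_bytes) ? "UTF-8\n" : "not UTF-8\n";
```

The same check could be applied to the contents of test2.html after reading it through a `:raw` file handle. If the bytes are valid UTF-8, the code page shown is probably a display-side guess rather than the file's real encoding.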