I have a small problem with my XSLT mapping.

I have an XML file as input with the following data:

<?xml version="1.0" encoding="UTF-8"?>
<Request>
    <Query>
        <Parameter name="staat">OESTERREICH</Parameter>
    </Query>
</Request>

What I am trying to do is read the state name and replace the digraphs OE, UE, AE and SS with the corresponding special characters.

The special characters:

OE = Ö
UE = Ü
AE = Ä
SS = ẞ

This is the XSLT script/mapping I tried:

<?xml version="1.0" encoding="UTF-8"?><?xe.source ../TemporaryFiles/Test_XML_1.xml#Request?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output media-type="text/xml" method="xml"></xsl:output>
    <xsl:template match="/">
        <root>
        <first_two_letters>
            <xsl:value-of select="substring(upper-case(/Request/Query/Parameter[@name='staat']),1,2)"></xsl:value-of>
        </first_two_letters>
        <xsl:choose>
            <xsl:when test="substring(/Request/Query/Parameter[@name='staat'],1,2)='OE'">
                <xsl:value-of select="concat(substring(replace(/Request/Query/Parameter[@name='staat2'], 'OE', 'Ö'),1,2), substring(replace(replace(replace(lower-case(/Request/Query/Parameter[@name='staat2']), 'ae', 'ä'), 'oe', 'ö'), 'ue', 'ü'),3))"></xsl:value-of>
            </xsl:when>
            <xsl:when test="substring(/Request/Query/Parameter[@name='staat'],1,2)='AE'">
                <xsl:value-of select="concat(substring(replace(/Request/Query/Parameter[@name='staat2'], 'AE', ''),1,2), substring(replace(replace(replace(lower-case(/Request/Query/Parameter[@name='staat2']), 'ae', 'ä'), 'oe', 'ö'), 'ue', 'ü'),3))"></xsl:value-of>
            </xsl:when>
            <xsl:when test="substring(/Request/Query/Parameter[@name='staat'],1,2)='UE'">
                <xsl:value-of select="concat(substring(replace(/Request/Query/Parameter[@name='staat2'], 'UE', 'Ü'),1,2), substring(replace(replace(replace(lower-case(/Request/Query/Parameter[@name='staat2']), 'ae', 'ä'), 'oe', 'ö'), 'ue', 'ü'),3))"></xsl:value-of>
            </xsl:when>
            <xsl:when test="substring(/Request/Query/Parameter[@name='staat'],1,2)='SS'">
                <xsl:value-of select="concat(substring(replace(/Request/Query/Parameter[@name='staat2'], 'SS', 'ẞ'),1,2), substring(replace(replace(replace(lower-case(/Request/Query/Parameter[@name='staat2']), 'ae', 'ä'), 'oe', 'ö'), 'ss', 'ß'),3))"></xsl:value-of>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="concat(substring(/Request/Query/Parameter[@name='staat2'],1,1), substring(replace(replace(replace(lower-case(/Request/Query/Parameter[@name='staat2']), 'ae', 'ä'), 'oe', 'ö'), 'ue', 'ü'),2))"></xsl:value-of>
            </xsl:otherwise>
        </xsl:choose>
        </root>
    </xsl:template>
</xsl:stylesheet>

The script works so far, but the output is:

<root>
    <first_two_letters>OE</first_two_letters>
</root>

What I actually expected as output is:

<root>
    <first_two_letters>OE</first_two_letters>
    Österreich
</root>

---EDIT: ---

After a little help from @Boldewyn in the comments, here is the working code:

<?xml version="1.0" encoding="UTF-8"?><?xe.source ../TemporaryFiles/Test_XML_1.xml#Request?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output media-type="text/xml" method="xml"></xsl:output>
    <xsl:template match="/">
        <root>
        <first_two_letters>
            <xsl:value-of select="substring(upper-case(/Request/Query/Parameter[@name='staat']),1,2)"></xsl:value-of>
        </first_two_letters>
        <xsl:choose>
            <xsl:when test="substring(/Request/Query/Parameter[@name='staat'],1,2)='OE'">
                <xsl:value-of select="concat(substring(replace(/Request/Query/Parameter[@name='staat'], 'OE', 'Ö'),1,1), substring(replace(replace(replace(lower-case(/Request/Query/Parameter[@name='staat']), 'ae', 'ä'), 'oe', 'ö'), 'ue', 'ü'),2))"></xsl:value-of>
            </xsl:when>
            <xsl:when test="substring(/Request/Query/Parameter[@name='staat'],1,2)='AE'">
                <xsl:value-of select="concat(substring(replace(/Request/Query/Parameter[@name='staat'], 'AE', 'Ä'),1,1), substring(replace(replace(replace(lower-case(/Request/Query/Parameter[@name='staat']), 'ae', 'ä'), 'oe', 'ö'), 'ue', 'ü'),2))"></xsl:value-of>
            </xsl:when>
            <xsl:when test="substring(/Request/Query/Parameter[@name='staat'],1,2)='UE'">
                <xsl:value-of select="concat(substring(replace(/Request/Query/Parameter[@name='staat'], 'UE', 'Ü'),1,1), substring(replace(replace(replace(lower-case(/Request/Query/Parameter[@name='staat']), 'ae', 'ä'), 'oe', 'ö'), 'ue', 'ü'),2))"></xsl:value-of>
            </xsl:when>
            <xsl:when test="substring(/Request/Query/Parameter[@name='staat'],1,2)='SS'">
                <xsl:value-of select="concat(substring(replace(/Request/Query/Parameter[@name='staat'], 'SS', 'ẞ'),1,1), substring(replace(replace(replace(lower-case(/Request/Query/Parameter[@name='staat']), 'ae', 'ä'), 'oe', 'ö'), 'ue', 'ü'),2))"></xsl:value-of>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="concat(substring(/Request/Query/Parameter[@name='staat'],1,1), substring(replace(replace(replace(lower-case(/Request/Query/Parameter[@name='staat']), 'ae', 'ä'), 'oe', 'ö'), 'ue', 'ü'),2))"></xsl:value-of>
            </xsl:otherwise>
        </xsl:choose>
        </root>
    </xsl:template>
</xsl:stylesheet>

But now I have the problem described in the comments:

Israel = Isräl.

[2020-08-06 16:26]: What I mean is that there are some exceptions, such as Israel or Lithuania, where the special characters should not be applied.

How can I solve this problem now?

NvrKill
  • Look out for “Isräl” and “Litaün”! – Boldewyn Aug 06 '20 at 12:26
  • Ahh, now I see what you mean ^^ Does XSLT have functions for something like that? ^^ – NvrKill Aug 06 '20 at 12:28
  • There is a typo in the `xsl:value-of`. Search for `[@name='staat2']`, which should be `[@name='staat']` (at least given your posted input document). – Boldewyn Aug 06 '20 at 12:32
  • Now my output is `OE ÖSterreich`. I see the substrings were the cause of this problem xD Now I get the correct output. But how can I solve the problem with `Isräl` and/or `Litaün`? – NvrKill Aug 06 '20 at 12:39
  • Which XSLT engine are you using? – Sebastien Aug 06 '20 at 12:57
  • To be honest? I don't know ^^ I can only say it's a Java-based engine whose error messages say nothing useful. That's why I turned to Stack Overflow ^^ – NvrKill Aug 06 '20 at 13:05
  • You can try running those functions to see which engine you are running: https://stackoverflow.com/questions/25244370/how-can-i-check-which-xslt-processor-is-being-used-in-solr – Sebastien Aug 06 '20 at 13:27
  • It's SAXON 9.1.0.2 from Saxonica, version 2.0 – NvrKill Aug 06 '20 at 13:31
  • As you've discovered, blindly replacing `ae` by `ä` isn't going to work. We're here to help you with XSLT coding problems, not with algorithms in natural language processing, but I think you're only going to be able to tackle this by word-for-word replacement using a dictionary of some kind. – Michael Kay Aug 06 '20 at 15:40
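
The processor check suggested in the comments above can be sketched as a small stylesheet that prints the standard `system-property()` values. This is a minimal sketch, not taken from the thread: the output element names (`processor`, `version`, `vendor`, `product-name`, `product-version`) are made up for illustration. `xsl:version` and `xsl:vendor` are available in every XSLT processor; `xsl:product-name` and `xsl:product-version` are defined from XSLT 2.0 onwards and should come back as empty strings on a 1.0 processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Runs against any input document; only the processor properties are reported -->
  <xsl:template match="/">
    <!-- Element names below are hypothetical, chosen only for readability -->
    <processor>
      <version><xsl:value-of select="system-property('xsl:version')"/></version>
      <vendor><xsl:value-of select="system-property('xsl:vendor')"/></vendor>
      <!-- The next two properties exist in XSLT 2.0 and later -->
      <product-name><xsl:value-of select="system-property('xsl:product-name')"/></product-name>
      <product-version><xsl:value-of select="system-property('xsl:product-version')"/></product-version>
    </processor>
  </xsl:template>

</xsl:stylesheet>
```

Knowing the reported version matters here because functions such as `replace()`, `upper-case()` and `lower-case()` used in the question require an XSLT 2.0 processor.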

1 Answer

Here's a way this could be done. (I don't speak German, so I'm not sure this works for all cases!) I have modified your input to add more test cases.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

  <xsl:output method="xml" indent="yes"/>
  
  <xsl:template match="/">
    <root>
      <xsl:apply-templates select="Request/Query/Parameter[@name='staat']"/>
    </root>
  </xsl:template>

  <xsl:template match="Parameter">
      <xsl:variable name="text" select="."/>
      <xsl:variable name="upper2" select="upper-case(substring($text,1,2))"/>
      <first_two_letters>
        <xsl:value-of select="substring(upper-case($text),1,2)"></xsl:value-of>
      </first_two_letters>
      <!-- Replacement in first 2 characters -->
      <xsl:value-of select="if($upper2='OE' or $upper2='AE' or $upper2='UE' or $upper2='SS')
                            then replace(replace(replace(replace($upper2,'OE','Ö'),'AE','Ä'),'UE','Ü'),'SS','ẞ')
                            else concat(upper-case(substring($text,1,1)), lower-case(substring($text,2,1)))"/>
      <!-- Replacement in remaining characters -->
      <xsl:value-of select="replace(replace(replace(replace(lower-case(substring($text,3)), 'ae', 'ä'), 'oe', 'ö'), 'ue', 'ü'), 'ss', 'ß')"/>
  </xsl:template>
  
</xsl:stylesheet>

See it working here : https://xsltfiddle.liberty-development.net/3Mvnt3M

Sebastien
  • In your example, Israel is turned into Isräl, but this should not happen ^^ So the special characters shouldn't be applied in, for example, Israel and Litauen ^^ – NvrKill Aug 06 '20 at 14:20
  • In most cases yes, but there are exceptions such as Israel or Lithuania. – NvrKill Aug 06 '20 at 14:24
  • If there are not many of them you could maybe have a list to exclude those words from the replacement process? – Sebastien Aug 06 '20 at 14:26
  • I'll take a look at that, but I think I already have something for that on another system, yeah. – NvrKill Aug 06 '20 at 14:34
  • I have fixed your XSLT: https://xsltfiddle.liberty-development.net/3Mvnt3M/2 Now it runs the way I want. Now I need to exclude the countries I want ^^ – NvrKill Aug 06 '20 at 15:24
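
The exclusion list discussed in these comments could look roughly like the following. This is a minimal sketch under stated assumptions, not the code from the linked fiddle: the `$exceptions` variable, its two entries and the `state` output element are hypothetical, and the list would have to be extended with every name where AE, OE, UE or SS must stay untouched. It uses only XSLT 2.0 features, so it should be usable with the Saxon 9.1 processor mentioned earlier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical exclusion list: upper-case names that must not be rewritten -->
  <xsl:variable name="exceptions" as="xs:string*" select="('ISRAEL', 'LITAUEN')"/>

  <xsl:template match="/">
    <root>
      <xsl:apply-templates select="Request/Query/Parameter[@name='staat']"/>
    </root>
  </xsl:template>

  <xsl:template match="Parameter">
    <xsl:variable name="upper" select="upper-case(normalize-space(.))"/>
    <xsl:variable name="lower" select="lower-case($upper)"/>
    <!-- Apply the digraph rules only when the name is not on the exclusion list -->
    <xsl:variable name="mapped"
        select="if ($upper = $exceptions)
                then $lower
                else replace(replace(replace(replace($lower, 'ae', 'ä'), 'oe', 'ö'), 'ue', 'ü'), 'ss', 'ß')"/>
    <!-- 'state' is a made-up element name for this sketch -->
    <state>
      <!-- Title case: upper-case the first character, keep the rest lower-case -->
      <xsl:value-of select="concat(upper-case(substring($mapped, 1, 1)), substring($mapped, 2))"/>
    </state>
  </xsl:template>

</xsl:stylesheet>
```

With the input from the question, `OESTERREICH` is not on the list and should come out as `Österreich`, while `ISRAEL` would only be title-cased to `Israel`. Doing all replacements on the lower-cased string and capitalising afterwards avoids treating the first two letters as a special case.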