
Hello, I have the body_html of an XML email and want to transform it as shown below (I'm using XSLT 1.0):

        <body_html><html dir="ltr">
            <head>
                <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
                <style type="text/css" id="owaParaStyle"></style>
            </head>
            <body fpstyle="1" ocsi="0">
                <div style="direction: ltr;font-family: Tahoma;color: #000000;font-size: 10pt;">Hello alfjskfslfkjsjsf
                    <div>Attr A: Hello my name is </div>
                    <div>Attr B: ABCXYZ </div>
                    <div>Attr C: 5 </div>
                    <div>Attr D: Mr.ABC</div>
                    <div>Thank you so much</div>
                </div>
            </body>
            </html>
        </body_html>

This is the final XML I want:

        <body_html>
            <AttrA> Hello my name is </AttrA>
            <AttrB> ABCXYZ </AttrB>
            <AttrC> 5 </AttrC>
            <AttrD> Mr.ABC </AttrD>
        </body_html>

I tried something like this, but it does not work:

    <xsl:template match="body_html">
        <xsl:param name="text" select="." />
        <xsl:param name="AttrA" select="AttrA" />
        <xsl:param name="separator" select="':'" />
        <xsl:for-each select="div">
            <xsl:if test="contains($text,$AttrA)">
                <xsl:attribute name="AttrA">
                    <xsl:value-of select="substring-after($text,$separator)" />
                </xsl:attribute>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>

Is there any way to do this? Or is there a keyword or article I can refer to? Thank you so much.

michael.hor257k
Anh Pham
  • Your final xml is not correct. Element-names cannot contain white-space chars. See : https://stackoverflow.com/questions/2519845/how-to-check-if-string-is-a-valid-xml-element-name – Siebe Jongebloed Aug 29 '22 at 10:25
  • I've edited the xml format I want and the xslt file – Anh Pham Aug 29 '22 at 10:45
  • I have rolled your question back to what it was when I answered it. Please post a new question with your new (and completely different) problem. Also make sure we know which version of XSLT you can use. – michael.hor257k Aug 29 '22 at 18:58
  • See also: https://stackoverflow.com/a/32473081/3016153 – michael.hor257k Aug 29 '22 at 19:03
  • Hi @michael.hor257k, I've posted a new question here: https://stackoverflow.com/questions/73536727/parse-encode-xml-to-xml-by-xslt-3-0. About the link you sent: if I want to use XSLT 1.0, do I have to decode into a new file before using your code? And to avoid creating a new file, do I have to use XSLT 3.0? – Anh Pham Aug 30 '22 at 03:16
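On the XSLT 3.0 route raised in that last comment: a minimal sketch, assuming body_html contains the HTML as escaped text (e.g. &lt;html dir="ltr"&gt;...) and that this text is well-formed XML with no namespace on the html element. XPath 3.0 provides parse-xml(), which parses a string into a document node in memory, so no intermediate file is needed. It requires an XSLT 3.0 processor such as Saxon; the variable name doc is only illustrative.

```xslt
<xsl:stylesheet version="3.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/body_html">
    <!-- Assumption: body_html holds the HTML as escaped text that is well-formed XML
         (no namespace on the html element, no unclosed tags such as br). -->
    <!-- parse-xml() turns that string into a document node, in memory. -->
    <xsl:variable name="doc" select="parse-xml(string(.))" />
    <body_html>
        <!-- Same generic logic as the XSLT 1.0 answer, applied to the parsed tree -->
        <xsl:for-each select="$doc/html/body/div/div[contains(., ':')]">
            <xsl:element name="{translate(substring-before(., ':'), ' ', '')}">
                <xsl:value-of select="substring-after(., ':')" />
            </xsl:element>
        </xsl:for-each>
    </body_html>
</xsl:template>

</xsl:stylesheet>
```

If the email HTML is not well-formed XML, parse-xml() raises an error, so this only helps when the escaped markup is clean.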

1 Answer


It's not clear which parts of your example are constant and which are just an example. Perhaps you want something like:

XSLT 1.0

    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

    <xsl:template match="/body_html">
        <body_html>
            <xsl:variable name="div" select="html/body/div/div" />
            <AttrA>
                <xsl:value-of select="substring-after($div[starts-with(., 'Attr A:')], ':')" />
            </AttrA>
            <AttrB>
                <xsl:value-of select="substring-after($div[starts-with(., 'Attr B:')], ':')" />
            </AttrB>
            <AttrC>
                <xsl:value-of select="substring-after($div[starts-with(., 'Attr C:')], ':')" />
            </AttrC>
            <AttrD>
                <xsl:value-of select="substring-after($div[starts-with(., 'Attr D:')], ':')" />
            </AttrD>
        </body_html>
    </xsl:template>

    </xsl:stylesheet>

Or maybe a more generic approach could work for you:

    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

    <xsl:template match="/body_html">
        <body_html>
            <xsl:for-each select="html/body/div/div[contains(., ':')]">
                <xsl:element name="{translate(substring-before(., ':'), ' ', '')}">
                    <xsl:value-of select="substring-after(., ':')" />
                </xsl:element>
            </xsl:for-each>
        </body_html>
    </xsl:template>

    </xsl:stylesheet>

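Note the difference between an attribute and an element: your attempt uses xsl:attribute, but the output you want consists of child elements (AttrA, AttrB, ...), which is why the stylesheets above use literal result elements and xsl:element.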
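To make the attribute/element distinction concrete, here is a sketch of what an attribute-based version would look like in XSLT 1.0, using the same sample input. xsl:attribute attaches an attribute to the element currently being built (here the literal body_html), so it has to be inside that element and come before any child nodes. normalize-space() is an addition of this sketch, used to drop the spaces around each value.

```xslt
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/body_html">
    <body_html>
        <xsl:for-each select="html/body/div/div[contains(., ':')]">
            <!-- Adds an attribute to body_html (not a child element);
                 no child nodes may be created on body_html before this point -->
            <xsl:attribute name="{translate(substring-before(., ':'), ' ', '')}">
                <xsl:value-of select="normalize-space(substring-after(., ':'))" />
            </xsl:attribute>
        </xsl:for-each>
    </body_html>
</xsl:template>

</xsl:stylesheet>
```

Instead of four child elements, this should yield a single empty element, roughly <body_html AttrA="Hello my name is" AttrB="ABCXYZ" AttrC="5" AttrD="Mr.ABC"/>. In the original attempt there is no enclosing result element for xsl:attribute to attach to, which is one reason it produces nothing useful.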

michael.hor257k
  • Thank you @michael.hor257k. Because my target XML is encoded, I tried to use doe (disable-output-escaping), but it does not work. I also uploaded the full XML and XSLT. Could you take a look? – Anh Pham Aug 29 '22 at 18:49
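About disable-output-escaping in XSLT 1.0: it only changes how a text node is written out by the serializer. Within the same transformation the escaped HTML is still one text node, so paths such as html/body/div/div find nothing. A sketch of the usual two-pass workaround follows, assuming the HTML is stored as escaped text inside body_html, that the unescaped markup is well-formed XML, and that your processor supports disable-output-escaping when writing to a file (support is optional in the XSLT 1.0 specification). Run this first stylesheet, save its output, then run the answer's stylesheet on that saved file.

```xslt
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Pass 1: un-escape the HTML that is stored as text inside body_html -->
<xsl:output method="xml" version="1.0" encoding="UTF-8"/>

<xsl:template match="/body_html">
    <body_html>
        <!-- Serialized without escaping, so &lt;div&gt; is written out as a real div tag.
             The result only becomes a tree once the saved file is parsed again in pass 2. -->
        <xsl:value-of select="." disable-output-escaping="yes" />
    </body_html>
</xsl:template>

</xsl:stylesheet>
```

The intermediate result can also be held in memory by whatever program drives the transformations, but within XSLT 1.0 itself the decoded markup has to be serialized and re-parsed before it can be navigated.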