Here is a generic solution that correctly processes any set of lines having the specified format -- even if there is a different number of underscores on each line and the first "name" isn't the same on all lines:
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="my:my" exclude-result-prefixes="my xs">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:variable name="vLines" select="tokenize(/*, '\r?\n')[.]"/>

  <xsl:variable name="vPass1">
    <t>
      <xsl:apply-templates mode="pass1"/>
    </t>
  </xsl:variable>

  <xsl:template match="/*" mode="pass1">
    <xsl:for-each select="$vLines">
      <xsl:sequence select="my:makeTree(normalize-space(.))"/>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="/">
    <xsl:apply-templates select="$vPass1" mode="pass2"/>
  </xsl:template>

  <xsl:function name="my:makeTree">
    <xsl:param name="pLine"/>
    <xsl:variable name="vName" select="substring-before($pLine, '_')"/>
    <xsl:choose>
      <xsl:when test="$vName">
        <xsl:element name="{$vName}">
          <xsl:sequence select="my:makeTree(substring-after($pLine, '_'))"/>
        </xsl:element>
      </xsl:when>
      <xsl:otherwise>
        <xsl:element name="{normalize-space(substring-before($pLine, '='))}">
          <xsl:sequence select="substring-after($pLine, '=')"/>
        </xsl:element>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <xsl:function name="my:group">
    <xsl:param name="pNodes" as="node()*"/>
    <xsl:for-each-group select="$pNodes[self::*]" group-by="name()">
      <xsl:element name="{name()}">
        <xsl:for-each select="current-group()">
          <xsl:sequence select="my:group(node())"/>
        </xsl:for-each>
      </xsl:element>
    </xsl:for-each-group>
    <xsl:copy-of select="$pNodes[not(self::*)]"/>
  </xsl:function>

  <xsl:template match="*[not(my:path(.) = preceding::*/my:path(.))]" mode="pass2">
    <xsl:copy>
      <xsl:apply-templates select="//*[my:path(.) = my:path((current()))]/node()"
        mode="pass2"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*" mode="pass2"/>

  <xsl:template match="/*" mode="pass2" priority="3">
    <xsl:apply-templates mode="pass2"/>
  </xsl:template>

  <xsl:function name="my:path" as="xs:string">
    <xsl:param name="pElement" as="element()"/>
    <xsl:sequence select="string-join($pElement/ancestor-or-self::*/name(.), '/')"/>
  </xsl:function>
</xsl:stylesheet>
When this transformation is applied to the following XML document (the given lines, wrapped in a top element to make this a well-formed XML document):
<t>
heading1_sub1_element1 = data1
heading1_sub1_element2 = data2
heading1_sub1_element3 = data3
heading1_sub2_element1 = data4
heading1_sub2_element2 = data5
heading1_sub2_element3 = data6
</t>
the wanted, correct result is produced:
<heading1>
  <sub1>
    <element1> data1</element1>
    <element2> data2</element2>
    <element3> data3</element3>
  </sub1>
  <sub2>
    <element1> data4</element1>
    <element2> data5</element2>
    <element3> data6</element3>
  </sub2>
</heading1>
When the same transformation is applied to this much more complicated XML document:
<t>
heading1_sub1_element1 = data1
heading1_sub1_element2 = data2
heading1_sub1_element3 = data3
heading1_sub2_element1 = data4
heading1_sub2_element2 = data5
heading1_sub2_element3 = data6
heading2_sub1_sub2_sub3 = data7
heading2_sub1_sub2_sub3_sub4 = data8
heading2_sub1_sub2 = data9
heading2_sub1 = data10
heading2_sub1_sub2_sub3 = data11
</t>
we again get the correct, wanted result:
<heading1>
  <sub1>
    <element1> data1</element1>
    <element2> data2</element2>
    <element3> data3</element3>
  </sub1>
  <sub2>
    <element1> data4</element1>
    <element2> data5</element2>
    <element3> data6</element3>
  </sub2>
</heading1>
<heading2>
  <sub1>
    <sub2>
      <sub3>
        data7
        <sub4> data8</sub4>
        data11
      </sub3>
      data9
    </sub2>
    data10
  </sub1>
</heading2>
Explanation:
This is two-pass processing:
1. In the first pass (mode pass1) we convert the input to a temporary tree in which every input line becomes its own chain of nested elements. In the case of the first XML document above, it looks like this:
<t>
  <heading1>
    <sub1>
      <element1> data1</element1>
    </sub1>
  </heading1>
  <heading1>
    <sub1>
      <element2> data2</element2>
    </sub1>
  </heading1>
  <heading1>
    <sub1>
      <element3> data3</element3>
    </sub1>
  </heading1>
  <heading1>
    <sub2>
      <element1> data4</element1>
    </sub2>
  </heading1>
  <heading1>
    <sub2>
      <element2> data5</element2>
    </sub2>
  </heading1>
  <heading1>
    <sub2>
      <element3> data6</element3>
    </sub2>
  </heading1>
</t>
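2. In the second pass (mode pass2) we perform a specific kind of grouping: only the first element with a given path from the root (as computed by my:path()) is copied, and it receives the children of all elements that have that same path. Every later element with an already seen path is ignored. This produces the wanted result.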
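For comparison, the same kind of merging can be sketched with xsl:for-each-group. This is an alternative technique, not the path-based grouping that the transformation above uses. The function name my:merge is hypothetical, and the sketch assumes that the temporary tree from the first pass is supplied as the source document:

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:my="my:my" exclude-result-prefixes="my">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Drop whitespace-only text nodes that come from indentation in the source -->
  <xsl:strip-space elements="*"/>

  <!-- Hypothetical helper: merges the elements of $pNodes that share a name,
       then recurses into the combined children of each group.
       Non-element nodes (the data text) are emitted after the merged elements. -->
  <xsl:function name="my:merge" as="node()*">
    <xsl:param name="pNodes" as="node()*"/>
    <xsl:for-each-group select="$pNodes[self::*]" group-by="name()">
      <xsl:element name="{current-grouping-key()}">
        <xsl:sequence select="my:merge(current-group()/node())"/>
      </xsl:element>
    </xsl:for-each-group>
    <xsl:sequence select="$pNodes[not(self::*)]"/>
  </xsl:function>

  <!-- The top element is only a wrapper; merge its children -->
  <xsl:template match="/*">
    <xsl:sequence select="my:merge(node())"/>
  </xsl:template>
</xsl:stylesheet>
```

One difference to be aware of: in mixed content this sketch places text after the merged child elements, whereas the path-based second pass keeps the text in document order. For input such as the second document above, the two approaches would therefore order the content of sub3 differently.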
Note: In this solution we access the input strings as the only text node child of the only element in an XML document. This in fact isn't necessary and I have done so only for convenience. We can read the strings from an external text file using the standard XSLT 2.0 function unparsed-text().
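A minimal sketch of that variation follows. It is a fragment meant to replace the declaration of the vLines variable. The parameter name pFileUri and the file name lines.txt are assumptions made for illustration:

```xml
<!-- Assumption: pFileUri is a hypothetical parameter holding the URI of the text file.
     A relative URI is resolved against the base URI of the stylesheet. -->
<xsl:param name="pFileUri" as="xs:string" select="'lines.txt'"/>

<!-- Read the whole file as one string, split it into lines,
     and drop lines that are empty or contain only whitespace -->
<xsl:variable name="vLines" as="xs:string*"
  select="tokenize(unparsed-text($pFileUri), '\r?\n')[normalize-space(.)]"/>
```

With this change the lines no longer come from the source document, but the pass1 template still matches the top element of a source document, so some XML document must still be supplied (or the processing started from a named template instead).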