
Up front: I'm a newbie with PHP and Java, but I'm trying to refresh my coding skills after spending the past ten years in management.

I have a table in the form:

heading1_sub1_element1 = data1
heading1_sub1_element2 = data2
heading1_sub1_element3 = data3
heading1_sub2_element1 = data4
heading1_sub2_element2 = data5
heading1_sub2_element3 = data6

Using the awesome example at Tony Marsden's site I have been able to get the table to extract the data into the form:

<table>
    <heading1_sub1_element1>data1</heading1_sub1_element1>
    <heading1_sub1_element2>data2</heading1_sub1_element2>
    <heading1_sub1_element3>data3</heading1_sub1_element3>
    <heading1_sub2_element1>data4</heading1_sub2_element1>
    <heading1_sub2_element2>data5</heading1_sub2_element2>
    <heading1_sub2_element3>data6</heading1_sub2_element3>
</table>

However what I would like to get to is:

<heading1>
    <sub1>
        <element1>Data1</element1>
        <element2>Data2</element2>
        <element3>Data3</element3>
    </sub1>
    <sub2>
        <element1>Data4</element1>
        <element2>Data5</element2>
        <element3>Data6</element3>
    </sub2>
</heading1>

Does anyone have an idea how to get the data into that format? Will I need to use XSLT, or can PHP do this directly?

My only reason for doing this is that the XML looks a whole load better.

Thanks in advance; any assistance will be gratefully received.

Dimitre Novatchev
btg_1967
  • Is the heading always `<heading1>`, or are there a number of headings that must be enclosed in a root node? – Borodin May 23 '12 at 16:14

3 Answers


Here is a generic solution that correctly processes any set of lines in the specified format, even if the lines contain different numbers of underscores and the first "name" isn't the same on every line:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:my="my:my" exclude-result-prefixes="my xs">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:variable name="vLines" select="tokenize(/*, '\r?\n')[.]"/>

 <xsl:variable name="vPass1">
  <t>
   <xsl:apply-templates mode="pass1"/>
  </t>   
 </xsl:variable>


 <xsl:template match="/*" mode="pass1">
  <xsl:for-each select="$vLines">
   <xsl:sequence select="my:makeTree(normalize-space(.))"/>
  </xsl:for-each>
 </xsl:template>

 <xsl:template match="/">
  <xsl:apply-templates select="$vPass1" mode="pass2"/>
 </xsl:template>

 <xsl:function name="my:makeTree">
  <xsl:param name="pLine"/>

  <xsl:variable name="vName" select="substring-before($pLine, '_')"/>

  <xsl:choose>
    <xsl:when test="$vName">
      <xsl:element name="{$vName}">
        <xsl:sequence select="my:makeTree(substring-after($pLine, '_'))"/>
      </xsl:element>
    </xsl:when>
    <xsl:otherwise>
     <xsl:element name=
       "{normalize-space(substring-before($pLine, '='))}">
       <xsl:sequence select="substring-after($pLine, '=')"/>
     </xsl:element>
    </xsl:otherwise>
  </xsl:choose>
 </xsl:function>

 <xsl:function name="my:group">
  <xsl:param name="pNodes" as="node()*"/>

  <xsl:for-each-group select="$pNodes[self::*]" group-by="name()">
    <xsl:element name="{name()}">
      <xsl:for-each select="current-group()">
         <xsl:sequence select="my:group(node())"/>
      </xsl:for-each>
    </xsl:element>
  </xsl:for-each-group>
  <xsl:copy-of select="$pNodes[not(self::*)]"/>
 </xsl:function>

  <xsl:template match="*[not(my:path(.) = preceding::*/my:path(.))]" mode="pass2">
  <xsl:copy>
   <xsl:apply-templates select="//*[my:path(.) = my:path((current()))]/node()"
        mode="pass2"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*" mode="pass2"/>

 <xsl:template match="/*" mode="pass2" priority="3">
   <xsl:apply-templates mode="pass2"/>
 </xsl:template>

 <xsl:function name="my:path" as="xs:string">
  <xsl:param name="pElement" as="element()"/>

  <xsl:sequence select=
   "string-join($pElement/ancestor-or-self::*/name(.), '/')"/>
 </xsl:function>

</xsl:stylesheet>

When this transformation is applied to the following XML document (the given lines, wrapped in a top element to make a well-formed XML document):

<t>
    heading1_sub1_element1 = data1
    heading1_sub1_element2 = data2
    heading1_sub1_element3 = data3
    heading1_sub2_element1 = data4
    heading1_sub2_element2 = data5
    heading1_sub2_element3 = data6
</t>

the wanted, correct result is produced:

<heading1>
   <sub1>
      <element1> data1</element1>
      <element2> data2</element2>
      <element3> data3</element3>
   </sub1>
   <sub2>
      <element1> data4</element1>
      <element2> data5</element2>
      <element3> data6</element3>
   </sub2>
</heading1>

When applying the same transformation to this, much more complicated XML document:

<t>
    heading1_sub1_element1 = data1
    heading1_sub1_element2 = data2
    heading1_sub1_element3 = data3
    heading1_sub2_element1 = data4
    heading1_sub2_element2 = data5
    heading1_sub2_element3 = data6
    heading2_sub1_sub2_sub3 = data7
    heading2_sub1_sub2_sub3_sub4 = data8
    heading2_sub1_sub2 = data9
    heading2_sub1 = data10
    heading2_sub1_sub2_sub3 = data11
</t>

we again get the correct, wanted result:

<heading1>
   <sub1>
      <element1> data1</element1>
      <element2> data2</element2>
      <element3> data3</element3>
   </sub1>
   <sub2>
      <element1> data4</element1>
      <element2> data5</element2>
      <element3> data6</element3>
   </sub2>
</heading1>
<heading2>
   <sub1>
      <sub2>
         <sub3>
            data7
            <sub4> data8</sub4>
            data11
         </sub3>
         data9
      </sub2>
      data10
   </sub1>
</heading2>

Explanation:

This is two-pass processing:

  1. In pass 1 we convert the input to a temporary tree that (in the case of the first XML document above) looks like:

<t>
   <heading1>
      <sub1>
         <element1> data1</element1>
      </sub1>
   </heading1>
   <heading1>
      <sub1>
         <element2> data2</element2>
      </sub1>
   </heading1>
   <heading1>
      <sub1>
         <element3> data3</element3>
      </sub1>
   </heading1>
   <heading1>
      <sub2>
         <element1> data4</element1>
      </sub2>
   </heading1>
   <heading1>
      <sub2>
         <element2> data5</element2>
      </sub2>
   </heading1>
   <heading1>
      <sub2>
         <element3> data6</element3>
      </sub2>
   </heading1>
</t>

  2. In the second pass we perform a specific kind of grouping so that we produce the wanted result.

Note: In this solution we access the input strings as the only text node child of the only element in an XML document. This in fact isn't necessary; I have done so only for convenience. We can read the strings from an external text file using the standard XSLT 2.0 function unparsed-text().

Dimitre Novatchev

Personally, unless you are really tied to the first version of the XML you made, I would start from the original text file format and do the whole conversion in PHP to create the XML you want. Yes, you could use XSLT to convert from the first form to the second, but it is a much simpler process to split the original strings into key/value pairs, then split each key on the '_' character (with either a regular expression or a plain string split) and use the pieces as the XML structure. If you use

$vals = explode("_", $string_input);

on just the key, that would give you for the first one:

$vals[0] = "heading1";
$vals[1] = "sub1";
$vals[2] = "element1";

which you could use to make the structure you want.

I would normally never advise someone to build an XML structure as a string (you run into encoding issues), but if you are sure you won't, just output it as a string (or, as the other answer says, use SimpleXML).
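To make the explode() idea concrete, here is a minimal sketch that builds the nested structure with PHP's DOM extension rather than by string concatenation. The function name linesToXml and the wrapper element name "table" are assumptions made for illustration, not part of the original question. It also assumes that every key segment is a valid XML element name (an empty segment, as in "a__b", would make createElement() throw).

```php
<?php
// Hypothetical helper: turns "a_b_c = value" lines into nested XML.
// The wrapper element name is an assumption; a single root is required
// because the input may contain more than one top-level heading.
function linesToXml(array $lines, string $rootName = 'table'): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $root = $doc->appendChild($doc->createElement($rootName));

    foreach ($lines as $line) {
        // Split "key = value" on the first "=" only.
        $pair = explode('=', $line, 2);
        if (count($pair) !== 2 || trim($pair[0]) === '') {
            continue; // skip blank or malformed lines
        }

        $node = $root;
        foreach (explode('_', trim($pair[0])) as $name) {
            // Reuse an existing child element with this name, if there is one.
            $child = null;
            foreach ($node->childNodes as $candidate) {
                if ($candidate instanceof DOMElement && $candidate->nodeName === $name) {
                    $child = $candidate;
                    break;
                }
            }
            if ($child === null) {
                $child = $node->appendChild($doc->createElement($name));
            }
            $node = $child;
        }

        // Text nodes are escaped when the document is serialized.
        $node->appendChild($doc->createTextNode(trim($pair[1])));
    }

    return $doc->saveXML();
}

echo linesToXml([
    'heading1_sub1_element1 = data1',
    'heading1_sub1_element2 = data2',
    'heading1_sub2_element1 = data4',
]);
```

Because the value goes in through createTextNode(), characters such as & and < are escaped when the document is serialized, which avoids the encoding problems mentioned above.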

Woody

I'm pretty sure you'll have to do some string operations to parse out the node names that you want, but after that, look into PHP's SimpleXML. Here's a good answer that shows how to use it.
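As a rough sketch of that approach, the following parses the node names out of each key and adds them with SimpleXML. The function name linesToSimpleXml and the root element "table" are assumptions for illustration. Unlike a grouping solution, this sketch appends a new leaf element for every input line and does not merge repeated keys.

```php
<?php
// Hypothetical helper: parse "a_b_c = value" lines into a SimpleXML tree.
function linesToSimpleXml(array $lines): SimpleXMLElement
{
    // Root element name is an assumption; SimpleXML needs a single root.
    $root = new SimpleXMLElement('<table/>');

    foreach ($lines as $line) {
        $pair = explode('=', $line, 2);
        if (count($pair) !== 2 || trim($pair[0]) === '') {
            continue; // skip blank or malformed lines
        }

        $names = explode('_', trim($pair[0]));
        $leaf = array_pop($names); // last segment becomes the leaf element

        $node = $root;
        foreach ($names as $name) {
            // Descend into the first existing child, or create it.
            $node = isset($node->{$name}) ? $node->{$name}[0] : $node->addChild($name);
        }

        // addChild() expects "&" to be escaped already, hence htmlspecialchars().
        $node->addChild($leaf, htmlspecialchars(trim($pair[1]), ENT_XML1));
    }

    return $root;
}

$xml = linesToSimpleXml([
    'heading1_sub1_element1 = data1',
    'heading1_sub1_element2 = data2',
    'heading1_sub2_element1 = data4',
]);
echo $xml->asXML();
```

asXML() returns the document without indentation; loading the result into a DOMDocument with formatOutput enabled is one way to pretty-print it.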

Community
GDP