
I need to convert an HTML-based structure into an XML document, grouping the elements by the value of their class attribute. The input structure is shown below.

<body>
      <p class='h1'>the fisr A</p>
      <p class='txt'>one</p>
      <p>tow</p>

      <p class='h2' status='remove'></p>
      <p class='h3'>the sec sec B</p>
      <p class='txt'>the next text</p>

      <p class='h3'>the fisr C</p>
      <p class='txt'>four</p>
      <p class='txt'>five</p>

      <p class='h1' status="remove">the seccond A</p>
      <p class='txt'>the seccond txt</p>

      <p class='h2'>the second B</p>
      <p class='txt'>six</p>
      <p class='txt'>seven</p>
      <p class='h1' status="remove">the third A</p>
      <p class='txt'>eight</p>
      <p class='h2' status="remove">the third A</p>
      <p class='h3'>the third A</p>
      <p class='txt'>the third A</p>
   </body>

My expected output is given below. The elements need to be grouped by h1, h2 and h3. The condition is that, after grouping, any element that has the attribute status with the value 'remove' must be removed.

<book>
   <sectionA>
      <title>the fisr A</title>
      <p xmlns="http://www.w3.org/1999/xhtml" class="txt">one</p>
      <p xmlns="http://www.w3.org/1999/xhtml">tow</p>
         <sectionC>
            <title>the sec sec B</title>
            <p xmlns="http://www.w3.org/1999/xhtml" class="txt">the next text</p>
         </sectionC>
         <sectionC>
            <title>the fisr C</title>
            <p xmlns="http://www.w3.org/1999/xhtml" class="txt">four</p>
            <p xmlns="http://www.w3.org/1999/xhtml" class="txt">five</p>
         </sectionC>
   </sectionA>

      <sectionB>
         <title>the second B</title>
         <p xmlns="http://www.w3.org/1999/xhtml" class="txt">six</p>
         <p xmlns="http://www.w3.org/1999/xhtml" class="txt">seven</p>
      </sectionB>

         <sectionC>
            <title>the third A</title>
            <p xmlns="http://www.w3.org/1999/xhtml" class="txt">the third A</p>
         </sectionC>
</book>

I have tried the XSLT below. I do the grouping inside a variable and then try to remove the headings that have the 'status' attribute, but it is not working.

 <xsl:template match="body">
      <xsl:variable name="sequence">
      <book>
        <xsl:for-each-group select="p" group-starting-with="p[@class='h1']">
          <sectionA>
            <xsl:copy-of select="@*"></xsl:copy-of>
            <title>
              <xsl:value-of select="node()"/>
            </title>
            <xsl:for-each-group select="current-group() except ." group-starting-with="p[@class='h2']">
              <xsl:choose>
                <xsl:when test="self::p[@class='h2']">
                  <sectionB>
                    <xsl:copy-of select="@*"></xsl:copy-of>
                    <title>
                      <xsl:value-of select="node()"/>
                    </title>
                    <xsl:for-each-group select="current-group() except ." group-starting-with="p[@class='h3']">
                      <xsl:choose>
                        <xsl:when test="self::p[@class='h3']">
                          <sectionC>
                            <xsl:copy-of select="@*"></xsl:copy-of>
                            <title>
                              <xsl:value-of select="node()"/>
                            </title>
                            <xsl:apply-templates select="current-group() except ."></xsl:apply-templates>
                          </sectionC>
                        </xsl:when>
                        <xsl:otherwise>
                          <xsl:apply-templates select="current-group()"></xsl:apply-templates>
                        </xsl:otherwise>
                      </xsl:choose>
                    </xsl:for-each-group>
                  </sectionB>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:apply-templates select="current-group()"></xsl:apply-templates>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:for-each-group>
          </sectionA>
        </xsl:for-each-group>
      </book>
      </xsl:variable>
      <xsl:variable name="modifiedseq">
        <xsl:apply-templates select="$sequence/node()"></xsl:apply-templates>
      </xsl:variable>
      <xsl:apply-templates select="$modifiedseq"></xsl:apply-templates>
    </xsl:template>

    <xsl:template match="p">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates select="node()"/>
      </xsl:copy>
    </xsl:template>

The grouping is in the order h1, h2, h3. Suppose if there is attribute status='remove' in h2, then the sequence is h1, h2. Could someone please help me?
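
One possible reading of the expected output, offered as a sketch rather than a confirmed solution: a heading with status='remove' loses its own title and the paragraphs directly under it, while the subsections nested below it are kept and move up to the enclosing level. Under that assumption, a recursive XSLT 2.0 function built on nested `for-each-group group-starting-with` could do the grouping and the removal in one pass. The function name `f:group`, its namespace URI `urn:example:grouping` and the `$names` variable are made up for illustration. `*:p` is used so the patterns match whether or not the input is in the XHTML namespace.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:grouping"
    exclude-result-prefixes="xs f">

  <xsl:output method="xml" indent="yes"/>

  <!-- Section element name for heading levels 1 to 3 (h1, h2, h3). -->
  <xsl:variable name="names" as="xs:string*"
      select="('sectionA', 'sectionB', 'sectionC')"/>

  <!-- Hypothetical helper: groups $items on the heading class for $level. -->
  <xsl:function name="f:group" as="node()*">
    <xsl:param name="items" as="element()*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:choose>
      <!-- Below h3 there is nothing left to group: copy the items as they are. -->
      <xsl:when test="$level gt 3">
        <xsl:copy-of select="$items"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:for-each-group select="$items"
            group-starting-with="*:p[@class = concat('h', $level)]">
          <!-- Everything after the first item of the group, grouped one level deeper. -->
          <xsl:variable name="rest" select="subsequence(current-group(), 2)"/>
          <xsl:variable name="nested" select="f:group($rest, $level + 1)"/>
          <xsl:choose>
            <!-- Group without a heading of this level: try the next level. -->
            <xsl:when test="not(self::*:p[@class = concat('h', $level)])">
              <xsl:sequence select="f:group(current-group(), $level + 1)"/>
            </xsl:when>
            <!-- Removed heading: drop title and direct paragraphs, keep subsections. -->
            <xsl:when test="@status = 'remove'">
              <xsl:sequence select="$nested[local-name() = $names]"/>
            </xsl:when>
            <!-- Ordinary heading: wrap title and grouped content in a section. -->
            <xsl:otherwise>
              <xsl:element name="{$names[$level]}">
                <title>
                  <xsl:value-of select="."/>
                </title>
                <xsl:sequence select="$nested"/>
              </xsl:element>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <xsl:template match="*:body">
    <book>
      <xsl:sequence select="f:group(*, 1)"/>
    </book>
  </xsl:template>

</xsl:stylesheet>
```

This needs an XSLT 2.0 processor such as Saxon 9. If the reading above is right, the sample input should give a `book` containing one `sectionA` with two `sectionC` children, then one `sectionB`, then one `sectionC`. If the paragraphs under a removed heading should be kept instead, the `@status = 'remove'` branch could return `$nested` unfiltered.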

Reegan
  • It doesn't seem that `<p class='h1' status="remove">the seccond A</p>` has been removed; you have transformed it into a `sectionA` with a `<title>the seccond A</title>` child element. So I don't understand that requirement about removing based on that attribute. As for the grouping, it seems a nested or recursive XSLT 2 or 3 `for-each-group group-starting-with` e.g. `
    – Martin Honnen Jul 04 '18 at 13:11
  • See https://stackoverflow.com/questions/11693413/using-xslfor-each-group/11701272#11701272 for a similar grouping problem. – Martin Honnen Jul 04 '18 at 13:15
  • Yes, you are correct. I have now edited the question and corrected the mistakes in my output. – Reegan Jul 04 '18 at 13:33
  • Could someone please solve this? – Reegan Jul 04 '18 at 13:44
  • It is still not clear to me what the `status="remove"` implies exactly. At the end you have a `<p class='h2' status="remove">the third A</p>` followed by a `<p class='h3'>the third A</p>`. If we group on `@class = 'h2'` to wrap the remaining content (including that `p class='h3'`) in a section but then delete the section based on the `status="remove"`, why does the `p class='h3'` still appear in the output? And the sentence "Suppose if there is attribute status='remove' in h2, then the sequence is h1, h2" is also not clarifying that requirement, seems to contradict the requirement to remove.
    – Martin Honnen Jul 04 '18 at 14:30
  • I have updated the XSLT I used for this; please refer to it. I also need the grouping order to be sectionA, sectionB, sectionC. But the condition is that if sectionB has an attribute like status='remove', then the order should be sectionA, sectionC. – Reegan Jul 05 '18 at 06:44

0 Answers