I am using Microsoft's XSLT processor to process a block of XML. I need to break it into a series of subnodes using XSLT 1.0.

The input is very regular and of the form "###, ##:##> user text". I think I should be able to use the leading number and the time as some type of marker / delimiter.

An example might be:

<xmldocument>
    <Notes>422, 10:06> Test Note 1 422, 10:03> Test Note 2 </Notes>
</xmldocument>

In this case there are 2 notes:

  1. 422, 10:06> Test Note 1
  2. 422, 10:03> Test Note 2

The leading number can and will vary, so it can't be used as a delimiter by itself. I think the comma and the greater-than sign can be used to locate each message.

The desired output is:

<xmldocument>
    <Notes>
        <Note>
            <NoteTime>10:06</NoteTime>
            <NoteText>Test Note 1 (422)</NoteText>
        </Note>
        <Note>
            <NoteTime>10:03</NoteTime>
            <NoteText>Test Note 2 (422)</NoteText>
        </Note>
     </Notes>
</xmldocument>

An example with one note:

<xmldocument>
    <Notes>999, 10:06> Test Note 1</Notes>
</xmldocument>

Would yield:

<xmldocument>
    <Notes>
        <Note>
            <NoteTime>10:06</NoteTime>
            <NoteText>Test Note 1 (999)</NoteText>
        </Note>
     </Notes>
</xmldocument>
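
For the single-note case above, the comma and greater-than idea can be sketched roughly as follows. This is only a minimal sketch, assuming `Notes` holds exactly one note in the "###, ##:##> user text" form; the variable names `num` and `rest` are placeholders chosen for illustration, and the sketch does not attempt to split several notes.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform: copy everything that is not handled below -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!-- ASSUMPTION: Notes contains exactly ONE note, "###, ##:##> user text" -->
<xsl:template match="Notes">
    <!-- text before the first ", " is the leading number -->
    <xsl:variable name="num" select="substring-before(., ', ')"/>
    <!-- text after the first ", " is "##:##> user text" -->
    <xsl:variable name="rest" select="substring-after(., ', ')"/>
    <xsl:copy>
        <Note>
            <NoteTime>
                <xsl:value-of select="substring-before($rest, '> ')"/>
            </NoteTime>
            <NoteText>
                <xsl:value-of select="concat(substring-after($rest, '> '), ' (', $num, ')')"/>
            </NoteText>
        </Note>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

The open problem is finding where one note ends and the next begins when there is more than one.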

And an example with 3 notes:

<xmldocument>
    <Notes>999, 10:06> Test Note 1 123, 10:08> Test Note 2 456, 10:10> Test Note 3</Notes>
</xmldocument>

Would yield:

<xmldocument>
    <Notes>
        <Note>
            <NoteTime>10:06</NoteTime>
            <NoteText>Test Note 1 (999)</NoteText>
        </Note>
        <Note>
            <NoteTime>10:08</NoteTime>
            <NoteText>Test Note 2 (123)</NoteText>
        </Note>
        <Note>
            <NoteTime>10:10</NoteTime>
            <NoteText>Test Note 3 (456)</NoteText>
        </Note>
     </Notes>
</xmldocument>

I guess I could do it with something like this, but it seems to me that I shouldn't have to add that level of complexity to do this.

Jeff G
  • A single example is not enough to determine the logic that needs to be applied here. Please explain exactly how the `Notes` string is structured. Also tell us which XSLT 1.0 processor you will be using (some processors support the EXSLT `str:tokenize()` extension function, which looks like it could be handy here). – michael.hor257k May 05 '19 at 17:17
  • @michael.hor257k, thank you for the help in formulating the question. – Jeff G May 05 '19 at 17:30
  • I am afraid it's still rather vague. I have posted an answer based on a guess. If my guess is incorrect, please add **detailed instructions** how to process the input (how would you do it manually?). – michael.hor257k May 05 '19 at 17:53
  • "Microsoft's XSLT processor"? So which one exactly (XslCompiledTransform, XslTransform, a version of MSXML (which one))? And are you able to use extension objects or scripts? – Martin Honnen May 05 '19 at 19:02

1 Answer

--- Edited after clarification of the input structure ---

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="Notes">
    <xsl:copy>
        <xsl:call-template name="split-notes">
            <xsl:with-param name="text" select="."/>
        </xsl:call-template>
    </xsl:copy>
</xsl:template>

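<!-- emits one Note for the first note in $text, then recurses on the remainder -->
<xsl:template name="split-notes">
    <xsl:param name="text"/>
    <!-- leading number of the current note -->
    <xsl:variable name="num" select="substring-before($text, ', ')" />
    <!-- everything after the first ", ": the time, the note text, and any further notes -->
    <xsl:variable name="rest" select="substring-after($text, ', ')" />
    <!-- a second ", " means at least one more note follows -->
    <xsl:variable name="more" select="contains($rest, ', ')" />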
    <xsl:variable name="note">
        <xsl:choose>
            <xsl:when test="$more">
                <xsl:call-template name="find-end">
                    <xsl:with-param name="text" select="substring-before($rest, ', ')"/>
                </xsl:call-template>        
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$rest"/>    
            </xsl:otherwise>
        </xsl:choose>
    </xsl:variable>
    <Note>
        <NoteTime>
            <xsl:value-of select="substring-before($note, '> ')"/>
        </NoteTime>
        <NoteText>
            <xsl:value-of select="substring-after($note, '> ')"/>  
            <xsl:text> (</xsl:text>
            <xsl:value-of select="$num"/>    
            <xsl:text>)</xsl:text>                                  
        </NoteText>
    </Note>
    <xsl:if test="$more">
        <!-- recursive call -->
        <xsl:call-template name="split-notes">
            <xsl:with-param name="text" select="substring-after($text, $note)"/>
        </xsl:call-template>
    </xsl:if>
</xsl:template>

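<!-- strips trailing digits from $text, i.e. the leading number of the next note -->
<xsl:template name="find-end">
    <xsl:param name="text"/>
    <xsl:variable name="last-char" select="substring($text, string-length($text))"/>
    <xsl:choose>
        <!-- digit test: translate() maps 1 to 9 onto 0, so any digit compares equal to '0' -->
        <xsl:when test="translate($last-char, '123456789', '000000000') = '0'">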
            <!-- recursive call -->
            <xsl:call-template name="find-end">
                <xsl:with-param name="text" select="substring($text, 1, string-length($text) - 1)"/>
            </xsl:call-template>            
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$text"/>    
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>

When this is applied to the following input:

XML

<xmldocument>
    <Notes>999, 10:06> Test Note 1 123, 10:08> Test Note 2 456, 10:10> Test Note 3</Notes>
</xmldocument>

the result will be:

Result

<?xml version="1.0" encoding="UTF-8"?>
<xmldocument>
   <Notes>
      <Note>
         <NoteTime>10:06</NoteTime>
         <NoteText>Test Note 1  (999)</NoteText>
      </Note>
      <Note>
         <NoteTime>10:08</NoteTime>
         <NoteText>Test Note 2  (123)</NoteText>
      </Note>
      <Note>
         <NoteTime>10:10</NoteTime>
         <NoteText>Test Note 3 (456)</NoteText>
      </Note>
   </Notes>
</xmldocument>

Demo: https://xsltfiddle.liberty-development.net/ej9EGcB
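
Note that the first two `NoteText` values in the result contain a doubled space before the parenthesis: `find-end` removes the next note's number but leaves the space that preceded it. One possible way to trim it, sketched here under the assumption that collapsing whitespace inside the note text is acceptable, is to wrap the text in `normalize-space()`. The stand-alone stylesheet below is only a demonstration; its hard-coded `$note` and `$num` values are hypothetical stand-ins for what `split-notes` computes for the first note.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<!-- Demonstration only: the two variables mimic the values split-notes
     holds for the first note, including the trailing space in $note. -->
<xsl:template match="/">
    <xsl:variable name="note" select="'10:06> Test Note 1 '"/>
    <xsl:variable name="num" select="'999'"/>
    <Note>
        <NoteTime>
            <xsl:value-of select="substring-before($note, '> ')"/>
        </NoteTime>
        <NoteText>
            <!-- normalize-space() removes leading/trailing whitespace
                 and collapses internal runs of whitespace to one space -->
            <xsl:value-of select="normalize-space(substring-after($note, '> '))"/>
            <xsl:text> (</xsl:text>
            <xsl:value-of select="$num"/>
            <xsl:text>)</xsl:text>
        </NoteText>
    </Note>
</xsl:template>

</xsl:stylesheet>
```

In the stylesheet above, the same change would amount to replacing `substring-after($note, '> ')` inside `NoteText` with `normalize-space(substring-after($note, '> '))`. If runs of spaces inside the user text must be preserved, this approach is not suitable as is.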

michael.hor257k
  • I was getting wrapped up in the need for iteration; I think this is a really good start for me. I'm going to give this a go and see what I can do. The only issue with what you've got is that you've picked up the extra 422 from the second note in the first one. I suspect that's an artifact of trying to find the start of the second note. – Jeff G May 06 '19 at 11:46
  • No, it's an artifact of not understanding what is given and what is just an example. I still don't. If the number before comma can have any number of digits, then this can get pretty complex. If you have a way to use regex via an extension mechanism, I would advise you to take it. – michael.hor257k May 06 '19 at 14:15
  • Actually, I think you nailed it! Thank you for the help! – Jeff G May 07 '19 at 15:33