0

Goal: Laboratory is using outdated analytical instrument and they would like to have the export file with analytical results be automatically uploaded into our LIMS system. We are starting with a messy XML file that needs to be properly formatted using an XSLT.

Starting XML File:

<dataRoot>
  <dataRow>
    <a0>07642110191051151NJL B2019     PURE_ZIN  MLT201910210021               10-21-2019 0840 HRS                                         163111                        029Sn      00000099.5Pb      00000.0163Cu      00000.0111As      00000.0023Bi     &lt;00000.0000Zn      00000.0007Fe      00000.0062Ag      00000.4490Sb      00000.0115Ni      00000.0008Cd      00000.0001S       00000.0007Al      00000.0002Au     &lt;00000.0000P      &lt;00000.0001In      00000.0058Co     &lt;00000.0000Tl      00000.0001Be      0000000000Ce      0000000000Ga      0000000000Ge     &lt;00000.0001Hg      00000.0010Mg      0000000000Pd      0000000000Pt      0000000000Se      0000000000Te      0000000000Bg      00000099.502Customer                                SpecificatBDE                           </a0>
  </dataRow>
  <dataRow>
    <a0>07642110191055181NJL B2019     PURE_ZIN  MLT201910210022               10-21-2019 0920 HRS                                         163111                        029Sn      00000099.5Pb      00000.0197Cu      00000.0104As      00000.0020Bi     &lt;00000.0000Zn      00000.0008Fe      00000.0057Ag      00000.4274Sb      00000.0114Ni      00000.0008Cd      00000.0001S       00000.0007Al      00000.0001Au     &lt;00000.0000P      &lt;00000.0001In      00000.0062Co     &lt;00000.0000Tl      00000.0002Be      0000000000Ce      0000000000Ga      0000000000Ge     &lt;00000.0001Hg      00000.0009Mg      0000000000Pd      0000000000Pt      0000000000Se      0000000000Te      0000000000Bg      00000099.502Customer                                SpecificatBDE                           </a0>
  </dataRow>
</dataRoot>​

Note that latest analysis will be appended to the same file, thus we are only interested in taking the last row for our data import.

What I have so far for my XSL file to translate the XML file into format I can input into our SQL DB

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0"/>

<xsl:template match="/">
  <!-- Set opening tag to INBOUND -->
  <INBOUND>
        <!-- only process the last row that contains Sample Data (ie: a0 starts with '0'), as the output continually appends new record to existing data -->
        <xsl:for-each select="dataRoot/dataRow[a0][last()]">
            <!-- Get static SAMPLE values that will be the same for all Parameters on a given row -->
            <xsl:variable name="sSampleId" select="normalize-space(substring(a0,41,16))" />
            <xsl:variable name="sText1" select="normalize-space(substring(a0,129,20))" />
            <xsl:variable name="sText6" select="normalize-space(substring(a0,31,10))" />
            <xsl:variable name="sText8" select="normalize-space(substring(a0,71,40))" />
            <xsl:variable name="sSubmitter">OES Import</xsl:variable>           
            <xsl:variable name="dEnteredOn" select="substring(current-date(),0,11)" />
            <xsl:variable name="sEnteredBy">OES Import</xsl:variable>   
            <xsl:variable name="sOwner">MLT</xsl:variable>
            <xsl:variable name="sStringLength" select="string-length(a0)" />
            <xsl:variable name="sDataString" select="substring-before(substring(a0,165,$sStringLength), '02Cust')" />
            <xsl:variable name="sDataStringB" select="translate($sDataString, ' &lt;', '  ')" />
                    <INBOX_SAMPLE>
                        <EVENT>1</EVENT>
                        <SAMPLE_ID>'<xsl:value-of select="$sSampleId" />'</SAMPLE_ID>
                        <TEXT1>'<xsl:value-of select="$sText1" />'</TEXT1>
                        <TEXT6>'<xsl:value-of select="$sText6" />'</TEXT6>
                        <TEXT8>'<xsl:value-of select="$sText8" />'</TEXT8>
                        <SUBMITTER>'<xsl:value-of select="$sSubmitter" />'</SUBMITTER>
                        <ENTERED_ON>'<xsl:value-of select="$dEnteredOn"/>'</ENTERED_ON>
                        <ENTERED_BY>'<xsl:value-of select="$sEnteredBy" />'</ENTERED_BY>    
                        <OWNER>'<xsl:value-of select="$sOwner" />'</OWNER>
                        <STRING_LENGTH>'<xsl:value-of select="$sStringLength" />'</STRING_LENGTH>                                           <DATA_STRING>'<xsl:value-of select="$sDataStringB" />'</DATA_STRING>                
                    </INBOX_SAMPLE>
        </xsl:for-each>
            <!-- get the next record row until there are no more in the file-->
    <!-- closing tag matches opening INBOUND tag -->
    </INBOUND>
</xsl:template>

</xsl:stylesheet>

What I currently have for my output:

<INBOUND>
   <INBOX_SAMPLE>
      <EVENT>1</EVENT>
      <SAMPLE_ID>'MLT201910210022'</SAMPLE_ID>
      <TEXT1>'163111'</TEXT1>
      <TEXT6>'PURE_ZIN'</TEXT6>
      <TEXT8>'10-21-2019 0920 HRS'</TEXT8>
      <SUBMITTER>'OES Import'</SUBMITTER>
      <ENTERED_ON>'2019-10-22'</ENTERED_ON>
      <ENTERED_BY>'OES Import'</ENTERED_BY>
      <OWNER>'MLT'</OWNER>
      <STRING_LENGTH>'768'</STRING_LENGTH>
      <DATA_STRING>'Sn      00000099.5Pb      00000.0197Cu      00000.0104As      00000.0020Bi      00000.0000Zn      00000.0008Fe      00000.0057Ag      00000.4274Sb      00000.0114Ni      00000.0008Cd      00000.0001S       00000.0007Al      00000.0001Au      00000.0000P       00000.0001In      00000.0062Co      00000.0000Tl      00000.0002Be      0000000000Ce      0000000000Ga      0000000000Ge      00000.0001Hg      00000.0009Mg      0000000000Pd      0000000000Pt      0000000000Se      0000000000Te      0000000000Bg      00000099.5'</DATA_STRING>
   </INBOX_SAMPLE>

What I need is that instead of that long string "DATA_STRING" every 18 characters (after the initial ') would be broken up.... this is an example I made manually of what the end result should look like:

<INBOUND>
   <INBOX_SAMPLE>
      <EVENT>1</EVENT>
      <SAMPLE_ID>'MLT201910210022'</SAMPLE_ID>
      <TEXT1>'163111'</TEXT1>
      <TEXT6>'PURE_ZIN'</TEXT6>
      <TEXT8>'10-21-2019 0920 HRS'</TEXT8>
      <SUBMITTER>'OES Import'</SUBMITTER>
      <ENTERED_ON>'2019-10-22'</ENTERED_ON>
      <ENTERED_BY>'OES Import'</ENTERED_BY>
      <OWNER>'MLT'</OWNER>
      <STRING_LENGTH>'768'</STRING_LENGTH>
      <DATA_STRING>'Sn      00000099.5'</DATA_STRING>
    </INBOX_SAMPLE>
    <INBOX_SAMPLE>
      <EVENT>1</EVENT>
      <SAMPLE_ID>'MLT201910210022'</SAMPLE_ID>
      <TEXT1>'163111'</TEXT1>
      <TEXT6>'PURE_ZIN'</TEXT6>
      <TEXT8>'10-21-2019 0920 HRS'</TEXT8>
      <SUBMITTER>'OES Import'</SUBMITTER>
      <ENTERED_ON>'2019-10-22'</ENTERED_ON>
      <ENTERED_BY>'OES Import'</ENTERED_BY>
      <OWNER>'MLT'</OWNER>
      <STRING_LENGTH>'768'</STRING_LENGTH>
      <DATA_STRING>'Pb      00000.0197'</DATA_STRING>
    </INBOX_SAMPLE>
    <INBOX_SAMPLE>
      <EVENT>1</EVENT>
      <SAMPLE_ID>'MLT201910210022'</SAMPLE_ID>
      <TEXT1>'163111'</TEXT1>
      <TEXT6>'PURE_ZIN'</TEXT6>
      <TEXT8>'10-21-2019 0920 HRS'</TEXT8>
      <SUBMITTER>'OES Import'</SUBMITTER>
      <ENTERED_ON>'2019-10-22'</ENTERED_ON>
      <ENTERED_BY>'OES Import'</ENTERED_BY>
      <OWNER>'MLT'</OWNER>
      <STRING_LENGTH>'768'</STRING_LENGTH>
      <DATA_STRING>'Cu      00000.0104'</DATA_STRING>
    </INBOX_SAMPLE>
    etc.....
</INBOUND>

The long data string needs to first be broken up into shorter strings of 18 character length and then a for-each loop should be done for every one of those shorter substrings. I'm so lost! Trying very hard and could use your help!

Lena Kup
  • 5
  • 2
  • Which processor are you using? Your stylesheet declares `version="1.0" ` but `current-date()` requires XSLT 2.0. This could make a big difference in deciding how to split the string. – michael.hor257k Oct 22 '19 at 20:43
  • Also, I don't see where you get `Sn 00000099.5` from. The first 18 characters in the input are `07642110191051151N`. – michael.hor257k Oct 22 '19 at 20:48
  • See: https://stackoverflow.com/questions/34068953/xslt-split-string-on-every-nth-character-in-loop for methods of splitting a string by length. Also https://stackoverflow.com/questions/41742209/splitting-of-strings-based-on-the-required-length (XSLT 2.0 only). – michael.hor257k Oct 22 '19 at 20:59
  • I'm using version 1.0. So far the "current-date()" has worked as shown on my XSLT even with version 1.0. If it doesn't work I will just program that field on the DB level with current date. and stick with Version 1.0. – Lena Kup Oct 23 '19 at 13:15
  • The rule for every 18 characters is specifically for the output of the variable "sDataStringB" The value of that variable after all the formatting is done to it becomes: – Lena Kup Oct 23 '19 at 13:17
  • 'Sn 00000099.5Pb 00000.0197Cu 00000.0104As 00000.0020Bi 00000.0000Zn 00000.0008Fe 00000.0057Ag 00000.4274Sb 00000.0114Ni 00000.0008Cd 00000.0001S 00000.0007Al 00000.0001Au 00000.0000P 00000.0001In 00000.0062Co 00000.0000Tl 00000.0002Be 0000000000Ce 0000000000Ga 0000000000Ge 00000.0001Hg 00000.0009Mg 0000000000Pd 0000000000Pt 0000000000Se 0000000000Te 0000000000Bg 00000099.502' – Lena Kup Oct 23 '19 at 13:18
  • So this is the long string that I need to break up into 18 character long strings and look through each of the substrings. – Lena Kup Oct 23 '19 at 13:19
  • I have reviewed the link you provided: https://stackoverflow.com/questions/34068953/xslt-split-string-on-every-nth-character-in-loop However I don't know how to utilize it within my current XSLT file. Where do I call it out? how do I make sure to add everything else to it? – Lena Kup Oct 23 '19 at 13:23

1 Answers1

0

Consider the following simplified example:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/dataRoot">
    <xsl:variable name="a0" select="dataRow[last()]/a0" />
    <xsl:variable name="common-nodes">
        <EVENT>1</EVENT>
        <SAMPLE_ID>
            <xsl:value-of select="normalize-space(substring($a0, 41, 16))"/>
        </SAMPLE_ID>
        <TEXT1>
            <xsl:value-of select="normalize-space(substring($a0, 129, 20))"/>
        </TEXT1>
        <!-- add more here ... -->
    </xsl:variable>
    <xsl:variable name="substring" select="substring-before(substring($a0, 165), '02Cust')"/>
    <xsl:variable name="data-string" select="translate($substring, '&lt;', ' ')"/>
    <INBOUND>
        <xsl:call-template name="create-output">
            <xsl:with-param name="string" select="$data-string"/>
            <xsl:with-param name="nodes" select="$common-nodes"/>
        </xsl:call-template>
    </INBOUND>      
</xsl:template>

<xsl:template name="create-output">
    <xsl:param name="string"/>
    <xsl:param name="nodes"/>
    <xsl:param name="len" select="18"/>
    <!-- create node -->
    <INBOX_SAMPLE>
        <xsl:copy-of select="$nodes"/>
        <DATA_STRING>
            <xsl:value-of select="substring($string, 1, $len)"/>
        </DATA_STRING>
    </INBOX_SAMPLE>
    <!-- recursive call -->
    <xsl:if test="string-length($string) > $len">
        <xsl:call-template name="create-output">
            <xsl:with-param name="string" select="substring($string, $len+1)"/>
            <xsl:with-param name="nodes" select="$nodes"/>
        </xsl:call-template>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>

Applied to your example input, this will produce:

Result

<?xml version="1.0" encoding="UTF-8"?>
<INBOUND>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Sn      00000099.5</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Pb      00000.0197</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Cu      00000.0104</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>As      00000.0020</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Bi      00000.0000</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Zn      00000.0008</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Fe      00000.0057</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Ag      00000.4274</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Sb      00000.0114</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Ni      00000.0008</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Cd      00000.0001</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>S       00000.0007</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Al      00000.0001</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Au      00000.0000</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>P       00000.0001</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>In      00000.0062</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Co      00000.0000</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Tl      00000.0002</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Be      0000000000</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Ce      0000000000</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Ga      0000000000</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Ge      00000.0001</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Hg      00000.0009</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Mg      0000000000</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Pd      0000000000</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Pt      0000000000</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Se      0000000000</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Te      0000000000</DATA_STRING>
  </INBOX_SAMPLE>
  <INBOX_SAMPLE>
    <EVENT>1</EVENT>
    <SAMPLE_ID>MLT201910210022</SAMPLE_ID>
    <TEXT1>163111</TEXT1>
    <DATA_STRING>Bg      00000099.5</DATA_STRING>
  </INBOX_SAMPLE>
</INBOUND>

P.S. If this is intended to provide an input to a database, then you should only have to add a unique ID to the individual substrings. Everything else should be imported once into a parent table, using a separate process.

michael.hor257k
  • 113,275
  • 6
  • 33
  • 51
  • I got the same error initially. When I copy/pasted your input XML, I found it had a trailing ZERO WIDTH SPACE character. – michael.hor257k Oct 23 '19 at 19:08
  • Thank you so much this solution worked for me perfectly! I was so wrapped up with the template I've seen somewhere else, thus it was perfect for you to provide a simpler solution. – Lena Kup Oct 23 '19 at 20:58
  • And you saved me after hours and hours of work :) Really appreciate your time and advice! – Lena Kup Oct 23 '19 at 21:00