
I have two XML files (A and B) that I want to combine into a single XML file, C. A is just a "header" and B is the "main" content.

A.xml:

<?xml version="1.0" encoding="utf-8" ?>
<!--
      SAS XML Libname Engine (SAS92XML)
      SAS XMLMap Generated Output
      Version 9.04.01M3P06242015
      Created 2021-02-18T16:52:07
  -->

<ns2:message xmlns:ns2="message">
<ns2:header xmlns:ns2="message">
<ns2:ID xmlns:ns2="message">11111</ns2:ID>
<ns2:survey xmlns:ns2="message">AABB</ns2:survey>
<ns2:partner xmlns:ns2="message">ABC</ns2:partner>
<ns2:initialDate xmlns:ns2="message">2020-01-01T00:00:00.000+00:00</ns2:initialDate>
<ns2:timeProduction xmlns:ns2="message">2021-02-18T16:41:35</ns2:timeProduction>
<ns2:type xmlns:ns2="message">TYPEOFMESSAGE</ns2:type>
</ns2:header>
</ns2:message>

B.xml:

<?xml version="1.0" encoding="UTF-8"?>
<ns2:message xmlns:ns2="message"
             xmlns:ns3="send">
   <ns2:content>
      <ns2:dataSegment id="OBSERVATION">
         <ns2:cube id="ABCD">
            <ns3:obs>
               <ns3:dim name="ID" value="1"/>
               <ns3:dim name="FROM" value="2021-02-17"/>
               <ns3:dim name="TO" value="2021-02-19"/>
               <ns3:dim name="VALUE" value="A"/>
            </ns3:obs>
         </ns2:cube>
      </ns2:dataSegment>
   </ns2:content>
</ns2:message>

C.xml (want):

<?xml version="1.0" encoding="UTF-8"?>
<ns2:message xmlns:ns2="message"
             xmlns:ns3="send">
    <ns2:header>
        <ns2:ID>11111</ns2:ID>
        <ns2:survey>AABB</ns2:survey>
        <ns2:partner>ABC</ns2:partner>
        <ns2:initialDate>2020-01-01T00:00:00.000+00:00</ns2:initialDate>
        <ns2:timeProduction>2021-02-18T16:41:35</ns2:timeProduction>
        <ns2:type>TYPEOFMESSAGE</ns2:type>
   </ns2:header>
   <ns2:content>
      <ns2:dataSegment id="OBSERVATION">
         <ns2:cube id="ABCD">
            <ns3:obs>
               <ns3:dim name="ID" value="1"/>
               <ns3:dim name="FROM" value="2021-02-17"/>
               <ns3:dim name="TO" value="2021-02-19"/>
               <ns3:dim name="VALUE" value="A"/>
            </ns3:obs>
         </ns2:cube>
      </ns2:dataSegment>
   </ns2:content>
</ns2:message>

For a long time, I have been using PROC XSL to combine A and B with the following .xsl script:

script.xsl:

<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                             xmlns:ns2="message"> 
  <xsl:output indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>
  
  <xsl:template match="/ns2:message">
    <ns2:message xmlns:ns2="message" xmlns:ns3="send"> 
         <!-- COPY CURRENT DATA -->
         <xsl:copy-of select="*"/>

         <!-- COMBINE ALL DATA FROM file.xml -->
         <xsl:copy-of select="document('file:/path/to/file.xml')/ns2:message/*" />
    </ns2:message>
  </xsl:template> 
  
</xsl:transform>

However, I found that when B is too large (~60MB), PROC XSL does not create C at all, although it does the job perfectly when B is smaller.

SAS Code:

proc xsl 
   in  = 'path/to/file/A.xml'
   xsl = 'path/to/file/script.xsl'
   out = 'path/to/file/final.xml';
run;

No errors/warnings in the log.

SAS Log:

MPRINT(GENERATE_XML):           proc xsl 
   in  = 'path/to/file/A.xml'
   xsl = 'path/to/file/script.xsl'
   out = 'path/to/file/final.xml';
MPRINT(GENERATE_XML):   run;

NOTE: PROCEDURE XSL used (Total process time):
      real time           19.61 seconds
      cpu time            0.00 seconds

Since it is such a small insert, literally 8 lines, I was wondering whether it would be possible to simply read B.xml in a data _null_ step and insert those 8 lines at the top of the XML file (using a put statement, for example)?

Kermit
  • @Parfait Post edited. The PROC XSL ran with no errors/warnings. Just like it does when B is ~10MB and works fine. – Kermit Feb 18 '21 at 17:01
  • @Parfait Already did. Paths are correctly defined. Basically final.xml is just not created when B is too large. I have tested several times with different B sizes. – Kermit Feb 18 '21 at 17:08
  • @Parfait Just retested again. When B is 2MB, final.xml is created; when B is 62MB, final.xml is not created. – Kermit Feb 18 '21 at 17:17
  • No, when B is 2MB, final.xml is created and is indeed the correct concatenation of A and B (as intended). However, when B is 62MB, final.xml is not created at all! I really don't think that the problem is the CPU specs. – Kermit Feb 18 '21 at 17:40

2 Answers


If the XML is not all on a single line, you can use a data _null_; step to read the two files and stack them inside a containing tag.

Example:

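This is textual processing only, with no checks for any sort of validity. You will have to specify LRECL= on your INFILE and FILE statements if you have text lines longer than the default record length (256, unless the LRECL= system option in your session says otherwise).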

filename xml_a temp;
filename xml_b temp;
filename xml_c 'c:\temp\c_wanted.xml';

* create xml a;
data _null_;
  file xml_a;
  input; put _infile_;
  datalines4;
<?xml version="1.0" encoding="utf-8" ?>
<!--
      SAS XML Libname Engine (SAS92XML)
      SAS XMLMap Generated Output
      Version 9.04.01M3P06242015
      Created 2021-02-18T16:52:07
  -->

<ns2:message xmlns:ns2="message">
<ns2:header xmlns:ns2="message">
<ns2:ID xmlns:ns2="message">11111</ns2:ID>
<ns2:survey xmlns:ns2="message">AABB</ns2:survey>
<ns2:partner xmlns:ns2="message">ABC</ns2:partner>
<ns2:initialDate xmlns:ns2="message">2020-01-01T00:00:00.000+00:00</ns2:initialDate>
<ns2:timeProduction xmlns:ns2="message">2021-02-18T16:41:35</ns2:timeProduction>
<ns2:type xmlns:ns2="message">TYPEOFMESSAGE</ns2:type>
</ns2:header>
</ns2:message>
;;;;

* create xml b;
data _null_;
  file xml_b;
  input; put _infile_;
  datalines4;
<?xml version="1.0" encoding="UTF-8"?>
<ns2:message xmlns:ns2="message"
             xmlns:ns3="send">
   <ns2:content>
      <ns2:dataSegment id="OBSERVATION">
         <ns2:cube id="ABCD">
            <ns3:obs>
               <ns3:dim name="ID" value="1"/>
               <ns3:dim name="FROM" value="2021-02-17"/>
               <ns3:dim name="TO" value="2021-02-19"/>
               <ns3:dim name="VALUE" value="A"/>
            </ns3:obs>
         </ns2:cube>
      </ns2:dataSegment>
   </ns2:content>
</ns2:message>
;;;;


* stack a and b within message send;

data _null_;
  file xml_c;
  put 
    '<?xml version="1.0" encoding="UTF-8"?>'
  / '<ns2:message xmlns:ns2="message"'
  / '             xmlns:ns3="send">'
  ;

  put /'<!-- file a -->'/;

  flag = 0;
  do while (not eof_a);
    infile xml_a end=eof_a;
    input;

    if not flag and strip(_infile_)=:'<ns2:header xmlns:ns2="message">' then flag=1;

    if flag then put _infile_;

    if flag and strip(_infile_)=:'</ns2:header>' then flag = 0;
  end;

  put /'<!-- file b -->'/;

  flag = 0;
  do while (not eof_b);
    infile xml_b end=eof_b;
    input; 

    if not flag and strip(_infile_)=:'<ns2:content>' then flag=1;

    if flag then put _infile_;

    if flag and strip(_infile_)=:'</ns2:content>' then flag = 0;
  end;

  put 
    '</ns2:message>'
  ;

  stop;
run;
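
Because the step above treats the XML purely as text, it may be worth checking afterwards that the stacked file is at least well-formed. The following is a minimal sketch of such a check in Python, which is an assumption on my part (the answer itself uses only SAS); the function name `is_well_formed` is hypothetical. It stream-parses the file with the standard library's `xml.etree.ElementTree.iterparse`, so it does not need to build the full tree for a large file.

```python
import xml.etree.ElementTree as ET


def is_well_formed(path):
    """Stream-parse the XML file at `path`.

    Returns (True, number_of_elements) if the file parses cleanly,
    or (False, parser_error_message) if it is not well-formed.
    Hypothetical helper; it checks well-formedness only, not validity
    against any schema.
    """
    count = 0
    try:
        # "end" events fire once per element, after its closing tag is read.
        for _event, elem in ET.iterparse(path, events=("end",)):
            count += 1
            # Drop the element's children and text so memory use stays low;
            # the emptied elements themselves remain attached to the root.
            elem.clear()
    except ET.ParseError as exc:
        return False, str(exc)
    return True, count


# Usage (path taken from the FILENAME statement above):
#   ok, info = is_well_formed(r"c:\temp\c_wanted.xml")
#   print("well-formed" if ok else "broken: " + info)
```

A mismatched or missing closing tag, for example when a line exceeds LRECL= and is truncated, would show up here as a parse error rather than silently producing a broken file.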
Richard
  • [What's so bad about building XML with string concatenation?](https://stackoverflow.com/q/3034611/1422451) I always advise not handling XML as vanilla text objects. OP may be hitting a bug that should be explored. – Parfait Feb 18 '21 at 17:35

Actually, I can reproduce the issue on SAS 9.4 for Windows (64-bit, 64 GB RAM), but I get the error below:

ERROR: java.lang.OutOfMemoryError: GC overhead limit exceeded
ERROR: java.io.IOException: Pipe closed

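XSLT is known to be memory-intensive, since every document involved must be held in memory as a tree, plus whatever the operations on that tree require. As a rough rule of thumb, one needs about 5X the text size in memory, so a ~60 MB file could need on the order of 300 MB. The Java errors above suggest that proc xsl runs its transformation in a Java process, so SAS may internally cap memory usage for large files regardless of how much RAM the machine has.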

Fortunately, XSLT is an industry-standard language that does not require SAS to run. If you are using SAS on Windows, consider interfacing with .NET's built-in XSLT 1.0 processor, System.Xml.Xsl, via a PowerShell script that you can call at the command line or with SAS's X command.

Also, try reversing the roles of the two files: make the larger file, B.xml, the main input of the transformation, and pull in A.xml on top via document().

XSLT (save as .xsl to be called in PowerShell)

<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                             xmlns:ns2="message"> 
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>
  
  <xsl:template match="/ns2:message">
    <ns2:message xmlns:ns2="message" xmlns:ns3="send"> 
         <!-- COMBINE ALL DATA FROM A.xml -->
         <xsl:copy-of select="document('file:/path/to/A.xml')/ns2:message/*" />

         <!-- COPY CURRENT DATA -->
         <xsl:copy-of select="*"/>
    </ns2:message>
  </xsl:template> 
  
</xsl:transform>

PowerShell (save as .ps1 file to be called by SAS)

$xslt = New-Object System.Xml.Xsl.XslCompiledTransform;
$settings = New-Object System.Xml.Xsl.XsltSettings($true, $false);
$resolver = New-Object System.Xml.XmlUrlResolver;

$xslt.Load("path/to/file/script.xsl", $settings, $resolver);

$xslt.Transform("path/to/file/B.xml", 
                "path/to/file/final.xml");

(Above ran fairly quickly for me with a 300 MB file!)

SAS (yes, just this single line; a PowerShell window will launch)

X 'powershell -executionPolicy bypass -noexit -file "/path/to/powershell/script.ps1"';
Parfait
  • Thank you, this is the second time you have helped me with `.xsl` files! I was trying to find a way to increase the memory available to the SAS procedure but could not find one. – Kermit Feb 19 '21 at 07:57
  • Glad to help, and interesting to know about `proc xsl` (unfortunate, since it supports XSLT 2.0). I wonder if you can post your issue on https://communities.sas.com/ so that their developers may suggest workarounds. – Parfait Feb 19 '21 at 20:08
  • @Parfait, I have used this method, but the X command took a lot of time and the file didn't respond. The XML files are only 30KB. Could you please clarify? Thanks! – Elif Y Jan 15 '22 at 02:16
  • @ElifY, it all depends on your environment and how you call the `X` command. Please ask a new question with such details. Why not `proc xsl`? Good luck and happy coding! – Parfait Jan 15 '22 at 04:21
  • @Parfait, thanks, but it seems that PROC XSL can only transform one XML document into another, not combine two XML files into one. Could you please give me more details? – Elif Y Jan 17 '22 at 04:29
  • Look into XSLT's `document()` function. In fact, SAS's `proc xsl` uses XSLT 2.0 (not 1.0) with more features to pull from other documents. – Parfait Jan 17 '22 at 16:03