I am on AIX with bash, and we cannot install additional software at this time, so I am limited to command-line batch processing and possibly custom Java programs. I have a large number of XML files in different directories. Here is what a subset may look like:
root_dir
    Pages
        PAGES_1.XML
    Queries
        QUERIES_1.XML
        QUERIES_2.XML
        QUERIES_3.XML
I have put together a script that gets me almost everything I want, but I don't know how to do the last piece of the puzzle in a batch script, if it is possible at all. The script creates a new directory under the root and copies all of the XML files into it. It then renames them to remove any spaces from the names and zero-pads the number, so that they sort correctly in alphabetical/numerical order. The new output looks like this:
copy_dir
    PAGES_001.XML
    QUERIES_001.XML
    QUERIES_002.XML
    QUERIES_003.XML
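The renaming step itself isn't shown above, so the following is only a hypothetical bash sketch of what it might look like, assuming every file is named `TYPE_<number>.XML`. `pad_name` and `rename_all` are made-up names; the sketch uses only bash parameter expansion, `printf` and `mv`, so it should not need any additional software on AIX.

```shell
# Hypothetical helper: "MY PAGES_1.XML" -> "MYPAGES_001.XML".
# Names that do not have the TYPE_<digits>.XML shape are printed unchanged.
pad_name() {
    local name=${1// /} base num       # remove every space first
    case $name in
        *_*.XML) ;;                    # expected shape, keep going
        *) printf '%s\n' "$name"; return ;;
    esac
    base=${name%_*}                    # text before the last underscore
    num=${name##*_}                    # e.g. "1.XML"
    num=${num%.XML}                    # e.g. "1"
    case $num in
        ''|*[!0-9]*) printf '%s\n' "$name"; return ;;
    esac
    # 10# forces base 10 so that "008" is not rejected as an octal number
    printf '%s_%03d.XML\n' "$base" "$((10#$num))"
}

# Hypothetical helper: rename every *.XML file in a directory in place.
rename_all() {
    local dir=$1 f old new
    for f in "$dir"/*.XML; do
        [ -e "$f" ] || continue        # pattern matched nothing
        old=${f##*/}                   # file name without the directory
        new=$(pad_name "$old")
        [ "$old" = "$new" ] || mv "$f" "$dir/$new"
    done
}
```

It would be run as `rename_all copy_dir` after the copy. The `10#` prefix matters because bash would otherwise treat a number with a leading zero as octal.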
I am almost there. The last piece is that these separate XML files need to be combined into one XML file per type: HISTORY_001.XML through HISTORY_099.XML need to be combined into one file, QUERIES_001.XML through QUERIES_099.XML into another, and so on. However, only the part of each file after a specific point should be carried over. I have regular expressions that select the parts I want; now I just need to figure out how to loop through each subset of files. Maybe I jumped the gun and should do this before copying them, but assuming they are all in one directory, how can I go about it?
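Once everything sits in one directory, one way to get at each subset is to derive the type from the part of the name before the last underscore and then glob per type. Because the numbers are zero-padded, the glob should expand in the intended numeric order. This is a sketch assuming the padded `TYPE_nnn.XML` names shown above; `list_types` and `show_groups` are hypothetical names.

```shell
# Hypothetical helper: print each distinct type prefix (PAGES, QUERIES, ...)
# found among the padded TYPE_nnn.XML files in a directory.
list_types() {
    local dir=$1 f name
    for f in "$dir"/*_[0-9][0-9][0-9].XML; do
        [ -e "$f" ] || continue        # pattern matched nothing
        name=${f##*/}                  # strip the directory part
        printf '%s\n' "${name%_*}"     # strip the trailing "_nnn.XML"
    done | sort -u
}

# Hypothetical helper: walk each type, then each of its files in order.
show_groups() {
    local dir=$1 t f
    for t in $(list_types "$dir"); do  # type names contain no spaces after renaming
        echo "== $t"
        for f in "$dir/$t"_[0-9][0-9][0-9].XML; do
            echo "   ${f##*/}"
        done
    done
}
```

The inner loop of `show_groups` is where the per-type processing (header, detail, footer) would go.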
Here is an example of the data. All of the XML files share this same structure.
Pages:
<?xml version="1.0"?>
<project name="">
  <rundate></rundate>
  <object_type code="false" firstitem="1" id="5" items="65" name="Pages">
    <primary_key>Page Name</primary_key>
    <secondary_key>Language Code</secondary_key>
    <secondary_key>Page Field ID</secondary_key>
    <secondary_key>Field Type</secondary_key>
    <secondary_key>Record (Table) Name</secondary_key>
    <secondary_key>Field Name</secondary_key>
    <item id="ACCTG_TEMPLATE_AP">
      ...
    </item>
    <item id="ACCTG_TEMPLATE_AR">
      ...
    </item>
  </object_type>
</project>
Queries:
<?xml version="1.0"?>
<project name="">
  <rundate></rundate>
  <object_type code="false" firstitem="1" id="10" items="46" name="Queries">
    <primary_key>Query Name</primary_key>
    <primary_key>User ID</primary_key>
    <item id="1099G_ALL_SHORT. ">
      ...
    </item>
    <item id="1099G_ALL_VOUCHERS. ">
      ...
    </item>
  </object_type>
</project>
Regex to pull out the header:
(?:(?!(^\s*<item)).)*
Regex to pull out the detail:
^(\s*<item id=).*(</item>)
Regex to pull out the footer:
^(\s*</object_type).*
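The header pattern uses a lookahead (`(?!...)`), which is Java/Perl-style syntax. POSIX basic and extended regular expressions, as used by the standard `grep`, `sed` and `awk`, have no lookahead, and these patterns would also need a dot-matches-newline mode to span lines. Assuming each `<item ...>` tag and the `</object_type>` tag start their own line (as in the samples), the same three-way split can be done line by line with plain awk. `extract_part` is a hypothetical name for this sketch.

```shell
# Hypothetical helper: line-based stand-in for the three regexes above.
# Usage: extract_part FILE header|detail|footer
extract_part() {
    awk -v part="$2" '
        BEGIN { sec = "header" }
        # the first <item ...> line ends the header
        sec == "header" && /^[ \t]*<item[ >]/ { sec = "detail" }
        # the </object_type> line starts the footer
        /^[ \t]*<\/object_type>/ { sec = "footer" }
        sec == part { print }
    ' "$1"
}
```

For example, `extract_part copy_dir/QUERIES_001.XML detail` would print only the `<item>` blocks of that file.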
So I am assuming that what I want to do is keep a counter and loop through each object type's subset of files. On the first file, I write the header and the detail to a new summary file; for every following file, I append only the detail; and on the last file (or when switching to a new object type), I write the footer as well. Do you think this is possible using a bash script?
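The counter-and-loop idea can be sketched as a single POSIX awk call per type, again assuming the padded names and the one-tag-per-line layout of the samples. Inside awk, `FNR == 1` marks the start of each input file, so `nfile` plays the role of the counter, and `ARGV[ARGC - 1]` identifies the last file. `merge_type` and the output path are hypothetical.

```shell
# Hypothetical helper: merge every TYPE_nnn.XML in DIR into OUT as
#   header of the first file + detail of every file + footer of the last file.
# Usage: merge_type DIR TYPE OUT
merge_type() {
    local dir=$1 prefix=$2 out=$3
    # Positional parameters become the sorted list of files for this type.
    set -- "$dir/$prefix"_[0-9][0-9][0-9].XML
    [ -e "$1" ] || return 1   # glob did not match: no files of this type
    awk '
        FNR == 1 { nfile++; sec = "header" }          # a new input file starts
        sec == "header" && /^[ \t]*<item[ >]/ { sec = "detail" }
        /^[ \t]*<\/object_type>/ { sec = "footer" }
        (sec == "header" && nfile == 1) ||
        sec == "detail" ||
        (sec == "footer" && FILENAME == ARGV[ARGC - 1]) { print }
    ' "$@" > "$out"
}
```

Usage might look like `mkdir -p summary_dir; for t in PAGES QUERIES HISTORY; do merge_type copy_dir "$t" "summary_dir/$t.XML"; done`. One caveat: the header is copied unchanged from the first file, so attributes such as `items="65"` and `firstitem="1"` will still describe only that file; if anything downstream relies on them, they would need to be patched separately.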