How can I concatenate multiple XML files from different directories into a single XML file using Perl?
-
You seem to have forgotten to ask a question. – ikegami Sep 10 '14 at 14:41
-
That depends on the XML files. Please show the data you have. – Borodin Sep 10 '14 at 15:15
-
Here is one XML file and all other XML files are same type – Seshagiri Lekkala Sep 10 '14 at 17:10
-
mloam-service-layer 32 1 10 0 0 6.2.1.2.43 2097162 ippm_fabl_idx_rdb_data_t 65568
-
@SeshagiriLekkala That is too hard to read in a comment, edit it into your question. Also, show the content of *two* separate XML files and *exactly* what you would like the combined file to look like. If your files are long, create short examples to use with this question and show those instead. Nobody wants to scroll through pages of random XML. – ThisSuitIsBlackNot Sep 10 '14 at 17:38
-
Please find the updated example for reference. – Seshagiri Lekkala Sep 11 '14 at 00:39
-
What have you tried? If you have a specific question about something you've tried to do, ask it. As it is, it appears you haven't tried to do anything. – Rob K Sep 11 '14 at 17:50
-
Have you tried using an xml parser? http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Degustaf Sep 11 '14 at 18:24
-
I am very new to Perl programming and have only just started learning. – Seshagiri Lekkala Sep 11 '14 at 21:00
-
Can you please suggest an approach, if you have any ideas? – Seshagiri Lekkala Sep 12 '14 at 00:25
1 Answer
I've had to make quite a lot of assumptions to do this, but here's my answer:
#!/usr/bin/perl -w
use strict;
use XML::LibXML;
my $output_doc = XML::LibXML->load_xml( string => <<EOF);
<?xml version="1.0" ?>
<issu-meta xmlns="ver2">
<metadescription>
<num-objects xml:id='total'/>
</metadescription>
<compatibility>
<baseline> 6.2.1.2.43 </baseline>
</compatibility>
</issu-meta>
EOF
my $object_count = 0;
foreach (@ARGV) {
    my $input_doc = XML::LibXML->load_xml( location => $_ );
    foreach ($input_doc->findnodes('/*[local-name()="issu-meta"]/*[local-name()="basictype"]')) { # find each object
        my $object = $output_doc->importNode($_, 1); # import the object information into the output document
        $output_doc->documentElement->appendChild($object); # append the new XML nodes to the output document root
        $object_count++; # keep track of how many objects we've seen
    }
}
my $total = $output_doc->getElementById('total'); # find the element which will contain the object count
$total->appendChild($output_doc->createTextNode($object_count)); # append the object count to that element
$total->removeAttribute('xml:id'); # remove the XML id, as it's not wanted in the output
print $output_doc->toString; # output the final document
Firstly, the <comp> element seems to come from nowhere, so I've had to ignore that. I'm also assuming that the required output content before each of the <basictype> elements is always going to be the same, except for the object count.
So I build an empty output document to start with, and then iterate over each filename provided on the command line. For each file, I find each object and copy it into the output document. Once I've done all the input files, I insert the object count.
It's made more difficult by the use of xmlns on the files. This makes the XPath search expression more complicated than it needs to be. If possible, I'd be tempted to remove the xmlns attributes and you'd be left with:
foreach ($input_doc->findnodes('/issu-meta/basictype')) {
which is a lot simpler.
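If removing the xmlns attributes isn't an option, another way to keep the XPath readable is to register a prefix for the namespace with XML::LibXML::XPathContext. The following is only a minimal sketch: the prefix v, the helper name find_basictypes and the inline sample document are assumptions made for illustration, and only the namespace URI ver2 comes from the files above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical helper: return the <basictype> children of the root
# <issu-meta> element, matching on the namespace instead of local-name().
sub find_basictypes {
    my ($doc) = @_;
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs( v => 'ver2' );    # 'v' is an arbitrary prefix chosen for this sketch
    return $xpc->findnodes('/v:issu-meta/v:basictype');
}

# Small made-up sample document using the same default namespace
my $doc = XML::LibXML->load_xml( string => <<'EOF' );
<?xml version="1.0"?>
<issu-meta xmlns="ver2">
<basictype><id> 1 </id></basictype>
<basictype><id> 4 </id></basictype>
</issu-meta>
EOF

my @found = find_basictypes($doc);
print scalar(@found), " basictype element(s) found\n";
```

The prefix only has to match the namespace URI, not whatever prefix (if any) the input files use, so the input files can stay untouched.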
So, when I run this:
perl combine abc/a.xml xyz/b.xml
I get:
<?xml version="1.0"?>
<issu-meta xmlns="ver2">
<metadescription>
<num-objects>3</num-objects>
</metadescription>
<compatibility>
<baseline> 6.2.1.2.43 </baseline>
</compatibility>
<basictype>
<id> 1 </id>
<name> pointer </name>
<pointer/>
<size> 64 </size>
</basictype><basictype>
<id> 4 </id>
<name> int32_t </name>
<primitive/>
<size> 32 </size>
</basictype><basictype>
<id> 2 </id>
<name> int8_t </name>
<primitive/>
<size> 8 </size>
</basictype></issu-meta>
which is pretty close to what you're after.
Edit: OK, my answer now looks like this:
#!/usr/bin/perl -w
use strict;
use XML::LibXML qw( :libxml ); # load LibXML support and include node type definitions
my $output_doc = XML::LibXML->load_xml( string => <<EOF); # create an empty output document
<?xml version="1.0" ?>
<issu-meta xmlns="ver2">
<metadescription>
<num-objects xml:id='total'/>
</metadescription>
<compatibility>
<baseline> 6.2.1.2.43 </baseline>
</compatibility>
</issu-meta>
EOF
my $object_count = 0;
foreach (@ARGV) {
    my $input_doc = XML::LibXML->load_xml( location => $_ );
    my $import_started = 0;
    foreach ($input_doc->documentElement->childNodes) {
        next unless $_->nodeType == XML_ELEMENT_NODE; # if it's not an element, ignore it
        if ($_->localName eq 'compatibility') { # if it's the "compatibility" element, ...
            $import_started = 1; # ... switch on importing ...
            next; # ... and move to the next child of the root
        }
        next unless $import_started; # if we've not started importing, and it's
                                     # not the "compatibility" element, simply
                                     # ignore it and move on
        my $object = $output_doc->importNode($_, 1); # import the object information into the output document
        $output_doc->documentElement->appendChild($object); # append the new XML nodes to the output document root
        $object_count++; # keep track of how many objects we've seen
    }
}
my $total = $output_doc->getElementById('total'); # find the element which will contain the object count
$total->appendChild($output_doc->createTextNode($object_count)); # append the object count to that element
$total->removeAttribute('xml:id'); # remove the XML id, as it's not wanted in the output
print $output_doc->toString; # output the final document
which simply imports each element which is a child of the root <issu-meta> document element after the first <compatibility> element it finds, and, as before, updates the object count. If I've understood your requirement, that should do what you need.
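For comparison, the same selection can probably be expressed as a single XPath using the following-sibling axis, instead of a flag inside the loop. This is a sketch under assumptions: the helper name elements_after_compatibility and the sample document are made up for illustration. It also differs slightly from the loop above, which skips every <compatibility> element it meets, whereas this expression would return any later <compatibility> siblings as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return every element that follows the first
# <compatibility> child of the root element, in document order.
sub elements_after_compatibility {
    my ($doc) = @_;
    return $doc->findnodes(
        '/*/*[local-name()="compatibility"][1]/following-sibling::*'
    );
}

# Small made-up sample document in the same shape as the input files
my $doc = XML::LibXML->load_xml( string => <<'EOF' );
<?xml version="1.0"?>
<issu-meta xmlns="ver2">
<metadescription><num-objects>0</num-objects></metadescription>
<compatibility><baseline> 6.2.1.2.43 </baseline></compatibility>
<basictype><id> 1 </id></basictype>
<enum><id>1835009 </id></enum>
</issu-meta>
EOF

my @after = elements_after_compatibility($doc);
print join( ' ', map { $_->nodeName } @after ), "\n";
```

Because following-sibling::* only selects element nodes, the separate nodeType check is not needed in this variant.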
If it works, I strongly suggest you work through both this answer and my earlier one to ensure you understand why it works for your problem. There are lots of useful techniques used in here, and once you understand them, you will have learned a lot about some of the ways you can manipulate XML. Any problems, ask a new question on this site. Have fun!
Edit #2: Right, this should be the last piece you need:
#!/usr/bin/perl -w
use strict;
use XML::LibXML qw( :libxml ); # load LibXML support and include node type definitions
my @input_files = (
    'abc/a.xml',
    'xyz/b.xml',
);
my $output_file = 'output.xml';
my $output_doc = XML::LibXML->load_xml( string => <<EOF); # create an empty output document
<?xml version="1.0" ?>
<issu-meta xmlns="ver2">
<metadescription>
<num-objects xml:id='total'/>
</metadescription>
<compatibility>
<baseline> 6.2.1.2.43 </baseline>
</compatibility>
</issu-meta>
EOF
my $object_count = 0;
foreach (@input_files) {
    my $input_doc = XML::LibXML->load_xml( location => $_ );
    my $import_started = 0;
    foreach ($input_doc->documentElement->childNodes) {
        next unless $_->nodeType == XML_ELEMENT_NODE; # if it's not an element, ignore it
        if ($_->localName eq 'compatibility') { # if it's the "compatibility" element, ...
            $import_started = 1; # ... switch on importing ...
            next; # ... and move to the next child of the root
        }
        next unless $import_started; # if we've not started importing, and it's
                                     # not the "compatibility" element, simply
                                     # ignore it and move on
        my $object = $output_doc->importNode($_, 1); # import the object information into the output document
        $output_doc->documentElement->appendChild($object); # append the new XML nodes to the output document root
        $object_count++; # keep track of how many objects we've seen
    }
}
my $total = $output_doc->getElementById('total'); # find the element which will contain the object count
$total->appendChild($output_doc->createTextNode($object_count)); # append the object count to that element
$total->removeAttribute('xml:id'); # remove the XML id, as it's not wanted in the output
$output_doc->toFile($output_file, 1); # output the final document
After running like this: perl combine, the file output.xml is created, with the following contents:
<?xml version="1.0"?>
<issu-meta xmlns="ver2">
<metadescription>
<num-objects>7</num-objects>
</metadescription>
<compatibility>
<baseline> 6.2.1.2.43 </baseline>
</compatibility>
<basictype>
<id> 1 </id>
<name> pointer </name>
<pointer/>
<size> 64 </size>
</basictype><basictype>
<id> 4 </id>
<name> int32_t </name>
<primitive/>
<size> 32 </size>
</basictype><enum>
<id>1835009 </id>
<name> chkpt_state_t </name>
<label>
<name> CHKP_STATE_PENDING </name>
<value> 1 </value>
</label>
</enum><struct>
<id> 1835010 </id>
<name> _ipcEndpoint </name>
<size> 64 </size>
<elem>
<id> 0 </id>
<name> ep_addr </name>
<type> uint32_t </type>
<type-id> 8 </type-id>
<size> 32 </size>
<offset> 0 </offset>
</elem>
</struct><basictype>
<id> 2 </id>
<name> int8_t </name>
<primitive/>
<size> 8 </size>
</basictype><alias>
<id> 1835012 </id>
<name> Endpoint </name>
<size> 64 </size>
<type> _ipcEndpoint </type>
<type-id> 1835010 </type-id>
</alias><bitmask>
<id> 1835015 </id>
<name> ipc_flag_t </name>
<size> 8 </size>
<type> uint8_t </type>
<type-id> 6 </type-id>
<label>
<name> IPC_APPLICATION_REGISTER_MSG </name>
<value> 1 </value>
</label>
</bitmask></issu-meta>
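If the list of input files grows, the hard-coded @input_files array could instead be built by scanning the directories. The sketch below uses the core File::Find module; the helper name collect_xml_files is hypothetical, and it assumes every file ending in .xml under the given directories should be merged, which may not hold for your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect every *.xml file found under the given
# directories (recursively) and return the paths sorted by name.
sub collect_xml_files {
    my (@dirs) = @_;
    my @files;
    return @files unless @dirs;    # nothing to scan
    find(
        {
            # with no_chdir set, $_ holds the same path as $File::Find::name
            wanted   => sub { push @files, $File::Find::name if -f $_ && /\.xml\z/ },
            no_chdir => 1,
        },
        @dirs
    );
    my @sorted = sort @files;
    return @sorted;
}

# 'abc' and 'xyz' are the directories used in the example above;
# directories that don't exist are skipped rather than reported.
my @input_files = collect_xml_files( grep { -d $_ } ( 'abc', 'xyz' ) );
print "$_\n" for @input_files;
```

Sorting the paths keeps the order of the merged objects stable from one run to the next.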
Last tip: although it makes almost no difference to the XML, it's a little more human-readable once it's been run through xmltidy:
<?xml version="1.0"?>
<issu-meta xmlns="ver2">
  <metadescription>
    <num-objects>7</num-objects>
  </metadescription>
  <compatibility>
    <baseline> 6.2.1.2.43 </baseline>
  </compatibility>
  <basictype>
    <id> 1 </id>
    <name> pointer </name>
    <pointer/>
    <size> 64 </size>
  </basictype>
  <basictype>
    <id> 4 </id>
    <name> int32_t </name>
    <primitive/>
    <size> 32 </size>
  </basictype>
  <enum>
    <id>1835009 </id>
    <name> chkpt_state_t </name>
    <label>
      <name> CHKP_STATE_PENDING </name>
      <value> 1 </value>
    </label>
  </enum>
  <struct>
    <id> 1835010 </id>
    <name> _ipcEndpoint </name>
    <size> 64 </size>
    <elem>
      <id> 0 </id>
      <name> ep_addr </name>
      <type> uint32_t </type>
      <type-id> 8 </type-id>
      <size> 32 </size>
      <offset> 0 </offset>
    </elem>
  </struct>
  <basictype>
    <id> 2 </id>
    <name> int8_t </name>
    <primitive/>
    <size> 8 </size>
  </basictype>
  <alias>
    <id> 1835012 </id>
    <name> Endpoint </name>
    <size> 64 </size>
    <type> _ipcEndpoint </type>
    <type-id> 1835010 </type-id>
  </alias>
  <bitmask>
    <id> 1835015 </id>
    <name> ipc_flag_t </name>
    <size> 8 </size>
    <type> uint8_t </type>
    <type-id> 6 </type-id>
    <label>
      <name> IPC_APPLICATION_REGISTER_MSG </name>
      <value> 1 </value>
    </label>
  </bitmask>
</issu-meta>
Good luck working through this and taking it further. Do come back to this site to ask more questions when they come up!

-
Thank you very much Tim for your help. It's very near to my requirement. – Seshagiri Lekkala Sep 12 '14 at 18:34
-
Apart from tag, i have many different type of tags available in my actual data. Till the tag every thing is same for all XML files. So i should be copy all tags starting below tag to till end. can you please help me to do the same. – Seshagiri Lekkala Sep 12 '14 at 18:46
-
I have updated input XML files and output XML. Can you please refer the same. – Seshagiri Lekkala Sep 12 '14 at 19:43
-
Thanks a lot for your time. It's working exactly as I expected. Can you please incorporate the following into your solution: 1) Read the input file names from an array instead of from the console. 2) Copy the content of all the integrated XML files into a new XML file instead of printing it on the console. – Seshagiri Lekkala Sep 12 '14 at 23:49
-
I wish to vote up your answer, but unfortunately I do not have enough reputation to vote up. A minimum of 15 reputation is required. – Seshagiri Lekkala Sep 14 '14 at 18:29
-
-
Hi Tim, one final request please. I am getting "I/O error : Unknown IO error" during execution of this program. It looks like the file operation is failing at $output_doc->toFile($output_file, 1). Can you please let me know if you have any idea? – Seshagiri Lekkala Sep 15 '14 at 21:48
-
This could be one of a whole load of things, and really is a new question. I strongly recommend that you start a new question on this site and describe the problem you're having. You can link back to this question and answer to cover some of the detail. This way you should get more attention from people who understand what's going on. In the new question you should mention that you're using Perl and XML::LibXML. Maybe see you at the new one. – Tim Sep 16 '14 at 06:23
-
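While narrowing down an error like this, one option is to write the file with Perl's own open and print, so that a failure is reported with the operating system's error text in $!. This is only a diagnostic sketch, not a diagnosis: the helper name write_xml_file and the demo document are hypothetical, and the real cause of the I/O error is not known from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use File::Temp qw(tempfile);

# Hypothetical helper: serialise $doc to $path, dying with the
# operating system's error message if any step fails.
sub write_xml_file {
    my ( $doc, $path ) = @_;
    open my $fh, '>', $path or die "Cannot open '$path' for writing: $!\n";
    binmode $fh;    # a document's toString returns already-encoded bytes
    print {$fh} $doc->toString(1) or die "Cannot write to '$path': $!\n";
    close $fh or die "Cannot close '$path': $!\n";
    return;
}

# Demo with a made-up document, written to a temporary file
my $demo_doc = XML::LibXML->load_xml( string => '<issu-meta xmlns="ver2"/>' );
my ( $demo_fh, $demo_path ) = tempfile( SUFFIX => '.xml', UNLINK => 1 );
close $demo_fh;
write_xml_file( $demo_doc, $demo_path );
print "wrote $demo_path\n";
```

In the script above, the call would replace the toFile line: write_xml_file($output_doc, $output_file);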