0

I want to write code in Perl that compares two XML files.

A little background: using GET requests (as described in the API documentation), I get data1 from Web Service1 and data2 from Web Service2. Both are in XML format, but with different structures.

I only need to compare two values in these files (deviceName and ipAddress). If they are the same in both files, a message such as "WebService1 already contains DeviceName "Switch1"" should be printed. If not, I would make a POST request and add this device to WebService1/WebService2.

Can you give me advice on which modules I should use and how to get started with this comparison?

For example (file1)

   <?xml version="1.0" ?>
   <queryResponse last="34" first="0" count="35" type="Devices" responseType="listEntityInstances" requestUrl="https://hostname/webacs/api/v1/data/Devices?.full=true" rootUrl="https://hostname/webacs/api/v1/data">
      <entity dtoType="devicesDTO" type="Devices" url="https://hostname/webacs/api/v1/data/Devices/201">
         <devicesDTO displayName="201201" id="201">
           <clearedAlarms>0</clearedAlarms>
           <collectionDetail></collectionDetail>
           <collectionTime></collectionTime>
           <creationTime></creationTime>
           <criticalAlarms>0</criticalAlarms>
           <deviceId>205571</deviceId>
           <deviceName>NEW-SW5</deviceName>
           <deviceType>Cisco Switch</deviceType>
           <informationAlarms>0</informationAlarms>
           <ipAddress>10.66.12.128</ipAddress>
         <location></location>
           <majorAlarms>0</majorAlarms>
           <managementStatus></managementStatus>
              <manufacturerPartNrs>
                  <manufacturerPartNr></manufacturerPartNr>
              </manufacturerPartNrs>
              <minorAlarms>0</minorAlarms>
              <productFamily></productFamily>
              <reachability>Reachable</reachability>
              <softwareType>IOS</softwareType>
              <softwareVersion>12.1(22)</softwareVersion>
              <warningAlarms>0</warningAlarms>
         </devicesDTO>
      </entity>
   </queryResponse>

File2

  <?xml version="1.0" encoding="utf-8" standalone="yes"?>
  <ns3:networkdevice name="NEW-SW5" id="9a6ef750-2620-11e4-81be-b83861d71f95" xmlns:ns2="ers.ise.cisco.com" xmlns:ns3="network.ers.ise.cisco.com">
  <link type="application/xml" href="https://hostname:9060/ers/config/networkdevice/123456" rel="self"/>
       <authenticationSettings>
          <enableKeyWrap>false</enableKeyWrap>
          <keyInputFormat>ASCII</keyInputFormat>
          <networkProtocol>RADIUS</networkProtocol>
          <radiusSharedSecret>******</radiusSharedSecret>
       </authenticationSettings>
       <NetworkDeviceIPList>
         <NetworkDeviceIP>
            <ipaddress>10.66.12.128</ipaddress>
            <mask>21</mask>
         </NetworkDeviceIP>
       </NetworkDeviceIPList>
       <NetworkDeviceGroupList>
         <NetworkDeviceGroup>Location#All Locations</NetworkDeviceGroup>
         <NetworkDeviceGroup>Device Type#All Device Types</NetworkDeviceGroup>
   </NetworkDeviceGroupList>
  </ns3:networkdevice>

One special thing: in file1 the tags are called deviceName and ipAddress, and both are elements.
In file2, the name is an attribute (it sits on the root element ns3:networkdevice and is called name, which corresponds to deviceName from file1), and the other value is an element called ipaddress (ipAddress in file1).

simbabque
StayCalm
  • 1
    You need to parse both responses individually, grab the values you are interested in, and then compare them. If you don't want all the data, but just those two specific values, XML::Twig is a good choice. – simbabque Mar 14 '17 at 14:00

5 Answers

2

You can use XML::Twig to parse both responses. Each of them needs an individual parser.

For the first one, you need to go for the two tags <deviceName> and <ipAddress>. A simple twig_handler for each of them that accesses the text property of the matched element is sufficient.

Those handlers can be complex, but in our case a code reference that deals with a single value is enough. We know that there is only one occurrence of each value, so we can directly assign both of them to their respective lexical variables.

use strict;
use warnings;
use feature 'say';    # needed for say() below
use XML::Twig;

my ($device_name, $ip_address);
XML::Twig->new(
    twig_handlers => {
        deviceName => sub { $device_name = $_->text },
        ipAddress => sub { $ip_address = $_->text },
    }
)->parse(\*DATA);

say $device_name;
say $ip_address;

__DATA__
<?xml version="1.0" ?>
<queryResponse last="34" first="0" count="35" type="Devices" responseType="listEntityInstances" requestUrl="https://hostname/webacs/api/v1/data/Devices?.full=true" rootUrl="https://hostname/webacs/api/v1/data">
   <entity dtoType="devicesDTO" type="Devices" url="https://hostname/webacs/api/v1/data/Devices/201">
      <devicesDTO displayName="201201" id="201">
        <clearedAlarms>0</clearedAlarms>
        <collectionDetail></collectionDetail>
        <collectionTime></collectionTime>
        <creationTime></creationTime>
        <criticalAlarms>0</criticalAlarms>
        <deviceId>205571</deviceId>
        <deviceName>NEW-SW5</deviceName>
        <deviceType>Cisco Switch</deviceType>
        <informationAlarms>0</informationAlarms>
        <ipAddress>10.66.12.128</ipAddress>
      <location></location>
        <majorAlarms>0</majorAlarms>
        <managementStatus></managementStatus>
           <manufacturerPartNrs>
               <manufacturerPartNr></manufacturerPartNr>
           </manufacturerPartNrs>
           <minorAlarms>0</minorAlarms>
           <productFamily></productFamily>
           <reachability>Reachable</reachability>
           <softwareType>IOS</softwareType>
           <softwareVersion>12.1(22)</softwareVersion>
           <warningAlarms>0</warningAlarms>
      </devicesDTO>
   </entity>
</queryResponse>

For the second one you need to use att() to get the name attribute of the root element, but that's also straightforward.

use strict;
use warnings;
use feature 'say';    # needed for say() below
use XML::Twig;

my ($device_name, $ip_address);
XML::Twig->new(
    twig_handlers => {
        'ns3:networkdevice' => sub { $device_name = $_->att('name') },
        ipaddress => sub { $ip_address = $_->text },
    }
)->parse(\*DATA);

say $device_name;
say $ip_address;
__DATA__
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<ns3:networkdevice name="NEW-SW5" id="9a6ef750-2620-11e4-81be-b83861d71f95" xmlns:ns2="ers.ise.cisco.com" xmlns:ns3="network.ers.ise.cisco.com">
<link type="application/xml" href="https://hostname:9060/ers/config/networkdevice/123456" rel="self"/>
     <authenticationSettings>
        <enableKeyWrap>false</enableKeyWrap>
        <keyInputFormat>ASCII</keyInputFormat>
        <networkProtocol>RADIUS</networkProtocol>
        <radiusSharedSecret>******</radiusSharedSecret>
     </authenticationSettings>
     <NetworkDeviceIPList>
       <NetworkDeviceIP>
          <ipaddress>10.66.12.128</ipaddress>
          <mask>21</mask>
       </NetworkDeviceIP>
     </NetworkDeviceIPList>
     <NetworkDeviceGroupList>
       <NetworkDeviceGroup>Location#All Locations</NetworkDeviceGroup>
       <NetworkDeviceGroup>Device Type#All Device Types</NetworkDeviceGroup>
 </NetworkDeviceGroupList>
</ns3:networkdevice>

Now that you have both of these, you can combine them. I suggest creating a function for each of them that takes the response XML and returns the $device_name and $ip_address.

use strict;
use warnings;
use XML::Twig;

sub parse_response_1 {
    my $xml = shift;

    my ( $device_name, $ip_address );
    XML::Twig->new(
        twig_handlers => {
            deviceName => sub { $device_name = $_->text },
            ipAddress  => sub { $ip_address  = $_->text },
        }
    )->parse($xml);

    return $device_name, $ip_address;
}

sub parse_response_2 {
    my $xml = shift;

    my ( $device_name, $ip_address );
    XML::Twig->new(
        twig_handlers => {
            'ns3:networkdevice' => sub { $device_name = $_->att('name') },
            ipaddress           => sub { $ip_address  = $_->text },
        }
    )->parse($xml);

    return $device_name, $ip_address;
}

Of course my names parse_response_1 and parse_response_2 are not the best choice. Don't use the numbers, use the names of the services that returned the responses instead.

With those two functions we now have the means to retrieve exactly the information that we want. All that's left is to check them.

sub check {
    my ( $response_1, $response_2 ) = @_;

    my ( $device_name_1, $ip_address_1 ) = parse_response_1($response_1);
    my ( $device_name_2, $ip_address_2 ) = parse_response_2($response_2);

    return $device_name_1 eq $device_name_2 && $ip_address_1 eq $ip_address_2;
}

Again, the names of the variables could be better. Now you just need to call that with your two response XMLs and it will return a truthy value, or not.
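
To show how the pieces might fit together, here is a minimal, self-contained sketch. The two XML strings are trimmed-down, hypothetical stand-ins for the real responses, and the printed messages are only placeholders; in a real program the strings would come from the two web services.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical, shortened responses. Real ones would come from the services.
my $response_1 = <<'XML';
<queryResponse>
  <entity>
    <devicesDTO>
      <deviceName>NEW-SW5</deviceName>
      <ipAddress>10.66.12.128</ipAddress>
    </devicesDTO>
  </entity>
</queryResponse>
XML

my $response_2 = <<'XML';
<ns3:networkdevice name="NEW-SW5" xmlns:ns3="network.ers.ise.cisco.com">
  <NetworkDeviceIPList>
    <NetworkDeviceIP>
      <ipaddress>10.66.12.128</ipaddress>
    </NetworkDeviceIP>
  </NetworkDeviceIPList>
</ns3:networkdevice>
XML

# Same parsers as above: name and IP are elements in the first format.
sub parse_response_1 {
    my $xml = shift;
    my ( $device_name, $ip_address );
    XML::Twig->new(
        twig_handlers => {
            deviceName => sub { $device_name = $_->text },
            ipAddress  => sub { $ip_address  = $_->text },
        }
    )->parse($xml);
    return $device_name, $ip_address;
}

# In the second format the name is an attribute of the root element.
sub parse_response_2 {
    my $xml = shift;
    my ( $device_name, $ip_address );
    XML::Twig->new(
        twig_handlers => {
            'ns3:networkdevice' => sub { $device_name = $_->att('name') },
            ipaddress           => sub { $ip_address  = $_->text },
        }
    )->parse($xml);
    return $device_name, $ip_address;
}

sub check {
    my ( $response_1, $response_2 ) = @_;
    my ( $device_name_1, $ip_address_1 ) = parse_response_1($response_1);
    my ( $device_name_2, $ip_address_2 ) = parse_response_2($response_2);
    return $device_name_1 eq $device_name_2 && $ip_address_1 eq $ip_address_2;
}

my ($name) = parse_response_1($response_1);
my $same   = check( $response_1, $response_2 );

if ($same) {
    print qq{WebService1 already contains DeviceName "$name"\n};
}
else {
    print "Device not found, a POST request would go here\n";
}
```

If the responses are stored in files instead, the parsers would use parsefile() with a filename rather than parse() with a string.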

simbabque
  • Thank you a lot! The DATA file: should it be DATA.xml or just DATA? If I write it as DATA.xml `->parse('FILE1.xml'); ` I get the error `you seem to have used the parse method on a filename (DATA.xml), you probably want parsefile instead at ParsingXMLDatei1.pl line 11.` If I write just DATA: `Name "main::DATA" used only once: possible typo at ParsingXMLDatei1.pl line 11. not well-formed (invalid token) at line 1, column 4, byte 4 at /usr/lib/x86_64-linux-gnu/perl5/5.20/XML/Parser.pm line 187. at ParsingXMLDatei1.pl line 11.` – StayCalm Mar 14 '17 at 15:39
  • You want `parsefile('FILE1.xml')` - `__DATA__` is for inlining the code for illustration purposes. – Sobrique Mar 14 '17 at 15:43
  • 1
    I think I'd do the above, but might go with a parse, and then `get_xpath` than `twig_handlers`. – Sobrique Mar 14 '17 at 15:44
  • I wrote `parsefile('FILE1.xml');` I think it works. I have another error, but I'll try to fix it by myself.... – StayCalm Mar 14 '17 at 15:50
  • How should I assign FILE1.xml or FILE2.xml to the variable `my $xml`? – StayCalm Mar 14 '17 at 16:15
  • @StayCalm where are you getting those two results from? Do you have them in the file system already? Or are they in the same program, and came back from some call with LWP? All my code examples stand alone, they are not a complete program. They are meant to illustrate a concept. You'll always have to adapt them to your exact situation. – simbabque Mar 14 '17 at 16:23
  • @Sobrique I keep forgetting about that. But I also have to look up the syntax for `twig_handlers` every time because I never use XML::Twig productively myself. You're right, `get_xpath` would be more concise. – simbabque Mar 14 '17 at 16:24
  • Thought I'd offer up an example for the sake of comparison. And y'know, I can't resist an opportunity for XML hackery :) – Sobrique Mar 14 '17 at 17:32
2

Much like simbabque, I'd use XML::Twig, although I'd tackle it slightly differently. I'm offering this up for the sake of comparison. twig_handlers is a powerful and useful technique, but it is particularly suited to incrementally parsing larger XML. An alternative is to parse the whole document and use get_xpath to look up XPath-based references within it.

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

my $xml1 = XML::Twig->new->parsefile('test1a.xml');
my $xml2 = XML::Twig->new->parsefile('test1b.xml');

if ( $xml1->get_xpath( '//deviceName', 0 )->text 
  eq $xml2->root->att('name') )
{
   print "Name matches\n";
}

if ( $xml1->get_xpath( '//ipAddress', 0 )->text 
  eq $xml2->get_xpath( '//ipaddress', 0 )->text )
{
   print "IP matches\n";
}

We parse both files into XML::Twig objects, and then use get_xpath to look up the node location. // means anywhere in the tree, and the 0 selects which match to return (i.e. the first and, here, only one).

Ideally we would use XPath strings for both sides and compare the results directly. We can't here, because 'name' is an attribute of the root node, and one of the limitations of the XML::Twig XPath engine is that you can't directly select attribute content.

XML::LibXML, however, is more fully featured, at the cost of a somewhat steeper learning curve. I wouldn't use it generally, but in this specific case it can handle the XPath expression needed to select an attribute of the root node.

So that would be something like:

#!/usr/bin/env perl
use strict;
use warnings;

use XML::LibXML;

my %compare = (
   '//deviceName' => '//@name',
   '//ipAddress'  => '//ipaddress'
);

my $search1 = XML::LibXML::XPathContext->new(
                 XML::LibXML->load_xml( location => 'test1a.xml' ) );
my $search2 = XML::LibXML::XPathContext->new(
                 XML::LibXML->load_xml( location => 'test1b.xml' ) );

foreach my $key ( keys %compare ) {
   my $first  = $search1->find($key);
   my $second = $search2->find( $compare{$key} );

   print "$key = $first\n";
   print "$compare{$key} = $second\n";
   print "Matches found\n" if $first eq $second;
}
Community
Sobrique
  • Thank you a lot. It works! I have just one question: how can I change my elements in FILE2? For example, I've compared these two files and the elements ipaddress and name are not the same. In that case I want to put the ipAddress and name from FILE1 into FILE2, because afterwards I make a POST request with FILE2 (with the data from FILE1). I tried something like this: `$xml2->root->att('name') = $xml1->get_xpath( '//deviceName', 0 )->text ; $xml2->get_xpath( '//ipaddress', 0 )->text = $xml1->get_xpath( '//ipAddress', 0 )->text ;` But I don't think that is right... – StayCalm Mar 15 '17 at 14:09
  • You want `set_text` or `set_att` methods. And then you'll need to use `$xml2->print` or `$xml2->sprint`. Maybe with `$xml->set_pretty_print('indented_a')` or similar, for better formatting. – Sobrique Mar 15 '17 at 14:16
  • One little question... what if file1 contains more than one device? How can I compare these files? For example, if there are 3 devices, I need to check the IP address from file2 three times. I already tried `if (( $xml1->get_xpath( '//ipAddress', 0 )->text eq $xml2->get_xpath( '//ipAddress', 0 )->text ) for $xml2->findnodes('//ipAddress'));` for the ipAddress tag and `$xml2->findnodes( './ipAddress/*');` Or is that totally wrong? – StayCalm Mar 17 '17 at 12:36
  • I can post new question if you want – StayCalm Mar 17 '17 at 12:37
1

This isn't a simple task to write from scratch. You should make use of XML::Compare

Borodin
  • I disagree. If all they need to do is compare two values, then this is pretty trivial, given they can write parsers for both formats. Again, that's also a fairly trivial exercise. XML::Compare, on the other hand, is not really of much use here in my opinion, as the two documents have completely different structures. I believe you misread the question. :) – simbabque Mar 14 '17 at 14:24
1
use XML::Simple;
use Data::Dumper;

my $file1_ref = XMLin("./file1");
my $file2_ref = XMLin("./file2");

if($file2_ref->{NetworkDeviceIPList}->{NetworkDeviceIP}->{ipaddress} eq $file1_ref->{entity}->{devicesDTO}->{ipAddress} && $file2_ref->{name} eq $file1_ref->{entity}->{devicesDTO}->{deviceName}) {
  print "WebService1 already contains DeviceName \"".$file2_ref->{name}."\"\n";
} else {
  # POST request and add this device in WebService1/WebService2
  # Code here ....
}

You can turn the calls into methods, and I would strongly suggest that you add an eval around the conversion and check for errors, just in case the returned XML is buggy.
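
A minimal sketch of that eval guard follows. The helper name safe_xml_in and the two sample strings are made up for illustration; the only thing taken from the code above is the call to XMLin, which dies when the XML is not well-formed.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical helper: returns the parsed structure, or undef when the
# XML cannot be parsed, instead of letting the program die.
sub safe_xml_in {
    my ($xml) = @_;

    my $data = eval { XMLin($xml) };
    if ( !defined $data ) {
        warn "Could not parse XML: $@";
        return;
    }
    return $data;
}

# A well-formed sample and a deliberately broken one (missing </ip>).
my $good = safe_xml_in('<device name="NEW-SW5"><ip>10.66.12.128</ip></device>');
my $bad  = safe_xml_in('<device><ip>10.66.12.128</device>');

print "good document parsed, ip = $good->{ip}\n" if defined $good;
print "bad document was rejected\n" unless defined $bad;
```

The same pattern should carry over to any other parser that throws an exception on malformed input.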

djtsheps
  • @StayCalm converting the xml to perl data is the easiest and allows you easy access to all the variables within the XML body – djtsheps Mar 14 '17 at 14:52
  • 1
    Please don't use XML::Simple in production code. It's not simple. Its author [discourages its use](https://metacpan.org/pod/XML::Simple#STATUS-OF-THIS-MODULE), and the data structures that come out are not predictable, and hard to work with. – simbabque Mar 14 '17 at 15:11
0

First note that there is no universal agreement on what it means for two XML files to be "the same". For example, everyone agrees that whitespace within start and end tags should be ignored, and that the distinction between single and double quotes around attributes is irrelevant, and that attributes can be in any order; but requirements vary on how to handle comments, whitespace between element tags, namespace prefixes, and numerous other details.

Another area where requirements vary is what information you want when documents are deemed different. Some mechanisms will only give you a yes-or-no answer, and won't help you find the differences.

This has the consequence that there may be general-purpose solutions out there, but they might not always meet your specific requirements.

So writing your own comparator isn't a ridiculous idea if you're prepared to write a few hundred lines of code.

But two off-the-shelf solutions you could consider, if you can find examples that run in the Perl environment, are:

  • XML canonicalizers: canonicalize both documents and then compare the results at the binary level.

  • XPath 2.0: offers the function deep-equal() to compare two nodes (including document nodes)
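
As a rough illustration of the canonicalization route, the sketch below uses toStringC14N from XML::LibXML on two made-up documents that differ only in attribute order, quote style and empty-element syntax. This is an assumption about how one might apply the idea in Perl, not a complete comparator: canonical XML keeps whitespace between elements and namespace prefixes as they are, so those differences would still count.

```perl
use strict;
use warnings;
use XML::LibXML;

# Two hypothetical documents with the same content but different
# attribute order, quoting and empty-element notation.
my $doc_a = XML::LibXML->load_xml(
    string => q{<device y="2" x='1'><ip>10.66.12.128</ip><note/></device>} );
my $doc_b = XML::LibXML->load_xml(
    string => q{<device x="1" y="2"><ip>10.66.12.128</ip><note></note></device>} );

# Canonical XML (C14N) normalizes these lexical details.
my $canon_a = $doc_a->toStringC14N();
my $canon_b = $doc_b->toStringC14N();

if ( $canon_a eq $canon_b ) {
    print "Documents are equivalent after canonicalization\n";
}
else {
    print "Documents differ\n";
}
```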

Michael Kay
  • While this is a nice read and solid advice, it doesn't answer the question. The OP only wants to compare a few values inside of two completely differently structured documents. – simbabque Mar 14 '17 at 17:47