I have a well-formed XML file that I'm trying to convert to a hash with the Perl module XML::Simple. Some sections of this file are not parsed the way I need. Is there any way (or a workaround) to get the xml parsed correctly and get the desired result?

D:\tmp>perl parse_dns2.pl dns_problem_public.xml
Warning: <dns_entry> element has non-unique value in 'domain' key attribute: 
0 at parse_dns2.pl line 9.
Warning: <dns_entry> element has non-unique value in 'domain' key attribute: 
example.com at parse_dns2.pl line 9.
Warning: <dns_entry> element has non-unique value in 'domain' key attribute: 
test.com at parse_dns2.pl line 9.
$VAR1 = {
  'dns_timeout' => '20',
  'local_dns' => {
    'dns_entry' => {
      '0' => {
        'content' => '192.168.120.32'
      },
      'domain.example.com' => {
        'content' => '172.16.113.13'
      },
      'example.com' => {
        'content' => '172.16.113.13'
      },
      'test.com' => {
        'content' => '172.17.0.113'
      }
    }
  }
};

My code is straightforward:

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use XML::Simple;
use Data::Dumper;

my $ref = XMLin(
    $ARGV[0],
    ForceArray => ['dns_entry'],
    KeyAttr    => { 'dns_entry' => 'priority' },
    KeyAttr    => { 'dns_entry' => 'domain' },
    ForceContent => 0
);

print Dumper $ref;

The XML file (the relevant section) contains attributes which I need to use as keys:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE config SYSTEM "config.dtd"> 
<dns>
    <local_dns>
        <dns_entry priority="0">192.168.120.31</dns_entry>
        <dns_entry priority="0">192.168.120.32</dns_entry>
        <dns_entry domain="example.com">172.16.103.20</dns_entry>
        <dns_entry domain="example.com">172.16.113.13</dns_entry>
        <dns_entry domain="test.com">172.17.0.111</dns_entry>
        <dns_entry domain="test.com">172.17.0.113</dns_entry>
        <dns_entry domain="domain.example.com">172.16.103.20</dns_entry>
        <dns_entry domain="domain.example.com">172.16.113.13</dns_entry>
    </local_dns>
    <dns_timeout>20</dns_timeout>
</dns>

The first problem is that XML::Simple cannot handle several elements of the same name that share the same key attribute value but have different content. The second problem is that I can use only one attribute as the key attribute within the same XML block.
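The second problem can be seen without XML::Simple at all. The sketch below assumes XMLin collects its name => value options into a hash, as Perl named arguments usually are; `collect_options` is a hypothetical stand-in, not part of XML::Simple. A repeated `KeyAttr` then keeps only the last value given:

```perl
use strict;
use warnings;

# Hypothetical stand-in for option handling (an assumption, not XML::Simple code):
# the name => value pairs are stored in a hash, so a repeated name keeps
# only the value that was supplied last.
sub collect_options {
    my ($file, @pairs) = @_;
    my %opt = @pairs;
    return \%opt;
}

my $opt = collect_options(
    'dns.xml',
    ForceArray => ['dns_entry'],
    KeyAttr    => { dns_entry => 'priority' },
    KeyAttr    => { dns_entry => 'domain' },
);

# Only the second KeyAttr survives.
print "KeyAttr for dns_entry: $opt->{KeyAttr}{dns_entry}\n";
```

Under that assumption, the 'priority' setting in my script is silently discarded before any parsing happens.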

Desired result:

$VAR1 = {
  'local_dns' => {
    'dns_entry' => {
      'domain' => {
        'domain.example.com' => {
          'content' => [
            '172.16.103.20',
            '172.16.113.13'
          ]
        },
        'example.com' => {
          'content' => [
            '172.16.103.20',
            '172.16.113.13'
          ]
        },
        'test.com' => {
          'content' => [
            '172.17.0.111',
            '172.17.0.113'
          ]
        }
      },
      'priority' => {
        '0' => {
          'content' => [
            '192.168.120.31',
            '192.168.120.32'
          ]
        }
      }
    }
  },
  'dns_timeout' => '20'
};
ikegami
PaulP
  • Re "*Is there any way (or a workaround) to get the xml parsed correctly and get the desired result?*", You didn't specify what the desired result is. – ikegami Sep 23 '18 at 13:37
  • @ikegami: thank you, I've added a section with the desired results. Basically a hash with all information included so any element can be addressed using a hash expression. – PaulP Sep 23 '18 at 17:34
  • Are you sure that's what you want as a result? I mean, if you use `findnodes` you can address any element using an `xpath` expression instead, and you don't have to faff around with mangling your XML. If you want to re-[ask] your question, and frame it around the higher level problem, I'm sure we can supply you with even better answers. – Sobrique Sep 24 '18 at 11:50
  • @Sobrique: I'll use libxml with findnodes for some sections of my XML file. The problem with libxml is that I don't know the structure of the XML file in advance and need to "walk" through it instead. – PaulP Sep 24 '18 at 13:29
  • My point is that you probably _don't_ need to know the structure of the file - just what you're looking for. And if you really need to iterate that, you can. But you probably don't need to do that either. – Sobrique Sep 24 '18 at 15:10

1 Answer

Nodes can't have multiple contents, so some transformation is needed.

You should take this opportunity to stop using XML::Simple, the most complicated of the XML parsers out there. It's so hard to use that its own documentation advises against using it.

Here's an XML::LibXML solution:

use strict;
use warnings;
use XML::LibXML qw( );

my $doc = XML::LibXML->new->parse_file('dns.xml');

my %data;
{
   $data{dns_timeout} = $doc->findvalue('/dns/dns_timeout/text()');

   for my $dns_entry_node ($doc->findnodes('/dns/local_dns/dns_entry')) {
      my $addr = $dns_entry_node->textContent();

      if (defined( my $priority = $dns_entry_node->getAttribute('priority') )) {
         push @{ $data{local_dns}{dns_entry}{priority}{$priority} }, $addr;
      }

      if (defined( my $domain = $dns_entry_node->getAttribute('domain') )) {
         push @{ $data{local_dns}{dns_entry}{domain}{$domain} }, $addr;
      }
   }
}
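
For trying this out without any files on disk, here is a self-contained sketch of the same approach. It assumes the question's XML is inlined as a string, with the DOCTYPE line left out so that no external config.dtd is needed. The variable names are illustrative.

```perl
use strict;
use warnings;
use XML::LibXML qw( );
use Data::Dumper;

# Inline copy of the question's XML, minus the DOCTYPE line (an assumption
# made so the sketch does not depend on config.dtd being present).
my $xml = <<'XML';
<?xml version="1.0" encoding="ISO-8859-1"?>
<dns>
    <local_dns>
        <dns_entry priority="0">192.168.120.31</dns_entry>
        <dns_entry priority="0">192.168.120.32</dns_entry>
        <dns_entry domain="example.com">172.16.103.20</dns_entry>
        <dns_entry domain="example.com">172.16.113.13</dns_entry>
        <dns_entry domain="test.com">172.17.0.111</dns_entry>
        <dns_entry domain="test.com">172.17.0.113</dns_entry>
        <dns_entry domain="domain.example.com">172.16.103.20</dns_entry>
        <dns_entry domain="domain.example.com">172.16.113.13</dns_entry>
    </local_dns>
    <dns_timeout>20</dns_timeout>
</dns>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

my %data = (dns_timeout => $doc->findvalue('/dns/dns_timeout'));

for my $node ($doc->findnodes('/dns/local_dns/dns_entry')) {
    my $addr = $node->textContent;

    # Group each address under whichever of the two attributes is present.
    for my $attr (qw( priority domain )) {
        my $value = $node->getAttribute($attr);
        next unless defined $value;
        push @{ $data{local_dns}{dns_entry}{$attr}{$value} }, $addr;
    }
}

# Any group of addresses can now be reached with a plain hash expression.
print join(', ', @{ $data{local_dns}{dns_entry}{domain}{'example.com'} }), "\n";

$Data::Dumper::Sortkeys = 1;
print Dumper(\%data);
```

Note that each key maps directly to an array of addresses rather than to a hash with a 'content' key, which keeps the lookups one level shorter.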
ikegami
  • Agreed, but XML::Simple is the only way to parse an XML file with unknown structure, by creating a hash and "walking" through it. I think I have to combine both the XML::Simple and LibXML approaches in my script. Thank you anyway for clarifying that XML::Simple is a dead end in my situation :-) – PaulP Sep 24 '18 at 06:24
  • No, it's not. All XML parsers can handle unknown structure and 'walk' it, just the way you want. E.g. `XML::LibXML` does `childNodes` and `XML::Twig` does `children`. Of course, if you don't actually _want_ to walk the `XML` at all, but instead find a particular key/value somewhere in the structure, that's where you can use `findnodes` and have a _much_ simpler answer. – Sobrique Sep 24 '18 at 11:49
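
The walk described in the last comment can be sketched with `childNodes` as follows. This is a minimal illustration, not code from the thread: the `walk` helper, the record layout and the shortened sample document are all assumptions made for the example.

```perl
use strict;
use warnings;
use XML::LibXML qw( :libxml );

# Visit every element in document order without knowing the structure in
# advance, recording its path, its attributes and its own trimmed text.
sub walk {
    my ($node, $path, $seen) = @_;
    $path .= '/' . $node->nodeName;

    my %attrs = map { $_->nodeName => $_->value } $node->attributes;
    my $text  = join '', map { $_->data }
                grep { $_->nodeType == XML_TEXT_NODE } $node->childNodes;
    $text =~ s/^\s+|\s+$//g;

    push @$seen, { path => $path, attrs => \%attrs, text => $text };

    # Recurse into child elements only; text and comment nodes are skipped.
    walk($_, $path, $seen)
        for grep { $_->nodeType == XML_ELEMENT_NODE } $node->childNodes;

    return $seen;
}

# Shortened sample based on the question's XML.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<dns>
    <local_dns>
        <dns_entry priority="0">192.168.120.31</dns_entry>
        <dns_entry domain="example.com">172.16.103.20</dns_entry>
    </local_dns>
    <dns_timeout>20</dns_timeout>
</dns>
XML

my $seen = walk($doc->documentElement, '', []);

for my $e (@$seen) {
    my $attrs = join ' ', map { "$_=$e->{attrs}{$_}" } sort keys %{ $e->{attrs} };
    print join("\t", $e->{path}, $attrs, $e->{text}), "\n";
}
```

Each record carries the full element path, so the result can be filtered or regrouped afterwards without hard-coding element names in the walk itself.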