
I have an XML document that I need to convert to a hash in a specific format, which requires some nodes to be arrays. I've tried XML::Simple, but I can't get rid of one level of XML nodes.

#!/usr/bin/perl
use Data::Dumper::Simple;
use XML::Simple;

use warnings;
use strict;

my $xml = <<'XML';
<?xml version="1.0"?>
<release id="9999" status="Accepted">
  <images>
    <image height="511" type="primary" uri="" uri150="" width="600"/>
    <image height="519" type="secondary" uri="" uri150="" width="600"/>
    <image height="521" type="secondary" uri="" uri150="" width="600"/>
    <image height="217" type="secondary" uri="" uri150="" width="500"/>
    <image height="597" type="secondary" uri="" uri150="" width="600"/>
    <image height="89" type="secondary" uri="" uri150="" width="600"/>
  </images>
  <artists>
    <artist>
      <id>45</id>
      <name>Aphex Twin</name>
      <anv/>
      <join/>
      <role/>
      <tracks/>
    </artist>
  </artists>
</release>
XML

my $xml_hash = XMLin($xml, ForceArray => qr{image}x );
print Dumper $xml_hash; 

Desired output

       'images' => [
                     {
                       'type' => 'primary',
                       'width' => 600,
                       'resource_url' => '',
                       'uri150' => '',
                       'height' => 511,
                       'uri' => ''
                     },
                     {
                       'width' => 600,
                       'type' => 'secondary',
                       'resource_url' => '',
                       'uri150' => '',
                       'uri' => '',
                       'height' => 519
                     }, etc...

What I'm getting with my sample code is

$xml_hash = {
              'images' => [
                            {
                              'image' => [
                                           {
                                             'uri150' => '',
                                             'type' => 'primary',
                                             'uri' => '',
                                             'height' => '511',
                                             'width' => '600'
                                           },
                                           {
                                             'type' => 'secondary',
                                             'uri150' => '',
                                             'uri' => '',
                                             'height' => '519',
                                             'width' => '600'
                                           },
                                           {
                                             'uri' => '',
                                             'height' => '521',
                                             'width' => '600',
                                             'type' => 'secondary',
                                             'uri150' => ''
                                           },
                              etc...

How do I get rid of

'image' => [

and have

'images' => [

contain all the hashes?

Thanks; George

George

2 Answers


Any attempt to represent a whole XML document as a Perl data structure will be fraught with edge cases and inconvenient designs by the nature of the two formats. There are many options to parse and traverse XML in a way suited to the format, like XML::LibXML and XML::Twig. Here is how I would approach this with Mojo::DOM (which uses CSS selectors for traversal):

use strict;
use warnings;
use Mojo::DOM;
use Mojo::Util 'dumper';

my $xml = <<'XML';
<?xml version="1.0"?>
<release id="9999" status="Accepted">
  <images>
    <image height="511" type="primary" uri="" uri150="" width="600"/>
    <image height="519" type="secondary" uri="" uri150="" width="600"/>
    <image height="521" type="secondary" uri="" uri150="" width="600"/>
    <image height="217" type="secondary" uri="" uri150="" width="500"/>
    <image height="597" type="secondary" uri="" uri150="" width="600"/>
    <image height="89" type="secondary" uri="" uri150="" width="600"/>
  </images>
  <artists>
    <artist>
      <id>45</id>
      <name>Aphex Twin</name>
      <anv/>
      <join/>
      <role/>
      <tracks/>
    </artist>
  </artists>
</release>
XML

my $dom = Mojo::DOM->new->xml(1)->parse($xml);
my @images = $dom->find('release#9999 > images > image')->map('attr')->each;
print dumper \@images;

Output:

[
  {
    "height" => 511,
    "type" => "primary",
    "uri" => "",
    "uri150" => "",
    "width" => 600
  },
  {
    "height" => 519,
    "type" => "secondary",
    "uri" => "",
    "uri150" => "",
    "width" => 600
  },
  {
    "height" => 521,
    "type" => "secondary",
    "uri" => "",
    "uri150" => "",
    "width" => 600
  },
  {
    "height" => 217,
    "type" => "secondary",
    "uri" => "",
    "uri150" => "",
    "width" => 500
  },
  {
    "height" => 597,
    "type" => "secondary",
    "uri" => "",
    "uri150" => "",
    "width" => 600
  },
  {
    "height" => 89,
    "type" => "secondary",
    "uri" => "",
    "uri150" => "",
    "width" => 600
  }
]
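If the existing code expects the array under an `images` key, as in the question's desired output, the result can be wrapped in a hash. The following is only a sketch on a shortened copy of the sample document; the `%release` name is hypothetical, and the attribute hashes are copied rather than reused so that later changes to the result stay separate from the DOM.

```perl
use strict;
use warnings;
use Mojo::DOM;
use Mojo::Util 'dumper';

# Shortened copy of the sample document from the question.
my $xml = <<'XML';
<?xml version="1.0"?>
<release id="9999" status="Accepted">
  <images>
    <image height="511" type="primary" uri="" uri150="" width="600"/>
    <image height="519" type="secondary" uri="" uri150="" width="600"/>
  </images>
</release>
XML

my $dom = Mojo::DOM->new->xml(1)->parse($xml);

# Build a fresh hash from each element's attributes instead of
# handing out the hash reference returned by attr.
my $images = $dom->find('release > images > image')
                 ->map(sub { +{ %{ $_->attr } } })
                 ->to_array;

# %release is a hypothetical container name for this sketch.
my %release = (images => $images);
print dumper \%release;
```

Keys that exist in the web-service format but not in the XML (such as `resource_url` in the desired output) would have to be added to each hash separately.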
Grinnz
  • I hoped for a magic bullet, but I get your point. What I posted is just a sample of the XML; there are many cases where I would need to use Mojo to extract what I need into a specific hash format. I already have a bunch of code that expects the hash as returned by the web service, and I'm trying to work with an XML data dump instead of the web service. Thanks! – George Jan 24 '20 at 20:50
  • @George It will take work to adapt various XML into the structures you need, but in the end it will be more consistent, flexible and maintainable. – Grinnz Jan 24 '20 at 21:14
  • Hmm, if a prefix is used for an attribute, this creates a hash element whose key includes the prefix, even though the prefix is meaningless outside of the document. – ikegami Jan 24 '20 at 22:33

XML::Simple's own documentation discourages its use in new code.
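If the existing code has to stay on XML::Simple for now, the extra level in the question can probably be removed with the module's `GroupTags` option. The sketch below runs on a shortened copy of the sample document. It also replaces the `qr{image}x` pattern, which matches `images` as well as `image`, with an exact element name; `KeyAttr => []` is an added assumption that no array-to-hash folding is wanted.

```perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# Shortened copy of the sample document from the question.
my $xml = <<'XML';
<?xml version="1.0"?>
<release id="9999" status="Accepted">
  <images>
    <image height="511" type="primary" uri="" uri150="" width="600"/>
    <image height="519" type="secondary" uri="" uri150="" width="600"/>
  </images>
</release>
XML

my $xml_hash = XMLin(
    $xml,
    ForceArray => ['image'],              # exact name, so <images> is not forced into an array
    GroupTags  => { images => 'image' },  # collapse the <image> level under <images>
    KeyAttr    => [],                     # assumption: no folding of arrays into hashes
);

$Data::Dumper::Sortkeys = 1;
print Dumper($xml_hash->{images});
```

With these options `$xml_hash->{images}` is an array reference holding one hash per `<image>` element; the attribute values remain strings.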

Here's how you can get the array of hashes using XML::LibXML:

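use XML::LibXML;

# $xml holds the XML string from the question.
my $dom = XML::LibXML->load_xml(string => $xml);

# One hash per <image> element, built from all of its attributes.
my @images = map +{
    map { $_->name => $_->value } $_->findnodes('@*')
}, $dom->findnodes('/release/images/image');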
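A variant that keeps only attributes without a namespace, so that a prefixed attribute cannot collide with a plain one of the same local name, might look like the sketch below. The `pfx:height` attribute and its namespace URI are hypothetical additions to a shortened copy of the sample, included only to show the filtering; `%release` is likewise an illustrative name.

```perl
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;

# Shortened sample; pfx:height and its namespace are hypothetical.
my $xml = <<'XML';
<?xml version="1.0"?>
<release id="9999" status="Accepted" xmlns:pfx="http://example.com/ns">
  <images>
    <image height="511" type="primary" uri="" uri150="" width="600" pfx:height="1"/>
    <image height="519" type="secondary" uri="" uri150="" width="600"/>
  </images>
</release>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

my @images = map {
    my $image = $_;
    +{
        # Keep name/value pairs only for attributes in no namespace.
        map  { ( $_->nodeName, $_->value ) }
        grep { !length( $_->namespaceURI // '' ) }
        $image->findnodes('@*')
    };
} $dom->findnodes('/release/images/image');

# Wrap the list under the key the question asks for.
my %release = ( images => \@images );

$Data::Dumper::Sortkeys = 1;
print Dumper( \%release );
```

The `//` operator needs Perl 5.10 or later.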
choroba
  • This ignores the namespace of the attribute, so it could cause conflicts. It would be better to ignore attributes in a non-null namespace so that extensions --the "X" in "XML"-- don't cause problems. But it's not likely to matter. – ikegami Jan 24 '20 at 22:38
  • @ikegami: Yes, and if the XML declares a namespace for the root node, the resulting array will be empty. – choroba Jan 24 '20 at 22:59
  • As it should. That's the point I was making. You treat nodes `{}release` and nodes `{http://foo}release` as different elements (which is good), but you treat attributes `{}height` and `{http://foo}height` as the same, picking one at random as the height (which is bad, but unlikely to happen). – ikegami Jan 26 '20 at 08:58