
I have a file on a server that I want to parse in Perl. I've tried it with XML::Simple and XML::LibXML, and I can't get at the XML elements in either case.

This is my .xml file:

<csixml version="1.0">
    <head>
    <details>
        <name-link>linkName</name-link>
        <table>links</table>
        <model>XS1-556</model>        
    </details>
        <fields>
            <field name="name1" />
            <field name="name2"/>
            <field name="name3"/>
            <field name="name4"/>
            <field name="name5"/>
            <field name="name6" />
            <field name="name7"/>
            <field name="name8"/>
            <field name="name9"/>
            <field name="name10"/>
            <field name="name11"/>
            <field name="name12x"/>
            <field name="name13"/>
            <field name="name14"/>
            <field name="name15"/>
            <field name="name16"/>
            <field name="name17"/>
        </fields>
    </head>
    <data>
        <record time="2017/06/01 00:00:00" no="742">
        <v1>14.85</v1>
        <v2>34.1</v2>
        <v3>600</v3>
        <v4>0</v4>
        <v5>0</v5>
        <v6>0</v6>
        <v7>0</v7>
        <v8>11.22</v8>
        <v9>0.41</v9>
        <v10>215</v10>
        <v11>7.043</v11>
        <v12>1.325</v12>
        <v13>2017-05-31T23:47:14</v13>
        <v14>202.3</v14>
        <v15>0</v15>
        <v16>42.85</v16>
        <v17>12.25</v17>
        </record>
        </data>
    </csixml>

And this is the code:

my $parser = new XML::Simple;
$data = $parser->XMLin( get( $url ));

#print Dumper($data);

print $data->{'r'}[0]{'v1'};
print $data->{'r'}[1]{'v2'};    

When I try it with XML::LibXML, it gives me an error that says:

Start tag expected, '<' not found
dbz

3 Answers


XML::Simple is flaky and should not be used (even the author agrees) but, having said that, it's a relatively simple fix to get your program working as expected.

You're walking your data structure incorrectly. You need to take a closer look at your Data::Dumper output. Your $data variable is equivalent to the top-level <csixml> tag. Everything else is hashes within that. So, to get to the piece of the data structure you want, you need:

print $data->{data}{r}{v1};
print $data->{data}{r}{v2};

I also see that you're using the "indirect object notation" (new XML::Simple) to create your parser object. This usually works fine, but when it doesn't you'll waste days trying to work out what has gone wrong. Instead, please use the standard syntax - XML::Simple->new.

Update: Here's the code I was using:

#!/usr/bin/perl

use strict;
use warnings;

use Path::Tiny;
use XML::Simple;
use Data::Dumper;

my $file = 'test.xml';
my $xml  = path($file)->slurp;

my $parser = XML::Simple->new;
my $data = $parser->XMLin($xml);

#print Dumper($data);

print $data->{data}{'r'}{'v1'};
print $data->{data}{'r'}{'v2'};
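
The shape XML::Simple returns depends on the input: a single record element comes back as a hash reference, while repeated records come back as an array reference, which is one plausible cause of a "Not a HASH reference" error. A minimal sketch of pinning the shape down with the ForceArray and KeyAttr options follows. It assumes the element is named record, as in the XML posted in the question (substitute r if that is what your real file uses), and the second record is invented purely for illustration:

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical sample: same layout as the question's file, cut down to
# two values per record, with a made-up second record.
my $xml = <<'XML';
<csixml version="1.0">
  <data>
    <record time="2017/06/01 00:00:00" no="742"><v1>14.85</v1><v2>34.1</v2></record>
    <record time="2017/06/01 00:10:00" no="743"><v1>14.90</v1><v2>33.8</v2></record>
  </data>
</csixml>
XML

my $data = XML::Simple->new->XMLin(
    $xml,
    ForceArray => ['record'],   # <record> is always an array ref, even when there is only one
    KeyAttr    => [],           # never fold records into a hash keyed on an attribute
);

# Every record is now a hash ref holding its attributes and child elements.
my @lines;
for my $record ( @{ $data->{data}{record} } ) {
    push @lines, "$record->{no}: v1=$record->{v1} v2=$record->{v2}";
}
print "$_\n" for @lines;
```

With these options the loop works unchanged whether the file holds one record or many, so the calling code no longer has to guess between hash and array access.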
Dave Cross
  • First of all, thanks for your comment @Dave Cross. I've just used your code and it gives me an error: `Not a HASH reference at C:\Users\dbz\test.pl line 127` – dbz Jul 04 '17 at 15:56
  • @dbz: And I guess that neatly demonstrates the problems with using XML::Simple. I'll add my complete code into my answer in a second, but when I run it I get a data structure that is all hash references. You, apparently, get something else. If it helps at all, I'm using XML::Simple version 2.22 on Ubuntu 17.04. – Dave Cross Jul 04 '17 at 16:07

XML::LibXML is likely complaining about some slightly broken XML. The XML spec is strict: among other things, it says that well-formedness errors are fatal. The XML you posted parses fine, though:

#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml ( IO => \*DATA );
foreach my $node ( $doc -> findnodes ( '//record/v2' ) ) {
   print $node -> textContent;
}

__DATA__
<csixml version="1.0">
    <head>
    <details>
        <name-link>linkName</name-link>
        <table>links</table>
        <model>XS1-556</model>        
    </details>
        <fields>
            <field name="name1" />
            <field name="name2"/>
            <field name="name3"/>
            <field name="name4"/>
            <field name="name5"/>
            <field name="name6" />
            <field name="name7"/>
            <field name="name8"/>
            <field name="name9"/>
            <field name="name10"/>
            <field name="name11"/>
            <field name="name12x"/>
            <field name="name13"/>
            <field name="name14"/>
            <field name="name15"/>
            <field name="name16"/>
            <field name="name17"/>
        </fields>
    </head>
    <data>
        <record time="2017/06/01 00:00:00" no="742">
        <v1>14.85</v1>
        <v2>34.1</v2>
        <v3>600</v3>
        <v4>0</v4>
        <v5>0</v5>
        <v6>0</v6>
        <v7>0</v7>
        <v8>11.22</v8>
        <v9>0.41</v9>
        <v10>215</v10>
        <v11>7.043</v11>
        <v12>1.325</v12>
        <v13>2017-05-31T23:47:14</v13>
        <v14>202.3</v14>
        <v15>0</v15>
        <v16>42.85</v16>
        <v17>12.25</v17>
        </record>
        </data>
    </csixml>

XML::LibXML supports XPath, which is invaluable for the sort of thing you're trying to do. You can either specify a full path in the document, or use // to indicate 'anywhere in the document'.

So either:

/csixml/data/record/v2

Or:

//record/v2

will find the value you want.

But you can also do other useful things, like finding the element that holds a given value:

foreach my $node ( $doc -> findnodes ( '//record/*[string()="34.1"]' ) ) {
   print $node -> nodeName;
}
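
The same approach reaches attributes as well as element text. As a rough sketch, assuming a cut-down copy of the posted XML held in a string (the variable names are only for illustration), this reads the time and no attributes of each record alongside one of its values:

```perl
use strict;
use warnings;
use XML::LibXML;

# Cut-down copy of the question's document, kept inline for the example.
my $doc = XML::LibXML->load_xml( string => <<'XML' );
<csixml version="1.0">
  <data>
    <record time="2017/06/01 00:00:00" no="742"><v1>14.85</v1><v2>34.1</v2></record>
  </data>
</csixml>
XML

my @rows;
for my $record ( $doc->findnodes('/csixml/data/record') ) {
    my $time = $record->getAttribute('time');   # attribute on <record>
    my $no   = $record->getAttribute('no');
    my $v1   = $record->findvalue('./v1');      # XPath relative to this record
    push @rows, "$time (no $no): v1=$v1";
}
print "$_\n" for @rows;
```

Running the XPath relative to each record node keeps the values of one record together, which matters once the file contains more than one record.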

So I think the heart of the problem is that you're loading your XML incorrectly. The XML itself certainly parses in the example above (IO => \*DATA reads from the special inline DATA filehandle, which here holds a copy of your document).
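
One way to narrow this down is to check what is actually being handed to the parser. The sketch below is an illustration under assumptions, not a diagnosis of your exact setup: the helper name v2_values and the example URL are made up, and it simply shows that content which is not XML (here a URL string passed in place of the document it points to) makes load_xml die, so the failure can be caught and reported clearly:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: parse a string and return the text of every v2 element,
# dying with a clearer message when the content is not well-formed XML
# (for example an HTML error page, an empty body, or a URL instead of a document).
sub v2_values {
    my ($content) = @_;
    my $doc = eval { XML::LibXML->load_xml( string => $content ) }
        or die "Content is not well-formed XML: $@";
    return map { $_->textContent } $doc->findnodes('//record/v2');
}

# Well-formed input parses and yields the value.
my @values = v2_values('<csixml><data><record><v2>34.1</v2></record></data></csixml>');
print "@values\n";

# Input that is not XML at all is rejected by the parser.
my $ok = eval { v2_values('http://www.example.com/data.xml'); 1 };
print "parse failed as expected\n" unless $ok;
```

If the string you fetched from the server fails in the same way, printing the first few characters of it usually shows whether you received the XML document or something else.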

Sobrique

I've tried all of those solutions, but I finally found one that works:

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use XML::Simple;

my $_xml = XML::Simple->new(KeyAttr => []);

my $url      = 'http://www.example.com';
my $agent    = LWP::UserAgent->new;
my $request  = HTTP::Request->new(GET => $url);
$request->content_type('application/xml');
my $response = $agent->request($request);

if ($response->is_success) {
    print "HTTP response is good\n";

    my $_message = $response->decoded_content;
    # ForceArray => 1 turns every nested element into an array reference
    my $_data    = $_xml->XMLin($_message, ForceArray => 1);

    foreach my $_e (@{ $_data->{data} }) {
        foreach my $_r (@{ $_e->{r} }) {
            # time and no are attributes of each record element
            print $_r->{time} . ": " . $_r->{no} . "\n";
        }
    }
}
else {
    die "Awooga! HTTP request failed with " . $response->status_line;
}

In the end I used XML::Simple, and I get at my XML attributes with $_r->{time}; it works great. I hope this helps someone. Thanks to everyone!

dbz