
In a small test file, I can run

#!/usr/bin/perl
use warnings;
use strict;
use open qw{:utf8 :std};
use XML::Simple;

my @cmdline = ("hg", "log", "-v", "--style", "xml");
open my $xml, "@cmdline |";

my $xmllog = XMLin($xml, ForceArray => ['logentry', 'parent', 'copy', 'path']);

foreach my $rev (@{$xmllog->{logentry}}) {
    #do stuff
}

and it works fine. When I run the same code in a larger program (with the same XML input), it terminates with

*** glibc detected *** /usr/bin/perl: malloc(): memory corruption: 0x0a40e308 ***

(full crash log @ pastebin.com)

However, if I make the following replacement

#open my $xml, "@cmdline |";
my $xml = `@cmdline`;

then it works (in both files), so this is more a question of curiosity than a real problem for me.

  1. Does anyone have any pointers on what the difference between my test case and the larger code base might be?
  2. Is there a speed, memory, or other difference between the two ways of calling the command? Is there a best practice?
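
For reference, here is a minimal sketch of how the two capture methods differ mechanically. A piped open returns a filehandle that is read incrementally, while backticks collect the entire output into one string before returning. The child command is a stand-in chosen so the sketch runs anywhere (the current Perl interpreter reporting its version), not hg log, and the variable names are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in command (an assumption for illustration): ask the running
# Perl interpreter for its version instead of calling "hg log".
my @cmdline = ($^X, "-V:version");

# Variant 1: piped open in list form. No shell is involved, and the
# output arrives through a filehandle that can be read piece by piece.
open(my $pipe, "-|", @cmdline) or die "Cannot start @cmdline: $!";
my $from_pipe = do { local $/; <$pipe> };    # slurp the whole stream
close($pipe) or die "Command failed with status $?";

# Variant 2: backticks. The list is interpolated into one command string,
# and the complete output is returned as a single scalar.
my $from_backticks = `@cmdline`;
die "Command failed with status $?" if $? != 0;

print "Both variants captured the same output\n"
    if $from_pipe eq $from_backticks;
```

With backticks the parser only ever sees an ordinary Perl string, whereas with the piped open it is handed a live filehandle, so the two variants exercise different input paths in whatever reads the data.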

Debian Sid: Perl 5.12.4-1.

(This is my first Perl encounter, so don't assume too much about what I "should" know about the language. I just dove into existing code.)

(The larger program is ikiwiki, so the code is not a secret, but I don't know where to look for trouble, and I can't include all the code in this post for practical reasons. This concerns the Mercurial backend.)


Following cjm's suggestion, I added print "$_\n" for sort grep /XML/, keys %INC; which gave the output

RPC/XML.pm
RPC/XML/Client.pm
RPC/XML/ParserFactory.pm
XML/NamespaceSupport.pm
XML/Parser.pm
XML/Parser/Expat.pm
XML/SAX.pm
XML/SAX/Base.pm
XML/SAX/Exception.pm
XML/SAX/Expat.pm
XML/SAX/ParserFactory.pm
XML/Simple.pm

in the large project, and

XML/NamespaceSupport.pm
XML/Parser.pm
XML/Parser/Expat.pm
XML/SAX.pm
XML/SAX/Base.pm
XML/SAX/Exception.pm
XML/SAX/Expat.pm
XML/SAX/ParserFactory.pm
XML/Simple.pm

in the test file.


Update: I installed the Debian package libxml-libxml-perl and added $XML::SAX::ParserPackage = "XML::LibXML::SAX"; as suggested. This also crashed, with a different message this time:

*** stack smashing detected ***: /usr/bin/perl terminated

full backtrace @ pastebin.com

This time, though, it happened consistently in both the large and the small file, and again only when using open, not when using backticks.

I also installed libxml-libxml-simple-perl, but in practice that is supposed to be little more than a wrapper that always uses XML::LibXML as the parser. It also behaved differently and complained about the options passed to XMLin(), so I discarded it.

Explicitly (and blindly) forcing the program to use each of the alternatives listed by print "$_\n" for sort grep /XML/, keys %INC; suggests that XML::SAX::Expat is used by default, as cjm said: all other alternatives exit with errors, and XML::SAX::Expat behaves exactly like the original problem in both files. (Explicitly demanding XML::Simple goes into a loop that allocates all my memory.)

I'm thankful for learning about different XML parsers and that XML::Simple automatically chooses different ones. Both parts of my original question somewhat remain though:

  1. Why do the programs behave differently? Even if I explicitly set $XML::SAX::ParserPackage = "XML::SAX::Expat" in both programs, one crashes (using open) and the other works.
  2. Should I use another method to receive output from the external command? Is it even wrong to expect XMLin() to work with open (but then why does it work in one case)?

Or are these simply the "wrong" questions to ask (i.e. irrelevant)?


UPDATE: More than a week has passed without a flurry of activity here, and I now solve it a bit differently, without problems. I have marked cjm's answer as correct, since it got me further in the error analysis. Thanks!

Helgi
Daniel Andersson
  • Why "use open"? XML is not utf-8 encoded; XML is binary and it's up to the parser to detect the encoding -- that's what ` – jrockway Jul 25 '11 at 02:46
  • Also worth noting: I quickly glanced over the XS code for XML::Parser and noticed that it plays very fast-and-loose with the "utf8 flag"; turning on the flag regardless of whether or not the buffer is valid utf8. Use XML::LibXML :) – jrockway Jul 25 '11 at 02:53
  • "use open" was only to recreate the header that was already in place in the larger program. I wanted to have the environments as equal as possible to isolate the problem, but no, it didn't make any difference with or without it in this case. – Daniel Andersson Jul 25 '11 at 08:55

1 Answer


XML::Simple is pure-Perl, so it's unlikely to cause the memory corruption you report. It depends on a lower-level XML parser, and it's likely the bug you've encountered is in there. But there are multiple parsers it could be using, and we'd need to know which one.

Try adding this line right after the XMLin line in your sample program, and update your question with the results:

print "$_\n" for sort grep /XML/, keys %INC;

This will tell us which XML parser you're actually using on your system.
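
As a minimal sketch of what that line does: %INC maps each loaded file, written the way require sees it (for example XML/Simple.pm), to the path it was loaded from, so filtering its keys shows which modules are in memory at that point. The helper name loaded_modules is hypothetical, and two core modules stand in for the XML ones so the sketch runs without extra installs.

```perl
use strict;
use warnings;
use File::Spec;        # core modules, standing in for the XML modules
use File::Basename;

# Return the sorted %INC keys that match a pattern.
# (loaded_modules is a hypothetical helper name for this sketch.)
sub loaded_modules {
    my ($pattern) = @_;
    return sort grep { /$pattern/ } keys %INC;
}

# Print each matching module together with the file it was loaded from.
print "$_ => $INC{$_}\n" for loaded_modules(qr{^File/});
```

Because many modules are loaded lazily, the list is only meaningful after the code that triggers loading has run, which is why the line belongs after the XMLin call.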


Update: Since it looks like you're using XML::Parser (through its SAX interface XML::SAX::Expat), I'd suggest trying XML::LibXML::SAX instead. Libxml2 is considered one of the better XML parsers.

If you don't already have XML::LibXML::SAX installed, just installing it should switch your default SAX parser to it. If it is installed, try putting

$XML::SAX::ParserPackage = "XML::LibXML::SAX";

at the beginning of your program. (See XML::SAX::ParserFactory for how the SAX parser is selected.)
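
A minimal sketch of that override in context, assuming XML::Simple and XML::SAX are installed (the script prints a notice and does nothing else if they are not). The inline XML string is made up for illustration, and the override is applied only when XML::LibXML::SAX can actually be loaded.

```perl
use strict;
use warnings;

# Load the XML modules at run time so the sketch degrades gracefully
# on systems where they are not installed.
my $have_modules = eval { require XML::Simple; require XML::SAX; 1 };

my $data;
if ($have_modules) {
    # Request a specific SAX driver, but only if it is available;
    # otherwise leave the default selection untouched.
    if (eval { require XML::LibXML::SAX; 1 }) {
        no warnings 'once';
        $XML::SAX::ParserPackage = "XML::LibXML::SAX";
    }

    # Made-up input standing in for the output of "hg log --style xml".
    my $xml = '<log><logentry revision="0"><msg>first</msg></logentry></log>';

    $data = XML::Simple::XMLin($xml, ForceArray => ['logentry']);
    print "Parsed ", scalar @{ $data->{logentry} }, " log entries\n";
}
else {
    print "XML::Simple or XML::SAX is not installed; nothing to demonstrate\n";
}
```

The variable has to be set before the first XMLin call, since that is when the SAX parser is requested from the factory.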

cjm
  • XML::Simple seems to call into XML::SAX::Expat, which is definitely not pure perl. If you look at the stack trace that's posted, you can see that it segfaults in `/usr/lib/libexpat.so.1(XML_ParseBuffer+0x7c)[0xb714464c]`, which is quite expat-looking :) – jrockway Jul 25 '11 at 02:48
  • @jrockway, XML::Simple itself is pure-Perl, but the lower-level XML parser it uses usually isn't. But XML::Simple can use XML::Parser directly, or whatever parser XML::SAX chooses by default. – cjm Jul 25 '11 at 04:26
  • I wish I could give this one a "Nice Answer" badge myself. ;) It correctly identifies that the problem isn't coming from within any of the Perl code (including XML::Simple's code), shows how to identify which XML parser may be to blame, and explains how modules such as XML::Simple rely on external XML parsing libraries. It identified which parser was to blame, and described how to force the use of an alternate and probably higher quality parser. Nice job. – DavidO Jul 25 '11 at 07:06