
I'm new to Perl, and I need to improve the performance of an application someone else wrote.

Profiling showed that the program spends a lot of time in the XML::Simple library. Based on how the application's use has changed over time, we suspect that it is re-parsing the same XML data several times over.

Memoizing the XML parsing function seemed like a straightforward fix. The files it gets the XML data from are assumed not to change while the program runs, so let's just cache the results for each file.

That function, the library's entry point, is XMLin.

My single change to the software was adding

    use Memoize;
    memoize('XMLin');

Trying to run it returns the error:

    Not a HASH reference at C:\QuEST\Scripts\RangeAnalyzer/ParseETP.pl line 269.

Line 269 is:

    @constantElements = @{$xml->{declarations}->{Package}->{declarations}->{Constant}};

... and $xml is defined a few lines up as:

    my $xml = XMLin($Filename, KeyAttr => {ConstValue => '', Operator => '', VariableRef => '', Variable => '', StateMachine => '', State => '', IfBlock => '', WhenBlock => '', SizeParameter => ''}, ForceArray => ['Variable', 'ConstValue', 'DataArrayOp', 'Constant']);

Undoing the change fixes the error.

Why did memoizing the function break its return value? How can I fix it?

I noticed XML::Simple is deprecated, and replacing it, preferably with something faster, is on the list of things to try. Nevertheless, this error broke my mental model of how memoization is supposed to work.

I'm using Perl 5.10.0.

Emilio M Bumachar
  • Could you please provide a minimal, runnable demonstration of the problem? – ikegami Aug 19 '20 at 20:17
  • Also, perhaps you could show what `Data::Dumper` says `$xml` contains the line before it errors out? – A Gold Man Aug 19 '20 at 20:53
  • Not sure what's breaking your code, but even if memoize did work, it's worth pointing out that the value you're passing for the `KeyAttr` option uses the anonymous hashref constructor `{...}`, which returns a reference to a different, newly constructed hash on each execution. So the memoised function would see different argument values on each call and not return a value from the cache. – Grant McLean Aug 19 '20 at 21:35
  • @A Gold Man, Re "*Also, perhaps you could show what Data::Dumper says $xml contains*", Given a suitable demonstration, we could do that ourselves! – ikegami Aug 19 '20 at 21:44
  • I suspect the error has nothing to do with `memoize`, and is simply caused by code that doesn't match the XML. [XML::Simple is the most complicated XML parser to use](https://stackoverflow.com/a/33273488/589924). By far. Don't use it! – ikegami Aug 19 '20 at 21:49
  • @ikegami granted, but it's 1) easier to produce helpful output, seeing as the OP doesn't need to flit through lines that may or may not be relevant (in a code base that they are just learning) and 2) it's a good habit to get into – A Gold Man Aug 19 '20 at 21:50
  • @A Gold Man, Re "*the OP doesn't need to flit through lines that may or may not be relevant*", yes, they do. It's not our "job" to go through the huge dump of irrelevant data. They DO need to produce a minimal, runnable demonstration of the problem. Furthermore, it's trivial to chop down an XML file to the bit that contains the elements in the "query", and they already provided the code they use to load up the XML file. – ikegami Aug 19 '20 at 21:50

1 Answer


I'm afraid there isn't enough information in your question to fully explain what's going wrong (not until there's an MWE, at least). However, I would like to point out two things you may need to consider.

To memoize a function, Memoize uses a normalizer to turn the argument list into a cache key. As per the docs, the default normalizer just stringifies the arguments. A hashref therefore becomes its string representation, something like HASH(0x...), which reflects its location in memory rather than its contents. A freshly built hashref generally gets a new address on each invocation of the function, so Memoize cannot reliably identify that you've passed the same args.
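To make the stringification issue concrete, here is a minimal sketch. `parse_file` is a hypothetical stand-in for an expensive parser such as XMLin, and the file name and option values are made up for illustration. The two option hashes are kept alive in variables so that they are guaranteed to sit at different addresses.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;

# Hypothetical stand-in for an expensive parser such as XMLin.
sub parse_file {
    my ($file, %opts) = @_;
    $calls++;                      # count how often the real body runs
    return { file => $file };
}
memoize('parse_file');             # default normalizer: stringify the arguments

# Two hashes with identical contents, both alive at the same time,
# so they necessarily have different addresses.
my $opts_a = { Constant => '' };
my $opts_b = { Constant => '' };

print "$opts_a\n";                 # something like HASH(0x...)
print "$opts_b\n";                 # same contents, different address

parse_file('a.xml', KeyAttr => $opts_a);   # real call
parse_file('a.xml', KeyAttr => $opts_b);   # real call again: the key differs
parse_file('a.xml', KeyAttr => $opts_a);   # cache hit: same reference as the first call

print "real calls: $calls\n";      # 2
```

With an inline `{...}` the situation is murkier: the temporary hash is freed after each call, so Perl may or may not hand out the same address again. Either way, the cache key says nothing about what the hash contains.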

You may want to supply your own normalizing function to address the particular argument style that XML::Simple requires.
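As a rough sketch of what a custom normalizer can look like, the example below keys the cache on the file name alone, using Memoize's `NORMALIZER` option. It assumes that every call for a given file passes equivalent options, which may not hold in your application. `parse_file` is again a hypothetical stand-in for XMLin.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;

# Hypothetical stand-in for XMLin: a file name followed by options.
sub parse_file {
    my ($file, %opts) = @_;
    $calls++;
    return { file => $file };
}

# Use only the first argument (the file name) as the cache key.
# Assumption: the options for a given file never change between calls.
memoize('parse_file', NORMALIZER => sub { $_[0] });

my $first  = parse_file('a.xml', KeyAttr => { Constant => '' });
my $second = parse_file('a.xml', KeyAttr => { Constant => '' });  # cache hit
my $other  = parse_file('b.xml', KeyAttr => { Constant => '' });  # new file, real call

print "real calls: $calls\n";      # 2: once for a.xml, once for b.xml
```

In the real program the same option would be passed when memoizing `XMLin` instead of the stand-in.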

In addition, as per the caveats section in the docs, if your function returns a reference, every cache hit returns that same reference. This means that if you modify the structure at some point (I can't tell from the question whether that happens), later calls will return the modified structure.
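The shared-reference caveat can be demonstrated with a small sketch. `load_config` is a hypothetical function returning a nested structure, and the `dclone` call from the core Storable module is one possible way to protect the cache, not something the question's code is known to need.

```perl
use strict;
use warnings;
use Memoize;
use Storable qw(dclone);

# Hypothetical stand-in for a parser that returns a nested structure.
sub load_config {
    my ($file) = @_;
    return { file => $file, items => [ 1, 2, 3 ] };
}
memoize('load_config');

my $first = load_config('a.xml');
push @{ $first->{items} }, 99;               # modifies the cached structure itself

my $second = load_config('a.xml');           # same reference as $first
print scalar @{ $second->{items} }, "\n";    # 4: the change is visible here

# One way to protect the cache: modify a deep copy instead.
my $copy = dclone( load_config('a.xml') );
push @{ $copy->{items} }, 100;
print scalar @{ load_config('a.xml')->{items} }, "\n";   # still 4
```

The copy costs time on every call, so it only makes sense where callers really do modify what they get back.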

A Gold Man
  • If they always pass a file name and the same options, the custom normalizing function could simply return the first argument (the file name). If they always pass a file name, but not always the same options, using `Cpanel::JSON::XS->new->canonical->encode(\@_)` should do the trick. – ikegami Aug 19 '20 at 21:46
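The canonical-encoding idea from the comment above might look roughly like this. The sketch uses the core JSON::PP module as a stand-in for `Cpanel::JSON::XS` (an assumption made so the example has no CPAN dependency; JSON::PP ships with Perl 5.14 and later), and `parse_file` is a hypothetical stand-in for XMLin.

```perl
use strict;
use warnings;
use Memoize;
use JSON::PP;

my $calls = 0;

# Hypothetical stand-in for XMLin.
sub parse_file {
    my ($file, %opts) = @_;
    $calls++;
    return { file => $file };
}

# canonical() sorts hash keys, so hashes with equal contents
# always encode to the same string.
my $json = JSON::PP->new->canonical;
memoize('parse_file', NORMALIZER => sub { $json->encode(\@_) });

parse_file('a.xml', KeyAttr => { A => '', B => '' }, ForceArray => ['X']);  # real call
parse_file('a.xml', KeyAttr => { B => '', A => '' }, ForceArray => ['X']);  # cache hit
parse_file('a.xml', KeyAttr => { A => '' },          ForceArray => ['X']);  # real call

print "real calls: $calls\n";      # 2
```

Because the arguments are encoded as an array, the order of the top-level options still matters: passing `ForceArray` before `KeyAttr` would produce a different key.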