5

I am using WWW::Mechanize and currently handle HTTP responses that carry a 'Content-Encoding: gzip' header by first checking the response headers and then using IO::Uncompress::Gunzip to get the uncompressed content.

However, I would like to do this transparently, so that WWW::Mechanize methods like forms(), links(), etc. work on and parse the uncompressed content. Since WWW::Mechanize is a subclass of LWP::UserAgent, I would prefer to use the LWP::UserAgent handlers to do this.

While I have been partly successful (I can print the uncompressed content, for example), I am unable to do this transparently in a way that lets me simply call

    $mech->forms();

In summary: How do I "replace" the content inside the $mech object so that from that point onwards, all WWW::Mechanize methods work as if the Content-Encoding never happened?

I would appreciate your attention and help. Thanks

szabgab
Gurunandan Bhat

3 Answers

8

WWW::Mechanize::GZip, I think.

titanofold
Fayland Lam
3

It looks to me like you can replace it by calling the $res->content( $bytes ) method.

By the way, I found this stuff by looking at the source of LWP::UserAgent, then HTTP::Response, then HTTP::Message.
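
A minimal sketch of that replacement, under some assumptions: the response is built locally so that it runs without a network, `replace_gzip_content` is a hypothetical helper name, and dropping the Content-Encoding header afterwards is my own choice so that later code treats the body as plain content.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Build a gzip-encoded response by hand (stands in for a real server reply).
my $html = '<html><body><form action="/search"></form></body></html>';
gzip(\$html => \my $compressed) or die "gzip failed: $GzipError";

my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html', 'Content-Encoding' => 'gzip' ],
    $compressed,
);

# Hypothetical helper: swap the compressed body for the uncompressed bytes.
sub replace_gzip_content {
    my ($res) = @_;
    return $res unless lc($res->header('Content-Encoding') // '') eq 'gzip';

    my $raw = $res->content;
    gunzip(\$raw => \my $bytes) or die "gunzip failed: $GunzipError";

    $res->content($bytes);                            # replace the body
    $res->remove_header('Content-Encoding');          # body is no longer encoded
    $res->header('Content-Length' => length $bytes);  # keep the length consistent
    return $res;
}

replace_gzip_content($res);
print $res->content, "\n";
```

To apply this to every response, the helper could probably be registered with something like `$mech->add_handler(response_done => sub { replace_gzip_content($_[0]); return });`, but I have not verified that the handler runs before Mechanize parses the page in every version.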

szabgab
jettero
  • Yes - it works. Thanks. Will use it when I want to do more than gunzip content. For now I'll just use the module suggested by Fayland – Gurunandan Bhat May 17 '09 at 11:41
  • Be careful, WWW::Mechanize::GZip appears to be quite buggy (see http://stackoverflow.com/questions/6874076/perl-how-to-avoid-diagnostic-messages-from-not-directly-included-modules). Sorry, I do not fully understand the replace method you're speaking about: can you give some example code, please? – MarcoS Aug 01 '11 at 15:44
  • @jettero: Did you mean "$res->decoded_content()"? In any case, I voted your answer up because I didn't even think to check for that. So I found it when I searched for "Encoding" in [perldoc HTTP::Response](http://search.cpan.org/perldoc?HTTP::Response). Thanks! – Michael Krebs Mar 12 '12 at 10:21
0

Decoding is built into UserAgent and thus into Mechanize. One MAJOR caveat that may save you some hair-pulling:

To debug, make sure you check the error variable $@ after the call to decoded_content.

    my $html = $r->decoded_content;
    die "decoded_content failed: $@" unless defined $html;

Better yet, look through the source of HTTP::Message and make sure all the supporting packages are installed.

In my case, decoded_content returned undef while content returned raw binary, which sent me on a wild goose chase. UserAgent will set the error flag on failure to decode, but Mechanize will just ignore it (it doesn't check for it or log the incident as its own error or warning).

In my case, $@ said: "Can't find IO/HTML.pm ..". The failing code was eval'ed, so the error never surfaced on its own.
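
Here is a rough sketch of that failure mode, assuming locally built responses instead of a live request (`gzip_response` is a hypothetical helper, and the "bad" body is deliberately not gzip data). It shows decoded_content returning undef with the reason left in $@, and the raise_error option that makes it die instead.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

my $text = "hello, gzip\n";
gzip(\$text => \my $compressed) or die "gzip failed: $GzipError";

# Hypothetical helper: wrap a body in a response that claims to be gzip-encoded.
sub gzip_response {
    my ($body) = @_;
    return HTTP::Response->new(
        200, 'OK',
        [ 'Content-Type'     => 'text/plain; charset=UTF-8',
          'Content-Encoding' => 'gzip' ],
        $body,
    );
}

# Valid gzip body: decoded_content returns the original text.
my $good    = gzip_response($compressed);
my $decoded = $good->decoded_content;
die "decode failed: $@" unless defined $decoded;
print "decoded: $decoded";

# Invalid body: decoded_content returns undef and leaves the reason in $@.
my $bad    = gzip_response('this is not gzip data');
my $broken = $bad->decoded_content;
my $error  = $@;    # copy right away, before another eval overwrites it
print "decode failed: $error\n" unless defined $broken;

# With raise_error => 1 the failure becomes an exception instead.
my $ok = eval { $bad->decoded_content(raise_error => 1); 1 };
print "raise_error died: $@\n" unless $ok;
```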

After diving into the source, I found that the built-in decoding process is long, meticulous, and arduous, covering just about every scenario and making tons of guesses (thank you, Gisle!).

If you are paranoid, explicitly set the default headers to be sent with every request when calling new():

    use WWW::Mechanize;    # pulls in HTTP::Headers and HTTP::Message via LWP
    my $browser = WWW::Mechanize->new(
        default_headers => HTTP::Headers->new('Accept-Encoding' => scalar HTTP::Message::decodable()),
    );
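
To see what that header would actually contain on your machine, something like the following sketch may help. It only builds the header object and prints it, without creating a browser or sending a request. The list of encodings depends on which decompression modules are installed, so no particular output is assumed.

```perl
use strict;
use warnings;
use HTTP::Headers;
use HTTP::Message;

# In list context decodable() returns the encodings as a list ...
my @encodings = HTTP::Message::decodable();

# ... and in scalar context a comma-separated string suitable for the header.
my $accept = HTTP::Message::decodable();

my $headers = HTTP::Headers->new('Accept-Encoding' => $accept);

print "Supported encodings: @encodings\n";
print "Accept-Encoding: ", $headers->header('Accept-Encoding'), "\n";
```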
user2695439