I'm trying to automate site crawling, and file downloading in particular, using Perl and Firefox::Marionette.

Here is a short example (for the file download):

#!/usr/bin/env perl

use 5.014;
use warnings;

use Firefox::Marionette();
use Path::Tiny;   

my $ff = Firefox::Marionette->new();
#my $ff = Firefox::Marionette->new(
#    visible => 1
#);

#my $dwl = 'https://www.curseforge.com/wow/addons/dazaralor-totems/download/2610166/file'; # direct download link (does not work correctly)
my $dwl = 'https://www.curseforge.com/wow/addons/dazaralor-totems/download/2610166';     # download link, leading to redirect page
$ff->go($dwl);

while(!$ff->downloads()) { sleep 1 }
while($ff->downloading()) { sleep 1 }
foreach my $p ($ff->downloads()) {
    say $p;
    path($p)->copy('./toto.zip');
}
$ff->quit;

When I run the script, it hangs. With visible => 1 (to get a real window), I can see that it hangs waiting for confirmation of the open/save dialog, as in the screenshot below:

[screenshot: Firefox open/save dialog]

After clicking OK, the file is downloaded.

The question: how can I bypass the confirmation dialog so that the script can run in headless mode without manual clicking?

Any other method of downloading files from this site is also welcome. It is behind Cloudflare, so my attempts with basic LWP failed.

kobame
  • When I run this (Ubuntu 19.10) I get the dialog, but the "Open with" option is selected (instead of "Save File"), so I need to manually click "Save File" and then "OK". Even after that, the "go()" method does not return until I manually close the Firefox window. – Håkon Hægland Apr 13 '20 at 15:50
  • @HåkonHægland ++ You're correct, my fault. The second `$dwl` URL leads to a redirect page that waits 5 seconds and then shows the dialog; after clicking OK it works. Edited. Anyway, it's strange that the default selection differs. I'm on macOS, using Firefox 75. – kobame Apr 13 '20 at 15:57
  • See also [Unable to suppress a firefox pop-up for a file download](https://stackoverflow.com/q/42023320/2173773). Maybe you could use the [`mime_types`](https://metacpan.org/pod/Firefox::Marionette#new) argument to `new()`? – Håkon Hægland Apr 13 '20 at 16:00
  • @HåkonHægland thanks for the suggestion, but it didn't help. I also checked the module source: the `zip` MIME type is set by [default](https://metacpan.org/source/DDICK/Firefox-Marionette-0.96/lib/Firefox/Marionette.pm#L367). – kobame Apr 13 '20 at 18:33
  • @HåkonHægland finally solved thanks to your suggestion. The download's MIME type isn't `zip` (even though it is a zip file) but `application/x-amz-json-1.0`. After adding this to the `mime_types` argument, the file is downloaded without a prompt. Thank you for pointing me in the right direction. – kobame Apr 13 '20 at 23:52
  • @HåkonHægland Would you like to add your comment as an answer so I can accept it? It helped me start looking in the right direction. – kobame Apr 14 '20 at 14:35
  • Sure, done! .... – Håkon Hægland Apr 14 '20 at 20:26

1 Answer

You can bypass the download popup window by setting the mime_types for the file (see this answer for more information). Using the MIME type you provided, application/x-amz-json-1.0, the following works for me on Ubuntu 19.10:

use feature qw(say);
use strict;
use warnings;
use Path::Tiny;
use Firefox::Marionette ();
use Firefox::Marionette::Capabilities;

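my $ff = Firefox::Marionette->new(
    # MIME type the site reports for the zip file; listing it here suppresses the save dialog
    mime_types             => ['application/x-amz-json-1.0'],
    visible                => 0,
    capabilities           => Firefox::Marionette::Capabilities->new(
        # 'none': go() returns without waiting for the page to finish loading
        page_load_strategy => 'none'
    )
);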
my $dwl = 'https://www.curseforge.com/wow/addons/dazaralor-totems/download/2610166/file';
$ff->go($dwl);
while(!$ff->downloads()) { say "No downloads yet.."; sleep 1 }
while($ff->downloading()) { say "Downloading.."; sleep 1 }
foreach my $p ($ff->downloads()) {
    path($p)->copy('./toto.zip');
}
$ff->quit;
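
Note that both wait loops above spin forever if the download never starts, for example when the site sends a MIME type that is not listed and the dialog appears again. A minimal sketch of a bounded wait follows. The helper name wait_until and its arguments are hypothetical and not part of Firefox::Marionette; only downloads() and downloading() come from the module.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper (not part of Firefox::Marionette):
# call $check repeatedly until it returns true or $timeout seconds pass.
# Returns 1 if $check succeeded, 0 if the deadline was reached first.
sub wait_until {
    my ($check, $timeout, $interval) = @_;
    $interval //= 0.5;                      # seconds between polls
    my $deadline = time() + $timeout;
    until ($check->()) {
        return 0 if time() >= $deadline;
        sleep $interval;
    }
    return 1;
}

# Possible use with the script above (timeouts are arbitrary assumptions):
# wait_until(sub { $ff->downloads() },    60)  or die "download never started";
# wait_until(sub { !$ff->downloading() }, 300) or die "download did not finish";
```

With this in place, a headless run fails with an error message instead of hanging silently when the dialog is still being shown.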
Håkon Hægland