
I want to download a file, for example sitemap.xml.gz.
I want to do it with Playwright 1.22 only. I tried it with the Chromium browser, but it fails.
It doesn't work with WebKit either: WebKit displays the whole file content in the page and then I get a timeout.
It only works with Firefox.
What is wrong with the other browsers? Maybe it is a bug in Playwright.
Has anyone been able to download a file directly with Playwright?

```java
import com.microsoft.playwright.*;

public class PwDownload {
    public static void main(String[] args) {
        try (Playwright playwright = Playwright.create()) {
            final BrowserType chromium = playwright.chromium();
            final Browser browser = chromium.launch(new BrowserType.LaunchOptions().setHeadless(false));
            Page page = browser.newPage();
            Download download = page.waitForDownload(() -> {
                // Navigating straight to the .gz URL: this is the call that fails in Chromium (trace below)
                page.navigate("https://www.fnac.es/sitemap-top-post.xml.gz");
            });
            System.out.println(download.path());
            browser.close();
        }
    }
}
```

Error trace with Chromium:

```
navigating to "https://www.fnac.es/sitemap-top-post.xml.gz", waiting until "load"
============================================================
    at FrameSession._navigate (/private/var/folders/1n/tnrrsqrs1_7f_k2x84xql3pr0000gn/T/playwright-java-12123589937209992525/package/lib/server/chromium/crPage.js:636:35)
    at runNextTicks (node:internal/process/task_queues:61:5)
    at processImmediate (node:internal/timers:437:9)
    at async /private/var/folders/1n/tnrrsqrs1_7f_k2x84xql3pr0000gn/T/playwright-java-12123589937209992525/package/lib/server/frames.js:648:30
    at async ProgressController.run (/private/var/folders/1n/tnrrsqrs1_7f_k2x84xql3pr0000gn/T/playwright-java-12123589937209992525/package/lib/server/progress.js:101:22)
    at async FrameDispatcher.goto (/private/var/folders/1n/tnrrsqrs1_7f_k2x84xql3pr0000gn/T/playwright-java-12123589937209992525/package/lib/server/dispatchers/frameDispatcher.js:86:59)
    at async DispatcherConnection.dispatch (/private/var/folders/1n/tnrrsqrs1_7f_k2x84xql3pr0000gn/T/playwright-java-12123589937209992525/package/lib/server/dispatchers/dispatcher.js:352:22)
}
    at com.microsoft.playwright.impl.Connection.dispatch(Connection.java:183)
    at com.microsoft.playwright.impl.Connection.processOneMessage(Connection.java:163)
    at com.microsoft.playwright.impl.ChannelOwner.runUntil(ChannelOwner.java:101)
    ... 19 more
```
MeT
  • The link you posted shows "access denied" (in any browser, local or automated): https://i.imgur.com/JWLL31x.png. Could you please provide another link to the file? – Andrey Kotov Jun 27 '22 at 21:19

2 Answers


This works with Chromium and Firefox. Change the `outputDirectory` variable before running it.

```java
import com.microsoft.playwright.*;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.FilenameUtils;

import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Main {
    public static void main(String[] args) throws Exception {

        try (Playwright playwright = Playwright.create()) {
            // Directory the downloaded file is copied into; adjust before running
            String outputDirectory = "d:\\";

            String url = "https://www.johnlewis.com/sitemap/products/products-00.xml.gz";
            // Last path segment of the URL, e.g. "products-00.xml.gz"
            String filename = FilenameUtils.getName(url);

            // Swap in playwright.chromium() to run the same code in Chromium
            BrowserType browserType = playwright.firefox();
            Browser browser = browserType.launch(new BrowserType.LaunchOptions().setHeadless(false));
            BrowserContext newContext = browser.newContext(new Browser.NewContextOptions().setAcceptDownloads(true));
            Page page = newContext.newPage();

            // Start the navigation from page JavaScript: page.evaluate() does not wait for it,
            // unlike page.navigate(), which waits for "load" (see the trace in the question)
            Download download = page.waitForDownload(() -> {
                page.evaluate("(y) => {location.href = y;}", url);
            });

            // Playwright keeps the finished download in a temporary file
            Path downloadedFilePath = download.path();
            System.out.println("Downloaded to " + downloadedFilePath);

            Path destinationFilePath = Paths.get(outputDirectory, filename);
            FileUtils.copyFile(new File(downloadedFilePath.toString()), new File(destinationFilePath.toString()));
            System.out.println("Saved to " + destinationFilePath);

            browser.close();
        }
    }
}
```
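A variant of the code above that drops the commons-io dependency, sketched under the assumption that Playwright for Java's `Download.saveAs(Path)` and `Download.suggestedFilename()` are used to copy the temporary file. The class `SaveDownload` and its `download(url, outputDir)` method are hypothetical names; it runs headless in Chromium and keeps the answer's `location.href` trick.

```java
import com.microsoft.playwright.Browser;
import com.microsoft.playwright.BrowserContext;
import com.microsoft.playwright.Download;
import com.microsoft.playwright.Page;
import com.microsoft.playwright.Playwright;

import java.nio.file.Path;

public class SaveDownload {
    // Hypothetical helper: lets the page navigate to `url` and saves the resulting
    // download into `outputDir` under the file name the browser suggests.
    static Path download(String url, Path outputDir) {
        try (Playwright playwright = Playwright.create()) {
            Browser browser = playwright.chromium().launch();
            // Set explicitly, since the commenters saw different defaults across versions
            BrowserContext context = browser.newContext(
                    new Browser.NewContextOptions().setAcceptDownloads(true));
            Page page = context.newPage();
            Download download = page.waitForDownload(() -> {
                // Same trick as the answer: start the navigation from page JavaScript
                page.evaluate("(u) => { location.href = u; }", url);
            });
            // Name typically computed by the browser, e.g. from the Content-Disposition header
            Path target = outputDir.resolve(download.suggestedFilename());
            // Copies the temporary download file to `target`
            download.saveAs(target);
            browser.close();
            return target;
        }
    }
}
```

Usage, with the URL from the answer: `SaveDownload.download("https://www.johnlewis.com/sitemap/products/products-00.xml.gz", Path.of("d:\\"))`.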

As for WebKit, I guess it has some built-in browser behavior that you cannot override. You can even open WebKit from Playwright's Java code, paste the link in manually and try to download the file: WebKit won't let you (not in a separate window, and not even with JavaScript plus the HTML `download` attribute).
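If you would rather keep `page.navigate()` as in the question, a commonly used workaround in Chromium is to catch the exception `navigate()` throws when the response turns into a download, and let `waitForDownload` pick up the download event. This sketch assumes Chromium still emits that event after the aborted navigation; `NavigateDownload` and its `download(url, target)` method are hypothetical names.

```java
import com.microsoft.playwright.Browser;
import com.microsoft.playwright.BrowserContext;
import com.microsoft.playwright.Download;
import com.microsoft.playwright.Page;
import com.microsoft.playwright.Playwright;
import com.microsoft.playwright.PlaywrightException;

import java.nio.file.Path;

public class NavigateDownload {
    // Hypothetical helper: navigates straight to `url` and saves the download to `target`.
    static Path download(String url, Path target) {
        try (Playwright playwright = Playwright.create()) {
            Browser browser = playwright.chromium().launch();
            BrowserContext context = browser.newContext(
                    new Browser.NewContextOptions().setAcceptDownloads(true));
            Page page = context.newPage();
            Download download = page.waitForDownload(() -> {
                try {
                    page.navigate(url);
                } catch (PlaywrightException e) {
                    // Chromium aborts the navigation once the response becomes a download
                    // (the failure in the question's trace). Assumed: the download event
                    // still fires. If the error was real (e.g. a network failure),
                    // waitForDownload times out instead of returning.
                }
            });
            download.saveAs(target);
            browser.close();
            return target;
        }
    }
}
```

Catching every `PlaywrightException` is deliberately broad; the trade-off is that a genuine navigation error surfaces later as a `waitForDownload` timeout rather than immediately.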

Andrey Kotov
  • It works on **chromium**; on Firefox it already worked with the code in the question. Thanks. – MeT Jun 28 '22 at 16:37
  • Just one thing: **acceptDownloads = true** is the default value, so you don't need to set it. – MeT Jun 28 '22 at 16:44
  • Hm, strange: it didn't work in Chromium without setAcceptDownloads(true), and according to the docs false is the default value: https://i.imgur.com/vjanbwz.png. If it works on your machine without setting this property to `true`, then great. – Andrey Kotov Jun 28 '22 at 16:58
  • For me, true is the default value: https://ibb.co/YXQTVrh. Maybe you have a different **PW version**; I have PW v1.22. – MeT Jun 28 '22 at 17:06
  • Haha, that's funny. I am using `compile 'com.microsoft.playwright:playwright:1.17.0'`. Don't know what to say, just glad it works. – Andrey Kotov Jun 28 '22 at 17:09

If you already have a link to the file, why not just download it over a normal HTTP connection? See "Download file from HTTPS server using Java".
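A minimal sketch of the plain-HTTP approach, assuming Java 11+ so that `java.net.http.HttpClient` is available; `PlainDownload` is a hypothetical name. It streams the `.gz` bytes to disk as-is (no decompression) and treats any status other than 200 as an error. Some sites may block clients that don't look like a browser; this sketch does not try to work around that.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class PlainDownload {
    // Hypothetical helper: downloads `uri` straight to `target`, no browser involved.
    static Path download(URI uri, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                // Follow redirects, except HTTPS -> HTTP downgrades
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        // ofFile writes the response body to `target` while it is received
        HttpResponse<Path> response = client.send(request, HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + uri);
        }
        return response.body();
    }
}
```

Usage: `PlainDownload.download(URI.create("https://www.johnlewis.com/sitemap/products/products-00.xml.gz"), Path.of("products-00.xml.gz"))`.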

Playwright's downloads correspond to downloads triggered from a page or by a user action. Just visiting a link isn't going to trigger a download event.
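To illustrate a download triggered by a user action, here is a sketch that puts a link on a blank page and clicks it inside `waitForDownload`, the same click-inside-`waitForDownload` pattern Playwright's downloads documentation uses. `ClickDownload` is a hypothetical name, and setting `href` through `evalOnSelector` is only a way to avoid escaping the URL into HTML.

```java
import com.microsoft.playwright.Browser;
import com.microsoft.playwright.BrowserContext;
import com.microsoft.playwright.Download;
import com.microsoft.playwright.Page;
import com.microsoft.playwright.Playwright;

import java.nio.file.Path;

public class ClickDownload {
    // Hypothetical helper: clicks a link to `url`, the way a user would, and saves the download.
    static Path download(String url, Path target) {
        try (Playwright playwright = Playwright.create()) {
            Browser browser = playwright.chromium().launch();
            BrowserContext context = browser.newContext(
                    new Browser.NewContextOptions().setAcceptDownloads(true));
            Page page = context.newPage();
            // Throwaway page with a single link; href is set from JS to avoid HTML escaping
            page.setContent("<a id=\"file\">download</a>");
            page.evalOnSelector("#file", "(a, u) => { a.href = u; }", url);
            // The click is the user action that triggers the download
            Download download = page.waitForDownload(() -> page.click("#file"));
            download.saveAs(target);
            browser.close();
            return target;
        }
    }
}
```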

Max Schmitt