How to download PDF files with Playwright? (Python)

Question

I'm trying to automate the download of a PDF file using Playwright. I have the code working with Selenium, but some features in Playwright caught my attention. The real problem is that the documentation isn't helpful. When I click on download I get this:

I can't change the download directory, and the "file" is deleted when the browser/context is closed. Can I achieve reliable download automation with Playwright?

Code:

from playwright.sync_api import sync_playwright


def run(playwright):
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context(accept_downloads=True)

    # Open new page
    page = context.new_page()

    # Go to http://xcal1.vodafone.co.uk/
    page.goto("http://xcal1.vodafone.co.uk/")

    # Click text=Extra Small File 5 MB A high quality 5 minute MP3 music file 30secs @ 2 Mbps 10s >> img
    with page.expect_download() as download_info:
        page.click("text=Extra Small File 5 MB A high quality 5 minute MP3 music file 30secs @ 2 Mbps 10s >> img")
    download = download_info.value
    path = download.path()
    download.save_as(path)
    print(path)

    # ---------------------
    context.close()
    browser.close()

with sync_playwright() as playwright:
    run(playwright)

score 9 · Accepted Answer · answered Aug 04 '21 at 20:22

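The path returned by download.path() in Playwright points to a temporary file named with a random GUID (globally unique identifier), and that file is deleted when the browser context closes. It's there to confirm that the download worked, not to keep the file.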

Playwright is a testing tool. Imagine running tests across every major browser on every code change: downloads would quickly take up a lot of space, and it would hack people off to have to clear them out manually.

The good news is that you are very close. If you want to keep the file, you just need to give it a name in save_as.

instead of this:

   download.save_as(path)

use this:

   download.save_as(download.suggested_filename)

That saves the file in the current working directory, which is the script's directory if you run the script from there.
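A minimal sketch of the whole flow with this fix applied, using the synchronous API. The helper name `click_and_save`, its parameters, and the decision to resolve the target directory to an absolute path are assumptions made for illustration; they are not part of Playwright.

```python
from pathlib import Path


def click_and_save(page, selector, target_dir="."):
    """Click `selector`, wait for the download it triggers, and keep the file.

    Hypothetical helper: `page` is expected to be a Playwright sync-API Page.
    """
    # Start listening before the click so the download event is not missed.
    with page.expect_download() as download_info:
        page.click(selector)
    download = download_info.value

    # Resolve the directory so the result does not depend on later changes
    # to the working directory, and create it if it does not exist yet.
    directory = Path(target_dir).resolve()
    directory.mkdir(parents=True, exist_ok=True)

    # Keep only the file-name part of the name suggested by the server.
    destination = directory / Path(download.suggested_filename).name

    # save_as copies the temporary file to a path that outlives context.close().
    download.save_as(destination)
    return destination
```

In the question's `run` function this would be called before `context.close()`, for example as `path = click_and_save(page, selector, "downloads")`, where `selector` is the question's text selector and "downloads" is an arbitrary directory name.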

RichEdwards

I disagree with the notion that "playwright is a testing tool". It is a browser automation tool ("playwright") as well as a testing tool ("@playwright/test" ). Thanks for the answer! – nicojs Jan 24 '22 at 12:28
Could you please elaborate on what is the correct syntax to save the file in other directory? Would it work if instead of `suggested_filename` a path for download to be saved was indicated? – punkuotukas Apr 20 '22 at 17:51

score 2 · Answer 2 · answered Jul 31 '22 at 11:13

You can save to any location with download.save_as(path).

This worked for me.

from pathlib import Path

...
download.save_as(Path.home().joinpath('Downloads', download.suggested_filename))

Rahul

score 2 · Answer 3 · answered Oct 17 '22 at 11:39

This works for me:

# page_request is an APIRequestContext (for example page.request); run this inside an async function
url = config.url  # your file URL
response = await page_request.get(url, params={'id': file_id})  # your request
file = await response.body()  # file contents as bytes, not yet saved
file_name = 'filename.pdf'  # name to save the file under
with open(file_name, 'wb') as f:
    f.write(file)
print(f'File {file_name} is saved')
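For comparison, a hedged sketch of the same idea with the synchronous API, with two sanity checks added before the file is written. `fetch_pdf` is a hypothetical helper, and `request_context` is assumed to be a Playwright `APIRequestContext` such as `page.request` or `context.request`, which shares cookies with the browser context.

```python
from pathlib import Path


def fetch_pdf(request_context, url, destination, params=None):
    """Fetch `url` through an APIRequestContext-like object and write it to disk."""
    response = request_context.get(url, params=params)
    if not response.ok:
        raise RuntimeError(f"Request for {url} failed with HTTP status {response.status}")

    data = response.body()  # the whole file as bytes, held in memory

    # A well-formed PDF starts with the marker "%PDF-"; anything else is
    # probably an HTML error or login page rather than the document.
    if not data.startswith(b"%PDF-"):
        raise ValueError(f"Response from {url} does not look like a PDF")

    destination = Path(destination)
    destination.write_bytes(data)
    return destination
```

This approach skips the browser's download handling entirely, so it only fits cases where the file URL is known and can be requested directly.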

Alex-Ko

score 0 · Answer 4 · answered Jan 22 '23 at 15:02

When I tried similar code, I got this error:

playwright._impl._api_types.Error: net::ERR_ABORTED at https://www.africau.edu/images/default/sample.pdf
=========================== logs ===========================
navigating to "https://www.africau.edu/images/default/sample.pdf", waiting until "load"
============================================================

In retrospect, this is likely because I launched the browser with playwright.chromium.launch_persistent_context(user_dir) and had set "always_open_pdf_externally:true", as in this example: https://github.com/microsoft/playwright/issues/3509. With that setting, navigating to a PDF presumably starts a download, so page.goto() is aborted. Instead, what I needed was a try/except like this:

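    # Requires `import os`; downloads_path and file_name are defined elsewhere.
    async with page.expect_download() as download_info:
        try:
            await page.goto("https://www.africau.edu/images/default/sample.pdf", timeout=5000)
        except Exception:
            # goto() fails with net::ERR_ABORTED because the navigation becomes a download
            print("Saving file to", downloads_path, file_name)
            download = await download_info.value
            print(await download.path())  # temporary path, removed when the context closes
            await download.save_as(os.path.join(downloads_path, file_name))
        await page.wait_for_timeout(200)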

Maybe this helps someone. It seems there isn't a clean method for this, yet: https://github.com/microsoft/playwright/issues/7822
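The same workaround could be wrapped up for the synchronous API roughly as follows. This is only a sketch: `download_by_navigation` and its parameters are hypothetical names, and the check for "ERR_ABORTED" assumes the Chromium error text shown in the log above, so other browsers may report the aborted navigation with a different message.

```python
from pathlib import Path


def download_by_navigation(page, url, destination, timeout_ms=30_000):
    """Save a file whose URL starts a download instead of rendering a page."""
    with page.expect_download(timeout=timeout_ms) as download_info:
        try:
            page.goto(url)
        except Exception as error:
            # When the navigation turns into a download, goto() is reported
            # as aborted; any other failure is re-raised unchanged.
            if "ERR_ABORTED" not in str(error):
                raise
    download = download_info.value

    destination = Path(destination)
    download.save_as(destination)
    return destination
```

Compared with a bare `except:`, filtering on the error message keeps genuine navigation failures, such as an unreachable host, visible to the caller.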
