
I'm trying to download PDF files that are rendered in the browser (not shown as a popup or downloaded) using Playwright (Python). No URL is exposed, so you can't simply scrape a link and download the file with requests.get("file_url").

I've tried:

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        # the Python API uses snake_case: new_page / accept_downloads
        page = await browser.new_page(accept_downloads=True)

        await page.goto("https://www.some_landing_page.com")

        async with page.expect_download() as download_info:
            await page.click("a")     # selector to a pdf file

        # in the async API, .value and .path() must be awaited
        download = await download_info.value
        path = await download.path()

asyncio.run(main())

I've also tried page.expect_popup(), with no luck. My understanding is that this can't be done using pyppeteer, but I would welcome a solution that way as well, if possible.

FarNorth
  • Do you have an example of a page which renders a pdf in the browser that you would like to retrieve? – buddemat Dec 21 '20 at 23:24
  • Wow, this site doesn't want to be scraped. They included a `debugger` statement which immediately pauses execution of all JS upon opening the dev tools – Mattwmaster58 Jan 13 '21 at 18:18
  • Can you also add a `download.save_as(path="file.pdf")` to the below and see if file is downloaded? I did a similar thing and works on me. If it doesn't work, maybe give the landing page URL we can help more – Cagatay Barin Oct 01 '21 at 22:36
  • @ÇağatayBarın I tried that as well. Spent some more time and it appears there is a bug in Playwright. It works fine using firefox or webkit (in headless and headful modes), but with chromium it throws the error "subtree intercepts pointer events". – FarNorth Oct 04 '21 at 18:54

1 Answer

For anyone with a similar problem: try using firefox or webkit instead of chromium for the browser. That provided a workaround for me.
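A minimal sketch of the workaround, switching the launch call to firefox (webkit works the same way). The URL, selector, and output filename here are placeholders, not from the original question:

```python
import asyncio

async def download_pdf(url: str, selector: str, out_path: str) -> None:
    # deferred import so the sketch can be read/inspected without the package installed
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        # firefox (or p.webkit) instead of p.chromium sidesteps the
        # "subtree intercepts pointer events" error seen with chromium
        browser = await p.firefox.launch(headless=True)
        page = await browser.new_page(accept_downloads=True)
        await page.goto(url)

        async with page.expect_download() as download_info:
            await page.click(selector)

        download = await download_info.value
        # save_as copies the download out of Playwright's temp directory
        await download.save_as(out_path)
        await browser.close()

# usage (hypothetical URL and selector):
# asyncio.run(download_pdf("https://example.com/reports", "a.pdf-link", "report.pdf"))
```

save_as is worth using here rather than path(), since the temp file that path() points at is deleted when the browser context closes.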

FarNorth