
I'm trying to download PDF files that are rendered in the browser (not shown as a popup or downloaded) using Playwright (Python). No URL is exposed, so you can't simply scrape a link and download the file with requests.get("file_url").

I've tried:

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        # the Python API uses snake_case: new_page / accept_downloads
        page = await browser.new_page(accept_downloads=True)

        await page.goto("https://www.some_landing_page.com")

        async with page.expect_download() as download_info:
            await page.click("a")     # selector to a pdf file

        # in the async API, .value and .path() must be awaited
        download = await download_info.value
        path = await download.path()

asyncio.run(main())

I've also tried page.expect_popup(), with no luck. My understanding is that this can't be done using pyppeteer, but I would welcome a solution that way as well, if possible.

FarNorth
  • Do you have an example of a page which renders a pdf in the browser that you would like to retrieve? – buddemat Dec 21 '20 at 23:24
  • Wow, this site doesn't want to be scraped. They included a `debugger` statement which immediately pauses execution of all JS upon opening the dev tools – Mattwmaster58 Jan 13 '21 at 18:18
  • Can you also add a `download.save_as(path="file.pdf")` to the below and see if file is downloaded? I did a similar thing and works on me. If it doesn't work, maybe give the landing page URL we can help more – Cagatay Barin Oct 01 '21 at 22:36
  • @ÇağatayBarın I tried that as well. Spent some more time and it appears there is a bug in Playwright. It works fine using firefox or webkit (in headless and headful modes), but with chromium it throws the error "subtree intercepts pointer events". – FarNorth Oct 04 '21 at 18:54

1 Answer

For anyone with a similar problem: try using firefox or webkit instead of chromium for the browser. That provided a workaround for me.
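A minimal sketch of the workaround, switching the launch call to firefox (webkit works the same way). The URL, selector, and output filename here are placeholders, not from the original question:

```python
import asyncio

async def download_pdf(url: str, selector: str, out_path: str) -> None:
    # deferred import so the sketch can be read/inspected without the package installed
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        # firefox (or p.webkit) instead of p.chromium sidesteps the
        # "subtree intercepts pointer events" error seen with chromium
        browser = await p.firefox.launch(headless=True)
        page = await browser.new_page(accept_downloads=True)
        await page.goto(url)

        async with page.expect_download() as download_info:
            await page.click(selector)

        download = await download_info.value
        # save_as copies the download out of Playwright's temp directory
        await download.save_as(out_path)
        await browser.close()

# usage (hypothetical URL and selector):
# asyncio.run(download_pdf("https://example.com/reports", "a.pdf-link", "report.pdf"))
```

save_as is worth using here rather than path(), since the temp file that path() points at is deleted when the browser context closes.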

FarNorth