How to scrape ... INSIDE another ... with puppeteer

Question

Alright, so the page I'm trying to scrape with node.js puppeteer is structured like this

    <html lang = "en">
    ....
       <html xmlns="https://www.w3.org/1999/xhtml" lang="en">
            <a href = "link I'm trying to go to">Go to link</a>
       </html>
    </html>

I tried clicking the link by selector and by XPath. Neither worked, and I triple-checked that both were right. I suspect it has something to do with the embedded html, and I don't know how to handle it. Can anyone help?

As @MarcosCasagrande was hinting, if the content you wish to scrape is inside of an iframe, you'll need to scrape the URL of the iframe, as the DOM elements of the iframe content are not accessible from the parent document. — Michael Rodriguez, Dec 08 '19 at 18:22
@MarcosCasagrande Yes it is inside an iframe. Lemme try scraping the url of the iframe — John Smith, Dec 08 '19 at 18:24
@MichaelRodriguez Yep, it's inside an iframe. I'm gonna try scraping that url — John Smith, Dec 08 '19 at 18:24
@ssBarBee const link = page.$x(xpath); link[0].click() and page.click(selector);. Other comments have pointed out that I should try scraping the url of the iframe, so I'm gonna try that right now — John Smith, Dec 08 '19 at 18:25
https://stackoverflow.com/a/56420104/1861016 Refer to this @JohnSmith might be useful. In future add the code to your question :) — ssbarbee, Dec 08 '19 at 18:29

score 0 · Answer 1 · answered Dec 08 '19 at 18:44

Other comments pointed out that content inside an iframe is not accessible from the parent document. I checked the code again, and it turns out it was actually structured like this:

    <html lang = "en">
    ....
       <iframe src = "url">
           <html xmlns="https://www.w3.org/1999/xhtml" lang="en">
               <a href = "link I'm trying to go to">Go to link</a>
           </html>
       </iframe>
    </html>

So all I had to do was page.goto(url), and then I could scrape as normal. Thanks everyone!
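
For anyone landing here later, this is roughly what that looks like. It is only a sketch under some assumptions: the helper name `scrapeLinkInsideIframe`, the placeholder URL, and the bare `iframe` and `a` selectors are made up for illustration (my real page needed more specific selectors), and it assumes the iframe has a `src` that can be loaded on its own.

```javascript
// Hypothetical helper: read the iframe's src from the parent page,
// navigate straight to that URL, then scrape the link as on any normal page.
// `page` is a Puppeteer Page; the 'iframe' and 'a' selectors are assumptions.
async function scrapeLinkInsideIframe(page, parentUrl) {
  await page.goto(parentUrl);

  // el.src is the iframe URL, already resolved to an absolute address
  const frameUrl = await page.$eval('iframe', (el) => el.src);

  // Load the iframe's document as the top-level page
  await page.goto(frameUrl);

  // Now the <a> is part of the main document, so normal selectors work
  return page.$eval('a', (el) => el.href);
}

// Example wiring; not called automatically. The URL is a placeholder.
async function main() {
  const puppeteer = require('puppeteer'); // loaded only when main() runs
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    const href = await scrapeLinkInsideIframe(page, 'https://example.com/parent');
    console.log(href);
  } finally {
    await browser.close();
  }
}
```

If you would rather stay on the parent page, Puppeteer can also hand you the frame itself: get the iframe element with `page.$('iframe')` and call `contentFrame()` on it, then run your selectors against the returned frame instead of `page`.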
