When debugging the serialized code that Puppeteer injects into the browser it drives, keep in mind that:
- the code runs in the browser context, not in Node
- state isn't shared with the Node environment except through serialized parameters and return values (you can't access Node functions or ElementHandles in the browser)
- the page's console.log calls aren't visible in the Node environment by default
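To make the serialization point concrete, here is a Node-only sketch that roughly imitates the boundary. It is not Puppeteer's implementation: `simulateEvaluate` and `demo` are hypothetical names, and the JSON round-trip is an assumed stand-in for Puppeteer's own serialization. What it shows is that only the function's source text and its serialized arguments cross over, so values have to be passed in explicitly rather than captured from the surrounding Node scope.

```javascript
// Hypothetical imitation of the Node -> browser boundary (not Puppeteer code).
// Only the function's source text and JSON-serialized arguments cross over,
// so anything the function closes over in Node is lost.
function simulateEvaluate(pageFunction, ...args) {
  const source = pageFunction.toString();
  const serializedArgs = JSON.parse(JSON.stringify(args));
  // Rebuild the function in the global scope, outside the caller's closure.
  const rebuilt = new Function(`return (${source});`)();
  const result = rebuilt(...serializedArgs);
  // Return values are serialized on the way back as well.
  return result === undefined ? undefined : JSON.parse(JSON.stringify(result));
}

function demo() {
  const prefix = "item-"; // exists only in Node

  // Passing the value as a serialized argument works.
  const viaArg = simulateEvaluate((p, n) => p + n, prefix, 1);

  // Referring to the Node variable from inside the function does not.
  let viaClosure;
  try {
    viaClosure = simulateEvaluate(n => prefix + n, 1);
  } catch (err) {
    viaClosure = err.name; // "ReferenceError": prefix is not defined
  }
  return { viaArg, viaClosure };
}

console.log(demo());
```

In real Puppeteer the equivalent fix is the same: pass Node values as extra arguments to `page.evaluate` instead of referencing them from the function body.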
With these points in mind, you can attach a handler for the page's console events so that browser-side console.log output is printed in Node, which helps you debug your browser code. In many cases you can also simply run the code by hand in the browser console on the scraped page to validate that it works, as shown below.
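A minimal sketch of such a handler, assuming Puppeteer's `page.on("console", ...)` event (whose `ConsoleMessage` has `type()` and `text()` methods) and its `page.on("pageerror", ...)` event. `forwardPageConsole` and the `EventEmitter` stand-in page are hypothetical, used only so the snippet runs without launching a browser.

```javascript
const { EventEmitter } = require("node:events");

// Forward a page's browser-side console output and uncaught errors to Node.
// `page` is assumed to be a Puppeteer Page (or anything emitting the same
// "console" / "pageerror" events); `log` defaults to console.log.
function forwardPageConsole(page, log = console.log) {
  // "console" events carry a ConsoleMessage; text() returns the logged text.
  page.on("console", msg => log(`PAGE ${msg.type()}:`, msg.text()));
  // "pageerror" events carry the error thrown by code running in the page.
  page.on("pageerror", err =>
    log("PAGE ERROR:", err instanceof Error ? err.message : String(err))
  );
  return page;
}

// Hypothetical stand-in page so the sketch runs without a browser.
const fakePage = new EventEmitter();
const lines = [];
forwardPageConsole(fakePage, (...parts) => lines.push(parts.join(" ")));
fakePage.emit("console", { type: () => "log", text: () => "hello from the page" });
fakePage.emit("pageerror", new Error("boom"));
console.log(lines);
```

With a real browser, call `forwardPageConsole(page)` right after `browser.newPage()` and before `page.goto(...)`, so messages logged while the page loads are captured too.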
The issue here is the difference between querySelector (which returns the first element matching a CSS selector) and querySelectorAll (which returns a NodeList of all elements matching the selector). .href is not a property of the NodeList object; the property access needs to be applied to each element in the NodeList using map, which is available after converting the NodeList to an array.
Once you've given yourself the ability to console.log, it's easy to debug this by printing the return value of each call to see which properties exist.
The code below illustrates the difference between these two methods and works just as well in Puppeteer as it does in a Stack Snippet:
// print first element's href
console.log(document.querySelector("table tr td a").href);
// print all elements' hrefs
console.log([...document.querySelectorAll("table tr td a")].map(e => e.href));
<table>
<tr>
<td>
<a href="foo.html">foo</a>
<a href="bar.html">bar</a>
<a href="baz.html">baz</a>
</td>
</tr>
</table>