I am trying to do some web scraping and would like to get all the child elements in an HTML
tree similar to this:
<div class="main">
<p>Some p</p>
<a>Some a</a>
<br>
<br>
<em>
<p>Another p</p>
<a>Another a</a>
<br>
<br>
<em>
//...
</div>
I scraped the HTML using Puppeteer and managed to get the children, but only as strings. Here are my attempts:
const children = await page.evaluate(el => el.children, await page.$('div.main'))
console.log(children)
//prints {"1": {}, "2": {}, "3": {} ...}
I then referred to this post and this post, and tried this:
const children = await page.evaluate(() => {
var children = [...document.querySelector('div.main').children];
return children.map((e) => e.outerHTML);
})
console.log(children)
//prints all children correctly, but all as strings
Is there a way to get all child elements under a tag with their DOM attributes retained, so that I can loop over each element, run some logic on it, and extract some of its attributes?
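For context, here is a minimal sketch of one possible direction, not a confirmed solution. Values returned from the page context have to be serializable, so live DOM nodes cannot cross over to Node as-is. The sketch instead copies the fields of interest into plain objects inside the page. The helper names `serializeChildren` and `getChildren` are hypothetical, and the choice of fields (tag, text, attributes) is an assumption about what is needed.

```javascript
// Hypothetical helper: runs inside the page, so it must not rely on
// any variables from the Node side. It turns each element into a
// plain, serializable object.
function serializeChildren(elements) {
  return elements.map((el) => ({
    tag: el.tagName.toLowerCase(),
    text: (el.textContent || '').trim(),
    // Copy every attribute into a plain { name: value } object.
    attributes: Object.fromEntries(
      Array.from(el.attributes, (attr) => [attr.name, attr.value])
    ),
    html: el.outerHTML,
  }));
}

// Hypothetical wrapper: `page` is assumed to be a Puppeteer Page.
// `parentSelector > *` matches only the direct children of the parent.
async function getChildren(page, parentSelector) {
  return page.$$eval(`${parentSelector} > *`, serializeChildren);
}
```

It would be called as `const children = await getChildren(page, 'div.main')`, after which each entry can be looped over in Node like any other object. If live references are needed instead of copies, `page.$$('div.main > *')` returns an array of `ElementHandle` objects, and each one can be inspected with `handle.evaluate(el => ...)`.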