I am trying to scrape a web page and would like to get all the child elements of a node in an HTML tree similar to this:

<div class="main">
    <p>Some p</p>
    <a>Some a</a>
    <br>
    <br>
    <em>
    <p>Another p</p>
    <a>Another a</a>
    <br>
    <br>
    <em>
    //...
</div>

I loaded the HTML with Puppeteer and managed to get the children, but only as strings. Here are my attempts:

const parent = await page.$('div.main');
const children = await page.evaluate(el => el.children, parent);
console.log(children);
// prints {"1": {}, "2": {}, "3": {} ...}

I then referred to this post and this post, and tried this:

const children = await page.evaluate(() => {
    const children = [...document.querySelector('div.main').children];
    return children.map((e) => e.outerHTML);
});
console.log(children);
// prints all children correctly, but all as strings

Is there a way to get all the child elements under a tag with their DOM attributes retained, so that I can loop over each element, perform some algorithmic operation on it, and extract some attributes?
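For reference, here is a minimal sketch of one direction that might fit. It assumes that live DOM nodes cannot be returned from the page to Node.js (values returned from page.evaluate are serialized, which would explain the empty objects above), so the looping and attribute extraction happen inside the page and only plain data comes back. The names summarizeChildren and getChildren, and the shape of the returned objects, are hypothetical and chosen only for illustration; page.$$eval is the Puppeteer call being relied on.

```javascript
// Hypothetical helper: turns each element into a plain, serializable object.
// It is passed to page.$$eval and runs inside the browser, so it must not
// reference any variable from the surrounding Node.js scope.
const summarizeChildren = (elements) =>
  elements.map((el) => ({
    tag: el.tagName.toLowerCase(),
    text: (el.textContent || '').trim(),
    // Copy every attribute into a name -> value object.
    attributes: Object.fromEntries(
      Array.from(el.attributes, (attr) => [attr.name, attr.value])
    ),
  }));

// Hypothetical wrapper: `page` is assumed to be a Puppeteer Page that has
// already navigated to the target document.
async function getChildren(page, parentSelector = 'div.main') {
  // "parent > *" matches only the direct children of the parent element.
  return page.$$eval(`${parentSelector} > *`, summarizeChildren);
}

// Possible usage, inside an async function with an open page:
//   const children = await getChildren(page);
//   for (const child of children) {
//     console.log(child.tag, child.text, child.attributes);
//   }
```

If the elements themselves are needed later (for example to click them), page.$$('div.main > *') returns an array of ElementHandle objects instead, and each handle can be queried with handle.evaluate(el => el.getAttribute('href')). That variant costs one round trip to the browser per element, which is why the sketch above gathers everything in a single call.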

Koh