
I am currently scraping a website with Node.js and Puppeteer, and I have its innerHTML. The problem is that I need to get the second div at the same level, something like this:

<!-- Really basic version of my innerHTML -->
<div class='1'>... </div>
<div class='2'>
    <p>1</p>
</div>

I can't select by class and I can't go up to a parent element, so is there a way to get the class 2 div from this kind of innerHTML? I am trying `div:nth-child(2)` with Cheerio, but it returns the second child element of the class 1 div instead.

  • Try with selector `:nth-child(2)`. – vibhor1997a Apr 01 '18 at 07:11
  • regular expressions? – xianshenglu Apr 01 '18 at 07:34
  • Tried it, doesn't work. To get that innerHTML, I am currently doing `$('div:nth-child(2)')`, but if I try `$('div:nth-child(2) > :nth-child(2)')` it somehow returns the same value. If I try `$('div:nth-child(2) > div:nth-child(2)')`, it returns the second child of the class 1 div. – Karolis Malisauskas Apr 01 '18 at 07:34
  • Note: I use the cheerio npm package, which has a jQuery-like API. – Karolis Malisauskas Apr 01 '18 at 07:38
  • I'm unable to reproduce. `document.querySelector("div:nth-child(2)")` works fine on this HTML. Can you show a [mcve]? Also, why not just use Puppeteer's selectors if you're already using Puppeteer? It doesn't make sense to pull a static HTML parser like Cheerio in when you already have a dynamic browser that's parsed and is working with the live DOM already. It's possible that your Cheerio string pulled from Puppeteer is stale and doesn't have the correct elements loaded yet. – ggorlen Aug 04 '21 at 21:20
  • @xianshenglu [Please no regex on HTML](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/). – ggorlen Aug 04 '21 at 21:23
  • Also worth noting: when you're showing an excerpt of the HTML, the parent element wrapper may well be the best way to access these `<div>`s. But it's missing, along with most of the rest of the context for the page, so there's no good way to help here. – ggorlen Nov 26 '22 at 01:55
