
I am trying to scrape data from a site whose markup has this structure:

<div>
   <b>insert bold here</b>
   important text
   <a href="#">link here</a>
</div>

I need to access the "important text", which Chrome DevTools shows as a #text node.

I've tried removing the <b> and <a> elements, but I always get undefined when I call the .text() method.

I've tried looping over the children, contents, etc.

  • Common problem: [1](https://stackoverflow.com/questions/20832910/get-text-in-parent-without-children-using-cheerio/74579448#74579448), [2](https://stackoverflow.com/questions/73690939/how-to-get-a-text-thats-separated-by-different-html-tags-in-cheerio/73692854#73692854), [3](https://stackoverflow.com/questions/54878673/cheerio-get-normal-text-nodes/73693773#73693773), [4](https://stackoverflow.com/questions/74418220/how-do-i-get-text-after-single-br-tag-in-cheerio/74418510#74418510). Are you sure there are no additional classes, attributes or ids you can use to select this subtree? – ggorlen Nov 27 '22 at 05:10
  • Does this answer your question? [Get text in parent without children using cheerio](https://stackoverflow.com/questions/20832910/get-text-in-parent-without-children-using-cheerio) – ggorlen Nov 27 '22 at 05:10
  • `[...$("div").contents()].find(e => e.type === "text" && e.nodeValue.trim()).nodeValue.trim()` should work but it's a pretty brittle/vague top-level selector. – ggorlen Nov 27 '22 at 05:14
  • "I've tried looping over the children, contents, etc." good idea to show that attempt, because that's the right idea. – ggorlen Nov 27 '22 at 05:25

1 Answer

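The text you want is a text node that sits directly inside the div, so it is not returned by .children(), which only yields elements. Use .contents(), which also includes text nodes, and keep the non-blank ones: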

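const cheerio = require("cheerio");

// html is assumed to hold the page markup as a string
const $ = cheerio.load(html);

// .contents() includes text nodes, unlike .children()
const result = [...$("div").contents()]
  .filter(e => e.type === "text" && $(e).text().trim()) // keep non-blank text nodes
  .map(e => $(e).text().trim());
console.log(result[0]);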

You can see it running against your test input in the cheerio sandbox: https://scrapeninja.net/cheerio-sandbox?slug=2a11aaa1eb1198fb1fbe55a51b0dd4bc67dcb3db
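
To make the filtering step easier to follow, here is a small sketch that runs without cheerio. It assumes nodes shaped like the ones cheerio hands back from .contents(), that is, objects with a `type` field and, for text nodes, a `data` field. The helper name `firstDirectText` and the hand-built `children` array are hypothetical and only stand in for the div from the question.

```javascript
// Hypothetical helper (not part of cheerio): returns the trimmed text of the
// first non-blank text node in a list of DOM-like nodes, or undefined if none.
function firstDirectText(nodes) {
  const hit = nodes.find(
    (node) => node.type === "text" && node.data.trim() !== ""
  );
  return hit ? hit.data.trim() : undefined;
}

// Hand-built stand-in for the children of the <div> in the question.
// Whitespace between tags shows up as its own text node, which is why
// blank text nodes have to be skipped.
const children = [
  { type: "text", data: "\n   " },
  { type: "tag", name: "b" },
  { type: "text", data: "\n   important text\n   " },
  { type: "tag", name: "a" },
  { type: "text", data: "\n" },
];

console.log(firstDirectText(children)); // important text
console.log(firstDirectText([{ type: "tag", name: "b" }])); // undefined
```

With cheerio loaded, something like `firstDirectText($("div").contents().toArray())` should behave the same way, though a more specific selector than "div" is advisable on a real page.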

Anthony S
  • Pretty much a direct lift without attribution of [this answer](https://stackoverflow.com/a/74579448/6243352). Usually, we close the question as a dupe if it's been asked and answered many times before, as is the case here. It's hard to maintain answers when they're copied and pasted into a bunch of different questions. – ggorlen Jan 04 '23 at 17:27