0

Using Cheerio, how can I grab two separate pieces of text that each follow an HTML element and are not wrapped in an element of their own? This is the markup I want to extract them from:

<div>
   <time>
   <svg>...<svg/>
   "first string I want to grab"
    <svg>...<svg/>
   "second string I want to grab"
   </time>
</div>
 $(item).find('div').find('time').find('svg:nth-of-type(2)').text();
   const result = [...$(item).find('header').find('div').find('span:nth-of-type(1)').find('time').childNodes]
                    .filter(e =>
                        e.nodeType === Node.TEXT_NODE && e.textContent.trim()
                    )
                    .map(e => e.textContent.trim());
learncode
  • `$(item).find('div').find('time').find('svg:nth-of-type(2)')` can just be `$(item).find('div time svg:nth-of-type(2)')`. There's no need to repeat the `.find()` calls when CSS already has descendant capabilities. As for your normal question, there are many dupes but [this](https://stackoverflow.com/a/73692854/6243352) should suffice. Let me know if it doesn't work for you (and [edit] the post to provide further details, preferably the actual website/html). Thanks. – ggorlen Aug 12 '23 at 09:51
  • Thank you. How can I do it with cheerio? I edited my question – learncode Aug 12 '23 at 10:26
  • The bottom Cheerio code you shared, taken from the other post, works fine for me if I copy-paste your HTML in, fix the bad end tags (`</svg>` not `<svg/>`) and use the selector `result = [...$("div time").contents()]` (plus the rest of the code). You're showing a `.find('span:nth-of-type(1)')` but that doesn't exist in your example, so it's not a [mcve]. – ggorlen Aug 12 '23 at 15:04

2 Answers

1

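CSS selectors don't match text nodes, so you have to reach them through node properties such as nextSibling on the parsed tree (Cheerio parses with parse5 by default):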

$('svg').get().map(svg => $(svg.nextSibling).text())
pguardiario
0

Your example isn't reproducible, but if you fix your selectors and/or use correct closing tags, </svg> rather than <svg/>, this answer should work out of the box:

const cheerio = require("cheerio"); // 1.0.0-rc.12

const html = `<div>
  <time>
    <svg>...</svg>
    "first string I want to grab"
    <svg>...</svg>
    "second string I want to grab"
  </time>
</div>`;

const $ = cheerio.load(html);
const result = [...$("div time").contents()]
  .filter(e => e.type === "text" && $(e).text().trim())
  .map(e => $(e).text().trim());
console.log(result);

Output:

[ '"first string I want to grab"', '"second string I want to grab"' ]

As mentioned in the comments, CSS already handles descendants, so you can use

.find("header div span:nth-of-type(1) time")

rather than

.find('header').find('div').find('span:nth-of-type(1)').find('time')

If this doesn't work, please share the actual site or full HTML structure you're working with. In addition to the <svg/> typo (it should be </svg>), there is no <span> in your snippet.

It's surprising there are no class names here. Usually, classes, attributes and ids are more reliable than nth tag selectors. Instead of retyping an incorrect excerpt, it's better to provide the actual HTML, copy-pasted to preserve syntax and attributes.

Note that Cheerio only works on static HTML. If the site uses JavaScript to create these elements, that might explain why you can't find them if you're pulling down the page with fetch or axios. Ensure the elements are visible in the view-source: version of the site; the dev tools element inspector can be misleading because it shows the DOM after scripts have run. If they're not in the static HTML, consider using Playwright rather than fetch/cheerio to scrape them.

ggorlen