I'm trying to get some text using Cheerio that is placed after a single <br> tag.

I've already tried the following lines:

let price = $(this).nextUntil('.col.search_price.discounted.responsive_secondrow').find('br').text().trim();
let price = $(this).nextUntil('.col.search_price.discounted.responsive_secondrow.br').text().trim();

Here is the HTML I'm trying to scrape:

<div class="col search_price_discount_combined responsive_secondrow" data-price-final="5039">
  <div class="col search_discount responsive_secondrow">
    <span>-90%</span>
  </div>
  <div class="col search_price discounted responsive_secondrow">
    <span style="color: #888888;"><strike>ARS$ 503,99</strike></span><br>ARS$ 50,39     
  </div>
</div>

I would like to get "ARS$ 50,39".

ggorlen

2 Answers

If you're comfortable assuming this text is the last child node of its parent, you can use .contents().last():

const cheerio = require("cheerio"); // 1.0.0-rc.12

const html = `
<div class="col search_price_discount_combined responsive_secondrow" data-price-final="5039">
  <div class="col search_discount responsive_secondrow">
    <span>-90%</span>
  </div>
  <div class="col search_price discounted responsive_secondrow">
    <span style="color: #888888;"><strike>ARS$ 503,99</strike></span><br>ARS$ 50,39     
  </div>
</div>
`;
const $ = cheerio.load(html);
const sel = ".col.search_price.discounted.responsive_secondrow";
const text = $(sel).contents().last().text().trim();
console.log(text); // => ARS$ 50,39

If you aren't comfortable with that assumption, you can search through the children to find the first non-empty text node:

// ...
const text = $([...$(sel).contents()]
  .find(e => e.type === "text" && $(e).text().trim()))
  .text()
  .trim();
console.log(text); // => ARS$ 50,39

If it's critical that the text node immediately follows a <br> tag specifically, you can try:

// ...
const contents = [...$(sel).contents()];
const text = $(contents.find((e, i) =>
    e.type === "text" && contents[i-1]?.tagName === "br"
  ))
  .text()
  .trim();
console.log(text); // => ARS$ 50,39

If you want all of the immediate text children, filter .contents() for text nodes instead of stopping at the first match.

ggorlen

You should be able to get the price by using:

$('.col.search_price.discounted.responsive_secondrow').html().trim().split('<br>')[1]

This gets the inner HTML of the element, trims extra spaces, then splits on the <br> and takes the 2nd part.

See example at https://jsfiddle.net/b7nt0m24/3/ (note: it uses jQuery, which has a similar API to Cheerio)

userx
  • Using string manipulation defeats the purpose of an HTML parser. We should be able to use Cheerio to drill all the way down and isolate the text node. If the parent container's HTML happens to have another `<br>` added at some point, or the `<br>` turns into `<br/>`, or the `<br>` is removed completely, this breaks.
    – ggorlen Nov 13 '22 at 04:54
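
To make the concern in this comment concrete, here is a small sketch that uses plain strings in place of scraped markup. The `secondPart` helper and the three variant strings are hypothetical, and what `.html()` actually returns for a given page may be normalized by the parser, so treat this as an illustration of the string-splitting logic only.

```javascript
// Hypothetical helper mirroring the split-based approach:
// return whatever sits between the first and second "<br>".
const secondPart = markup => markup.trim().split("<br>")[1]?.trim();

const original = "<span><strike>ARS$ 503,99</strike></span><br>ARS$ 50,39     ";
const selfClosing = "<span><strike>ARS$ 503,99</strike></span><br/>ARS$ 50,39";
const extraBreak = "<br><span><strike>ARS$ 503,99</strike></span><br>ARS$ 50,39";
const noBreak = "<span><strike>ARS$ 503,99</strike></span>ARS$ 50,39";

console.log(secondPart(original));    // => ARS$ 50,39
console.log(secondPart(selfClosing)); // => undefined
console.log(secondPart(extraBreak));  // => <span><strike>ARS$ 503,99</strike></span>
console.log(secondPart(noBreak));     // => undefined
```

Only the first string yields the price. The other three either return nothing or return the wrong fragment.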