I'm using Node and Puppeteer to scrape a website.
I want to return the innerText of an element selected by class, but it contains a nested span element, so the text of both is returned.
<div class="inner_sm">
<h1 class="pdp_address ">79 Etwell Street
<span>East Victoria Park, WA 6101</span>
</h1>
<span class="pdp_price">$670,000
<span class="price_feature">Under Offer</span>
</span>
</div>
The target is .pdp_price. I want the result to be '$670,000', but I'm getting '$670,000Under Offer'.
const data = await page.evaluate(() => {
  const address = document.querySelector('#app > div > div > article > div.pdp_header > div > div > h1').innerText.replaceAll('\n',',')
  const bed = document.querySelector('.bed')?.innerText || ""
  const bath = document.querySelector('.bath')?.innerText || ""
  const car = document.querySelector('.car')?.innerText || ""
  const price = document.querySelector('.pdp_price').innerText
  return `${address}, ${price}, ${bed}, ${bath}, ${car} \n`
})
I've tried a few things, but haven't been able to make any of them work:
const price = document.querySelector('.pdp_price:not(.price_feature)').innerText
const price = document.querySelector('.pdp_price')?.innerText.replaceAll(',','*') || ""
const cleanPrice = price.remove('.price_feature').innerText
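For context on why those attempts fall short: `:not(.price_feature)` only filters the element being matched (the `.pdp_price` span itself), not its descendants, and a string has no `remove` method. One possible approach, as a rough sketch rather than a definitive fix, is to read only the text nodes that are direct children of the element. The helper name `ownText` is made up for illustration, and the sketch assumes the price text sits directly inside `.pdp_price`, as in the markup above.

```javascript
// Hypothetical helper: returns only the text that sits directly inside `el`,
// skipping the text of any nested elements such as .price_feature.
// Assumes `el` is a DOM element (or anything exposing `childNodes`).
function ownText(el) {
  if (!el) return ""; // mirrors the `?. ... || ""` fallback used for bed/bath/car
  return Array.from(el.childNodes)
    .filter((node) => node.nodeType === 3) // 3 is Node.TEXT_NODE
    .map((node) => node.textContent)
    .join("")
    .trim();
}

// Possible usage inside the page.evaluate callback. The helper would need to be
// declared inside that callback too, because the callback runs in the browser:
//   const price = ownText(document.querySelector('.pdp_price'))
```

With the markup shown above this should yield '$670,000', since 'Under Offer' belongs to the nested span rather than to a direct text node.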