
First of all, please understand that my grammar may not be correct because I am using a translator.

I'm using cheerio to do web scraping in a React environment.

Part of the site (for example):

<ul>
  <li>
    <div class="name">burger</div>
    <div class="price">5,500</div>
  </li>
  <li>
    <div class="name">sandwich</div>
    <div class="price">3,500</div>
  </li>
  <li>
    <div class="name">ramyeon</div>
    <div class="price">1,500</div>
  </li>
</ul>

My code (FYI, this code works well when scraping other sites):

const cheerio = require("cheerio");
let prodData = [];

useEffect(() => {
    scraping();
}, []);

const scraping = () => {
    axios.get("/product/thisIsExample")
        .then(res => {
            if (res.status === 200) {
                const html = res.data;
                const $ = cheerio.load(html);
                const children = [...$("ul").children("li")];
                children.forEach(v => {
                    prodData.push({
                        prodName: $(v).find("div.name").text(),
                        prodPrice: $(v).find("div.price").text()
                    });
                });

                if (prodData.length !== 0) {
                    console.log(prodData);
                }
            }
        }, (err) => console.log("error", err));
};

The problem is that the part I'm trying to scrape (the <li> elements) is dynamic: it is generated only after the page's own data call has finished.

In other words, I'm trying to scrape the <li> elements from the <ul> into an array, but at the moment I scrape, there are no <li> elements inside the <ul>.

What should I do to scrape the <li> elements?

sloth
  • It's possible that the page fills these items in dynamically by script. I suggest you scrape the page when loading is finished: $(window).on('load', function() { // define const scraping = () => {...} here }); If that fails, you could add a crude timer to start scraping after a while. – Hacky Jan 04 '21 at 13:23
  • Maybe cheerio can't handle this case. You could try making this function async, awaiting some time and making the request again, or use a Selenium WebDriver-based scraper. – Daniel Farina Jan 04 '21 at 13:33
  • You want to use puppeteer or jsdom and wait for the xhrs to finish. – pguardiario Jan 05 '21 at 03:13
  • I'm sorry for the late comment. It's a little slow, but I solved it by using puppeteer. Thank you! – sloth Aug 11 '22 at 05:08

1 Answer


Dynamic pages create their visible content well after loading. Reading the HTML from the server gets you some basic structure and script tags, but not the content that is generated by the page's own scripts and ultimately visible to the user.

To scrape dynamic web pages you will need something like Selenium or Puppeteer to automate a real browser.
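As a minimal sketch of the browser-automation approach with Puppeteer (which the asker later confirmed worked for them): it loads the page in a headless browser, waits until the script-generated <li> elements actually exist, and only then extracts them. The URL and the .name/.price markup are assumed from the question; parsePrice is a hypothetical helper, not part of any library.

```javascript
// Sketch: scraping the dynamically generated list with Puppeteer.
// The URL and markup below are assumptions taken from the question.

// Hypothetical helper: turn a price string like "5,500" into a number.
const parsePrice = (text) => Number(text.replace(/,/g, ""));

const scrapeProducts = async (url) => {
    // Required lazily so the pure helper above works without Puppeteer installed.
    const puppeteer = require("puppeteer");
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        // Wait until network activity settles, so the page's own data call can run.
        await page.goto(url, { waitUntil: "networkidle0" });
        // Wait until the script-generated <li> content actually exists in the DOM.
        await page.waitForSelector("ul li div.name");
        // Extract the data from the real, rendered DOM.
        return await page.$$eval("ul li", (items) =>
            items.map((li) => ({
                prodName: li.querySelector("div.name").textContent,
                prodPrice: li.querySelector("div.price").textContent,
            }))
        );
    } finally {
        await browser.close();
    }
};
```

Usage would be something like `scrapeProducts("https://example.com/product/thisIsExample").then(console.log)`. Note this runs in Node, not in the browser, so it belongs in a small backend script or API route rather than directly inside a React component.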

Konstantin Pribluda