
I want to understand how async / await works with cheerio (in my example).

  • As far as I can tell, I have to wait for the request to finish. So should I use await Promise.all() in front of the request?
  • Do I have to await cheerio.load(html)?
  • What about $('article').each?
  • How does everything then come together in the main class, so that processMailer is called only after all URLs have been processed?

It would be super nice if you could explain the why a little bit! :)

In my project

The goal:

I want to scrape deals from different URLs. In the end, everything should be wrapped up and sent via mail.

The problem:

The mail is sent before the scraper has finished scraping all URLs.

The Code:

class Main {
    const startScraper = (user: User) => {
        user.dealGroups.forEach((dealGroup, index) => {
            const url = baseUrl + dealGroup;
        
            // Get deals from url
            scrapeDealsFromUrl(url, (scrapedDeals) => {
      
                if (scrapedDeals.length == 0) {
                    console.log(`No deals scraped - ${url}`);
                    return;
                }
        
        
                await deleteFilesOnWeekday(); // Reset files on sunday
        
                const oldDeals = readFromFile(dealGroup);
                const newDeals = checkForDuplicates(scrapedDeals, oldDeals);
                saveToFile([...newDeals, ...oldDeals], dealGroup);
      
          });
        });
    }
        
    processMailer(newDeals, dealGroup);
}
class Scraper {
    type DealCallback = (err: Deal[], deals?: Deal[]) => void;

    export function scrapeDealsFromUrl(url: string, callback: DealCallback) {
        // get data from mydealz and get the articles
        request(url, (error, response, html) => {
            if (!error && response.statusCode == 200) {
                const $ = cheerio.load(html);
                const deals: Deal[] = [];

                $('article').each(function (i, element) {

                    deals[i] = new DealBuilder()
                        .withTitle($(this).find('.thread-link').text())
                        .withPrice(extractNumberFromString($(this).find('.thread-price').text()))
                        .withPriceWas(extractNumberFromString($(this).find('span').find('.mute--text, .text--lineThrough').text()))
                        .withImageUrl($(this).find('img').attr('src'))
                        .withLink($(this).find('.boxAlign-jc--all-c, .btn--mode-primary').attr('href'))
                        .withScrapedDate(new Date())
                        .build();
                });
                return callback(deals)
            }
            return callback([]);
        });
    }
}

I already looked here, but I do not understand the answer.

Thanks for any hints or help!

  • You're using the [outdated](https://www.npmjs.com/package/request) callback-based `request` library. Instead, use a modern promise-based library like axios or `fetch`. Once you do, there are hundreds of examples of how to use these alongside Cheerio, like [this](https://stackoverflow.com/a/75400938/6243352). Cheerio is fully synchronous--it's just the request that's async. – ggorlen Feb 13 '23 at 19:18
  • Thanks for the advice. Now it works perfectly! – Mavelouscello Feb 18 '23 at 09:41

1 Answer


You need a promise that fetches the request data, parses it, and passes the result to the callback. It might look like this:

export function scrapeDealsFromUrl(url: string, callback: DealCallback) {
  // Wrap the callback-based request in a promise that resolves with the deals
  new Promise<Deal[]>(resolve => {
    request(url, (error, response, html) => {
      // deals code here
      const deals: Deal[] = [];
      resolve(deals);
    });
  }).then(callback);
}
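
To connect this with the question about calling processMailer only after every URL is done, here is a minimal sketch of one way to do it. It assumes the scraper returns the promise instead of consuming it internally. The names `scrapeWithCallback`, `scrapeDeals`, and `run`, the simplified `Deal` shape, and the `events` log are hypothetical stand-ins, and the stub makes no real HTTP request and does not use cheerio. The point is only the ordering: parsing is synchronous, so the request is the only thing that has to be awaited.

```typescript
// Simplified stand-in for the question's Deal type (an assumption).
interface Deal { title: string; }

// Hypothetical callback-style scraper standing in for the question's version.
// A real implementation would call request() and cheerio.load() here.
function scrapeWithCallback(url: string, callback: (deals: Deal[]) => void): void {
  // Invoke the callback asynchronously, as a real HTTP request would.
  Promise.resolve().then(() => callback([{ title: `deal from ${url}` }]));
}

// Wrap the callback API and return the promise so callers can await it.
function scrapeDeals(url: string): Promise<Deal[]> {
  return new Promise((resolve) => scrapeWithCallback(url, resolve));
}

async function run(urls: string[]): Promise<string[]> {
  // Records the order of events so the sequencing is visible.
  const events: string[] = [];

  // Start all scrapes at once; Promise.all resolves once every one has finished.
  const results = await Promise.all(urls.map((url) => scrapeDeals(url)));
  for (const deals of results) {
    events.push(`scraped ${deals.length} deal(s)`);
  }

  // Only reached after every URL has been processed.
  events.push("mail sent");
  return events;
}
```

In the question's code, forEach starts each scrape and returns immediately, so anything placed after it runs before the callbacks fire. Mapping the URLs to promises and awaiting Promise.all gives a single point after which a call like processMailer can safely run.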
pguardiario