I want to understand how async/await works with cheerio (in my example).
- As far as I can guess, I have to wait for the request to finish. So should I use await Promise.all() in front of the request?
- Do I have to wait for this function: cheerio.load(html)?
- What about $('article').each?
- How does everything then come together in the main class, so that the processMailer method is called only after all urls have been processed?
It would be super nice if you could explain the why a little bit! :)
In my project
The goal:
I want to scrape deals from different urls. In the end, everything should be wrapped up and sent via mail.
The problem:
The mail is sent before the scraper has scraped all urls.
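To show what I mean, here is a minimal sketch of the symptom as I understand it. It is not my real code: fakeScrape is a hypothetical stand-in for the scraper that calls back after a short timeout, and the strings in events only record the order in which things happen.

```typescript
// Hypothetical stand-in for scrapeDealsFromUrl: it calls back later,
// the way a network request would.
function fakeScrape(url: string, callback: (deals: string[]) => void): void {
  setTimeout(() => callback([`deal from ${url}`]), 10);
}

function runNaive(urls: string[]): string[] {
  const events: string[] = [];

  urls.forEach((url) => {
    fakeScrape(url, (deals) => {
      // Runs only after the timer fires, long after forEach has returned.
      events.push(`scraped ${deals.length} deal(s) from ${url}`);
    });
  });

  // forEach does not wait for the callbacks, so this line runs first.
  events.push("mail sent");
  return events;
}

const events = runNaive(["a", "b"]);
console.log(events);
```

At the moment runNaive returns, the list contains only "mail sent"; the "scraped ..." entries are appended later, once the timers fire. That seems to match what I see with the real mailer.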
The Code:
class Main {
  const startScraper = (user: User) => {
    user.dealGroups.forEach((dealGroup, index) => {
      const url = baseUrl + dealGroup;
      // Get deals from url
      scrapeDealsFromUrl(url, (scrapedDeals) => {
        if (scrapedDeals.length == 0) {
          console.log(`No deals scraped - ${url}`);
          return;
        }
        await deleteFilesOnWeekday(); // Reset files on sunday
        const oldDeals = readFromFile(dealGroup);
        const newDeals = checkForDuplicates(scrapedDeals, oldDeals);
        saveToFile([...newDeals, ...oldDeals], dealGroup);
      });
    });
  }
  processMailer(newDeals, dealGroup);
}
class Scraper {
  type DealCallback = (err: Deal[], deals?: Deal[]) => void;
  export function scrapeDealsFromUrl(url: string, callback: DealCallback) {
    // get data from mydealz and get the articles
    request(url, (error, response, html) => {
      if (!error && response.statusCode == 200) {
        const $ = cheerio.load(html);
        const deals: Deal[] = [];
        $('article').each(function (i, element) {
          deals[i] = new DealBuilder()
            .withTitle($(this).find('.thread-link').text())
            .withPrice(extractNumberFromString($(this).find('.thread-price').text()))
            .withPriceWas(extractNumberFromString($(this).find('span').find('.mute--text, .text--lineThrough').text()))
            .withImageUrl($(this).find('img').attr('src'))
            .withLink($(this).find('.boxAlign-jc--all-c, .btn--mode-primary').attr('href'))
            .withScrapedDate(new Date())
            .build();
        });
        return callback(deals)
      }
      return callback([]);
    });
  }
}
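For reference, this is roughly how I imagine the Promise.all idea from my first question would look, assuming the callback-based scraper gets wrapped in a Promise. It is only a sketch and I'm not sure it is the right approach: fakeScrapeWithCallback, scrapeAsync and scrapeAll are hypothetical names, and SimpleDeal is a simplified stand-in for my real Deal type.

```typescript
// Simplified stand-in for the real Deal type (assumption for this sketch).
type SimpleDeal = { title: string };

// Hypothetical callback-style scraper standing in for scrapeDealsFromUrl.
function fakeScrapeWithCallback(
  url: string,
  callback: (deals: SimpleDeal[]) => void
): void {
  setTimeout(() => callback([{ title: `deal from ${url}` }]), 10);
}

// Wrap the callback API in a Promise so that it can be awaited.
function scrapeAsync(url: string): Promise<SimpleDeal[]> {
  return new Promise<SimpleDeal[]>((resolve) => fakeScrapeWithCallback(url, resolve));
}

async function scrapeAll(urls: string[]): Promise<SimpleDeal[]> {
  // map (instead of forEach) collects one promise per url;
  // Promise.all resolves once every one of them has resolved.
  const perUrl = await Promise.all(urls.map((url) => scrapeAsync(url)));
  // Merge the per-url arrays into one list, keeping the url order.
  return ([] as SimpleDeal[]).concat(...perUrl);
}

scrapeAll(["a", "b"]).then((deals) => {
  // Only at this point have all urls been processed, so this is
  // where I would expect the processMailer call to go.
  console.log(`would mail ${deals.length} deals`);
});
```

If this is the right direction, I still don't see whether cheerio.load or $('article').each would need an await of their own inside the wrapped function.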
I already looked here, but I do not understand the answer.
Thanks for any hints or help!