
I have been trying to scrape some data from Wikipedia using Node.js, with request-promise and cheerio. The first then block in the requestPromise chain works as expected and returns only once all the logic above it has completed. For the second then block I tried the approaches marked M-1, M-2 and M-3 below (one at a time). I expected each of them to block until the data was ready, but the block returns early, so the third then block executes and logs undefined. I don't understand why it returns before the promises have completed.

let cheerio        = require('cheerio')
let requestPromise = require('request-promise')

//Website to be scraped
const url = "https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States"

requestPromise(url)
    .then( html  => {
        let wikiLinks = []

        let obj = cheerio('big > a', html)
        for (let key in obj){
            if(obj[key].attribs){
                wikiLinks.push(obj[key].attribs.href)
            }
        }
        return wikiLinks
    })
    .then(  links => {
        //M-1
        let data = []
        let info
        links.forEach(async link => {
            info = await getAllBirthdayData(link)
            data.push(info)
        })
        return data // ==> returns []


        //M-2
        return Promise.all([
            links.forEach(link => {
                return getAllBirthdayData(link)
            })
        ])

       //M-3
       return await Promise.all([
        links.map( async link => {
            return await getAllBirthdayData(link) // ==> returns pending promises only
        })
       ])
    })
    .then(finalData => {
        console.log(finalData)
    })
    .catch(err => {
        console.log("error 1")
    })


let getAllBirthdayData = (url) => {
     return requestPromise("https://en.wikipedia.org/" + url)
        .then( html => {
            return {
                name : cheerio('.firstHeading', html).text(),
                birthday : cheerio('.bday', html).text()
            }
        })
        .catch( err => {
            console.log("error 2")
        })
}

I expect the output to be an array of objects with key-value pairs as such

[{name : something, birthday : 2018-01-01},
{name : something2, birthday : 2018-01-02}]
Aryan Arora
  • @Quentin regarding the duplicate you mentioned, I already tried the map method as well as promise.all and they didn't resolve the issue. – Aryan Arora Jul 04 '19 at 21:18
  • Promise.all accepts an array of promises as its argument. You are passing an array containing the return value from forEach … which isn't a promise. – Quentin Jul 04 '19 at 21:25
  • @Quentin function getAllBirthdayData(URL) does return a promise, correct me if I am wrong!! – Aryan Arora Jul 04 '19 at 21:27
    Use `for (link of links) {` instead of `links.forEach(async link => {`, then it will work, you don't need M2! @Quentin forEach does NOT have a return value =)...A return statement inside a `.forEach()` callback behaves like a continue but does not return anything outside the loop, that's where `.map()` would be suitable – exside Jul 04 '19 at 21:31
  • @exside It worked!!, but I tried using links.map and it didn't work. Can you please elaborate why both forEach and map did not work. I thought it was good practice to use both of them instead of for(link of links) or for(let i = 0; i < links.length; i++) – Aryan Arora Jul 04 '19 at 21:39
    Because both map and forEach call the provided callback but do not wait for it to be executed! And no, IMHO using `.forEach()` or `.map()` are more for convenience and shorter (and likely more readable) code, not necessarily a best practice, they are significantly slower than a for loop! – exside Jul 04 '19 at 21:51
  • @exside Thanks a lot for the help – Aryan Arora Jul 04 '19 at 22:07
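
The points made in the comments can be sketched as follows. This is only an illustration, not the asker's final code: `fakeGetBirthdayData` is a hypothetical stand-in for `getAllBirthdayData` that resolves after a short delay instead of calling Wikipedia, and the `collect...` function names are made up for this example. The first two functions reproduce why M-1 and M-2 return too early; the last two show the `Promise.all` + `map` and `for...of` variants that wait for every request.

```javascript
// Hypothetical stand-in for getAllBirthdayData: no network access,
// it just resolves with a fixed object after a short delay.
const fakeGetBirthdayData = (link) =>
  new Promise((resolve) =>
    setTimeout(() => resolve({ name: link, birthday: '2018-01-01' }), 10)
  );

// M-1 style: forEach ignores the promises returned by the async callbacks,
// so `data` is returned while it is still empty.
function collectWithForEach(links, getData) {
  const data = [];
  links.forEach(async (link) => {
    data.push(await getData(link));
  });
  return data;
}

// M-2 style: forEach always returns undefined, so Promise.all receives
// [undefined] and resolves immediately with [undefined].
function collectWithForEachInAll(links, getData) {
  return Promise.all([links.forEach((link) => getData(link))]);
}

// Fix A: map returns an array of promises; pass that array directly to
// Promise.all. Wrapping it in another array (as in M-3) would make
// Promise.all treat the inner array as a single plain value.
function collectWithMap(links, getData) {
  return Promise.all(links.map((link) => getData(link)));
}

// Fix B: for...of inside an async function awaits each request in turn.
async function collectSequentially(links, getData) {
  const data = [];
  for (const link of links) {
    data.push(await getData(link));
  }
  return data;
}

// Example usage with placeholder links.
collectWithMap(['/wiki/A', '/wiki/B'], fakeGetBirthdayData)
  .then((finalData) => console.log(finalData));
```

In the original chain, returning `Promise.all(links.map(link => getAllBirthdayData(link)))` from the second then block should be enough, because a then callback that returns a promise makes the next then wait for it. The `map` variant starts all requests at once, whereas the `for...of` variant runs them one after another, which is slower but gentler on the server.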
