
I have a while loop that goes through a list until there is no more "Load more" button.

Inside the while loop I have a for loop that steps through the rows. At the bottom of it, I put the name and the other descriptions that I want into an object.

I want the script to skip to the next name on the list if the name has already been scraped (I don't want it to even save the name).

I can't use a continue because it throws an error saying that exerciseName has not been declared yet. I've tried putting the declaration at the top of the page, but then the variables inside it haven't been declared yet. How can I have it go through the loop and skip the rest of the process if it has already scraped that name?

My code:

        for (let i = 2; i < rowsCounts + 1; i++) {

// this is getting the exercise name
            const exerciseName = await page.$eval(
                `.ExCategory-results > .ExResult-row:nth-child(${i}) > .ExResult-cell > .ExHeading > a`,
                (el) => el.innerText
            );
   
// I've tried to do the continue here, but it throws an error because the object hasn't been declared yet

// REST OF THE FANCY CODE HERE



            let obj = {
                exercise: exerciseName,
                exerciseDescription: exerciseDescription,
                AlternativeExercise: AlternativeExercise,
            };

// I tried doing the continue here so it wouldn't push anything to the list but the problem is 
// the script is opening bunch of tabs and it's way too much traffic plus it 
// slows down things a lot. So it needs to be at the top so it can skip all those steps.

            if (exerciseName !== obj.exercise) {
                continue;
            }


            allData.push(obj);

        }

Update: this is the major part of my code:

    const LoadMoreButton =
        '#js-ex-content > #js-ex-category-body > .ExCategory-results > .ExLoadMore > .bb-flat-btn';

    var buttonExists = true;
    let allData = [];
    while (buttonExists == true) {
        const loadMore = true;
        const rowsCounts = await page.$$eval(
            '.ExCategory-results > .ExResult-row',
            (rows) => rows.length
        );
        // console.log(`row counts = ${rowsCounts}`);

        for (let i = 2; i < rowsCounts + 1; i++) {

            const exerciseName = await page.$eval(
                `.ExCategory-results > .ExResult-row:nth-child(${i}) > .ExResult-cell > .ExHeading > a`,
                (el) => el.innerText
            );
            console.log(` ${i}  = ${exerciseName}`);
   

            if (exerciseName !== obj.exercise) {
                let obj = {
                    exercise: 'remove',
                    exerciseDescription: '',
                    AlternativeExercise: '',
                };
                continue;
            }


            let ExerciseLink = await page.$eval(
                `.ExCategory-results > .ExResult-row:nth-child(${i}) > .ExResult-cell > .ExHeading > a`,
                (el) => el.getAttribute('href')
            );
            // console.log(`href = ${ExerciseLink}`);


            const pageTab = await browser.newPage();           // open new tab
            await pageTab.goto('https://www.bodybuilding.com' + ExerciseLink);
            await pageTab.waitForSelector('#js-ex-content');

            const exerciseDescription = await pageTab.$eval(
                '#js-ex-content > .ExDetail > .ExDetail-section > .flexo-container > .grid-8',
                (el) => el.innerHTML
            );
            // console.log(`${exerciseDescription}`)

            // this returns the title to alternative exercises
            const AlternativeExercise = await pageTab.evaluate(() => {
                var links = document.querySelectorAll('h3.ExResult-resultsHeading a');
                return Array.from(links).map((links) => { return links.innerHTML });
            });

            // console.log(`alternative workouts are: = ${AlternativeExercise}`);

            // await page.goBack();
            await pageTab.close();

            let obj = {
                exercise: exerciseName,
                exerciseDescription: exerciseDescription,
                AlternativeExercise: AlternativeExercise,
            };
            

            // allData.push(obj);
            allData.filter(d => d.exercise !== 'remove');


        }
        // clicking load more button and waiting 1sec
        try {
            await page.click(LoadMoreButton);
        }
        catch (err) {
            buttonExists = false;
        }

        await page.waitForTimeout(1000);


        // await page.waitForNavigation({
        //     waitUntil: 'networkidle0',
        // });


    }

    console.log(allData);
    async function fn() {
        // json export error part
        jsonexport(allData, function (err, csv) {
            if (err) return console.error(err);
            console.log(csv);
            fs.writeFileSync('DetailExercise.csv', csv);
        });
    }
    fn();
    await browser.close();
vsemozhebuty
Bruce Mathers
  • One possible _hack_ would be to create an empty object and push that just prior to continuing. Then, after exiting the while loop, simply filter out the empty or flagged objects. This is one way to solve the performance problem while achieving your goal. If you choose to do this, do it as early as possible (your original instinct). Alternatively, you could solve the performance issue by pulling down the entire structure and then parsing it all at once locally rather than doing HTTP requests in a loop (this is the better option IMO). – Randy Casburn Feb 01 '21 at 16:48
  • Thank you for the quick response. I'm confused about how to create an empty object and just push into it. I'm looking at this answer here but I can't put it together. https://stackoverflow.com/questions/251402/create-an-empty-object-in-javascript-with-or-new-object/251743 Sorry, I'm new to coding and JavaScript. – Bruce Mathers Feb 01 '21 at 17:04
  • Something like this [gist](https://gist.github.com/randycasburn/4650b9bf850870dd08ea773627112b24#file-empty-obj-to-push-then-continue) should work. – Randy Casburn Feb 01 '21 at 17:07
  • So I would remove the `let obj` and the `allData.push(obj);` that I have declared at the bottom of the `for` loop? And then after the entire while loop I would push all the data into the object except the exercises that are not equal to `remove`? – Bruce Mathers Feb 01 '21 at 17:46
  • I updated my question with the entire code so you can see what I have. Again, thank you for all the help. – Bruce Mathers Feb 01 '21 at 17:47
  • This statement: `allData.filter(d => d.exercise !== 'remove');` should go outside the `while()` loop, not inside. – Randy Casburn Feb 01 '21 at 18:46
  • If I comment out the `let obj` (the bottom one, after the `await pageTab.close();`), it gives me an error of `ReferenceError: Cannot access 'obj' before initialization`. If I leave it, I get an error of `ReferenceError: obj is not defined`. I'm confused about how to go about it. Very sorry about the confusion. – Bruce Mathers Feb 01 '21 at 20:31
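
Both errors in the last comment seem to come from reading `obj` at the top of the loop body, before any `let obj` in scope has run (or where none is in scope at all). One way to avoid touching `obj` there is to remember the names already scraped in a `Set` declared before the loops and `continue` as soon as a name repeats. The following is only a minimal sketch of that idea, not the gist's approach: `batches`, `scrapeDetails`, `detailFetches`, and `seenNames` are hypothetical stand-ins for the Puppeteer calls in the question, used so the snippet runs on its own.

```javascript
// Hypothetical stand-in for the row names read from the page. The repeated
// names simulate the list being re-read from the top after each "Load more".
const batches = [
    ['Squat', 'Deadlift'],
    ['Squat', 'Deadlift', 'Bench Press'],
];

// Hypothetical stand-in for the expensive part (opening a tab per exercise).
let detailFetches = 0;
async function scrapeDetails(name) {
    detailFetches += 1;
    return { exerciseDescription: `details for ${name}`, AlternativeExercise: [] };
}

async function scrapeAll() {
    const seenNames = new Set(); // declared before the loops, so it is always initialized
    const allData = [];

    for (const names of batches) {            // plays the role of the while loop
        for (const exerciseName of names) {   // plays the role of the for loop
            // Skip early: nothing below runs for a name that was already scraped.
            if (seenNames.has(exerciseName)) continue;
            seenNames.add(exerciseName);

            const details = await scrapeDetails(exerciseName);
            allData.push({ exercise: exerciseName, ...details });
        }
    }
    return allData;
}

scrapeAll().then((data) => console.log(data.map((d) => d.exercise)));
```

With this shape no placeholder objects are pushed, so nothing needs to be filtered afterwards. If the flag-and-filter approach from the comments is used instead, note that `Array.prototype.filter` returns a new array rather than changing `allData` in place, so its result has to be assigned, for example `allData = allData.filter(d => d.exercise !== 'remove');` after the while loop.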
