
I have some Puppeteer code that grabs random words from a website that generates them. The script simulates clicking a drop-down, typing an entry, and clicking a button, then grabs the text generated on the next page and saves it to a .json file in the current working directory.

I'd like to display some kind of progress indicator, such as a percentage, while this runs (since I want to run Puppeteer in headless mode), but I can't figure out how to make that work with my current setup. Does anyone have any suggestions? This is my first quarter in JS, so I understand I may have done things in ways that aren't polished; please be kind.

// import puppeteer
const puppeteer = require('puppeteer');



// function call for puppeteer
async function launchSearch(){
    // website that generates random words where the words will come from
    const url = 'https://www.sodacoffee.com/words/list-generator';
    // button div id to click and generate more word searches
    const buttonClick = '#ctl00_ContentPane_btn';
    // div id for the drop-down selector on the url asking how many results we want
    const numResultsPerClick = '#ctl00_ContentPane_resultscounter';
    // puppeteer browser launch options
    const browser = await puppeteer.launch({
        // headless == no graphical representation of the browser
        headless: false,
    });
    // create new browser element named page
    const page = await browser.newPage();
    // go to word generator URL
    await page.goto(url);
    // variable searches DOM for div id (declared above)
    const numWordsScrape = await page.$(numResultsPerClick);
    // select the element
    await numWordsScrape.click()
    // type 50 to return 50 random words
    await numWordsScrape.type('50');
    // to prevent some errors due to promises, Promise.all() seemed to be best to get results
    await Promise.all([
        // wait until the page has loaded (url)
        page.waitForNavigation(),
        // click button
        page.click(buttonClick)
    ])
    // site *should* have advanced forward to the next page with 50 results 
    const textAfterButtonClick = await page.evaluate(
        // create an array from the results of the query to the DOM, and map those specific elements <tr> -> to their inner text values
        () => Array.from(document.querySelectorAll('#ctl00_ContentPane_GridView1 tbody tr')
        ).map((elem) => elem.innerText.trim())
    );

    // load the file system module
    const fs = require('fs');
    // writeFileSync takes no callback; it throws on failure, so catch and log the error
    try {
        fs.writeFileSync('./words.json', JSON.stringify(textAfterButtonClick));
    } catch (err) {
        console.log(err);
    }
    
    // close instance of browser
    await browser.close();
}


launchSearch();
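
One way to get a rough percentage in headless mode is to count completed steps rather than measure time. The following is a minimal sketch under that assumption: `createProgress` and the step names are hypothetical helpers made up for illustration, not part of Puppeteer.

```javascript
// Hypothetical helper (not a Puppeteer API): reports progress as
// "steps completed / total steps" each time advance() is called.
function createProgress(stepNames, log = console.log) {
    let completed = 0;
    return {
        // Mark the next step as done and report the percentage so far.
        advance() {
            if (completed >= stepNames.length) {
                throw new Error('advance() called more times than there are steps');
            }
            const name = stepNames[completed];
            completed += 1;
            const percent = Math.round((completed / stepNames.length) * 100);
            log(`[${percent}%] ${name}`);
            return percent;
        },
    };
}

// Assumed step list for the script above; adjust to match the real steps.
const progress = createProgress([
    'launch browser',
    'open page',
    'request 50 words',
    'read results',
    'write file',
]);

// Inside launchSearch(), call progress.advance() after each awaited step, e.g.:
//   const browser = await puppeteer.launch({ headless: true });
//   progress.advance(); // logs "[20%] launch browser"
```

The percentages reflect how many steps have finished, not how long each one takes, so the bar will jump unevenly if launching the browser and navigating dominate the run time.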
McBraunie
  • How would you estimate progress for this task? Seems like it'd be hard to do accurately without the file size and transfer rate you'd get from an HTTP/FTP downloader or file copy that's normally associated with progress bars. You could measure the average time of a few runs, then base it on that, but if the connection speed or other factors change, it'd be off. Since the task is so short, it seems pointless to add a bar, no? If you were running this code 100 times, then adding a progress bar would make more sense and be easily measurable based on the number of tasks completed. – ggorlen Nov 30 '21 at 17:35
  • BTW, I'd use the async FS API rather than the synchronous one. – ggorlen Nov 30 '21 at 17:37
  • I guess I was hoping to write a time to completion (I could have a simple generic progress bar) but was running into issues figuring out how to connect the completion of the file save as an event to trigger an end-of-process. Does that make sense? Or is it just not needed altogether? – McBraunie Nov 30 '21 at 17:40
  • The file save seems like a trivially small amount of the overall work to be done. `puppeteer.launch` and navigation are probably 99% of the time, so I'm asking how you propose to measure this in such a way that you can determine what the percentage to show would be at any given point in time. If you have figured this out and you just need help closing the progress bar, please show that code. What progress bar library are you using, if any? – ggorlen Nov 30 '21 at 17:57
  • No... I was literally thinking that I would use a dummy progress bar in the .html and .css I've written and time it to some arbitrary number and on the file-write completion, then post a "done" message. But I am having trouble with the trigger event of a file being written or saved. – McBraunie Nov 30 '21 at 18:09
  • OK, that works, but where is the HTML and CSS being displayed? Is the code here part of an express route handler? – ggorlen Nov 30 '21 at 18:11
  • I'm not totally sure what an express route handler is... but the HTML and CSS are all packaged in the same folder as this .JS file code, I would be running this js file first in tandem with a main.html and then do some dummy progress bar in that html file. I was hoping to have a good trigger to signal some sort of "all done" to move forward to the next stage. – McBraunie Nov 30 '21 at 18:32
  • This appears to be a conceptual misunderstanding. Puppeteer runs on the back end, server side, in a Node.js process. HTML and CSS run in a totally different environment, the browser, known as the front end. To communicate between the two processes, you use HTTP or websockets, either on one machine (localhost) or over a network. The typical approach is running an Express server that the web page sends HTTP requests to. For a progress bar, websockets, HTTP requests or server-sent events would be options for implementing it so the server can send the data to the client. – ggorlen Nov 30 '21 at 18:35
  • Oh, I see. So in my case, given what I have and what I am trying to do I would need to then create a bridge between the front and back-end processes using the express server in order for the front-end to "listen" to a back-end process event? (If I understood that correctly -- again, I am pretty new to JS most of my previous experience has been in Python) – McBraunie Nov 30 '21 at 20:50
  • Yep -- that's pretty much it. [This answer](https://stackoverflow.com/a/67184841/6243352) shows a client-server [server sent events](https://en.wikipedia.org/wiki/Server-sent_events) setup that might work for you. There are other ways to do it, but SSE is pretty simple and doesn't require dependencies other than Express. – ggorlen Nov 30 '21 at 21:14
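
To make the server-sent events idea from the comments concrete, here is a minimal sketch. It swaps Express for Node's built-in `http` module so it runs without installing anything; the `/progress` route, port 3000, `runJob`, and its step names are all assumptions for illustration, and `runJob` is only a timed placeholder for where the Puppeteer calls would go.

```javascript
const http = require('http');

// Format one server-sent event carrying a JSON payload.
function formatSseEvent(payload) {
    return `data: ${JSON.stringify(payload)}\n\n`;
}

// Hypothetical stand-in for the Puppeteer work: calls report(percent, step)
// after each step. Replace the timeout with the real awaited calls.
async function runJob(report) {
    const steps = ['launch browser', 'open page', 'read results', 'write file'];
    for (let i = 0; i < steps.length; i++) {
        await new Promise((resolve) => setTimeout(resolve, 500)); // placeholder work
        report(Math.round(((i + 1) / steps.length) * 100), steps[i]);
    }
}

// Build a server that streams progress events to any client of /progress.
function createProgressServer(job = runJob) {
    return http.createServer(async (req, res) => {
        if (req.url !== '/progress') {
            res.writeHead(404);
            res.end();
            return;
        }
        res.writeHead(200, {
            'Content-Type': 'text/event-stream',
            'Cache-Control': 'no-cache',
            Connection: 'keep-alive',
        });
        try {
            await job((percent, step) => res.write(formatSseEvent({ percent, step })));
            res.write(formatSseEvent({ done: true }));
        } finally {
            res.end();
        }
    });
}

// To try it on the back end (assumed port):
//   createProgressServer().listen(3000);
// In the page, assuming it is served from the same origin:
//   const source = new EventSource('/progress');
//   source.onmessage = (e) => {
//       const msg = JSON.parse(e.data);
//       if (msg.done) source.close(); // otherwise EventSource reconnects and reruns the job
//   };
```

The final `done` event is the "all done" trigger asked about above: the page updates its bar on each message and moves to the next stage when `done` arrives. If main.html is opened directly from disk rather than served by this server, the browser would also need a CORS header on the response.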
