
I'm building an application that takes two parameters: the request URL and a CSS query selector. I'm having a hard time getting a request of this form to work: "http://localhost:5000/scrapeme/us-central1/scraperSelector?requestURL=https://www.google.com&selector=#hplogo". The function does not pick up the selector parameter, and I get a "not defined" error.

I'm not really sure what I'm doing wrong. I've tried different approaches, such as reading from request.body or creating an object and passing that down in the code. I've read the Google documentation but couldn't find a good example of passing multiple parameters to a cloud function.

const admin = require('firebase-admin');
const functions = require('firebase-functions');
const puppeteer = require("puppeteer");
const chalk = require("chalk");

admin.initializeApp();

// for yellow console logging
const checking = chalk.bold.yellow;

// const uri = "http://localhost:5000/scrapeme/us-central1/scraperSelector";
// const appURL = "scrapeme.firebaseapp.com";


exports.scraperSelector = functions.runWith({ memory: '1GB' }).https.onRequest(async(request, response) => {
    // initialize variables from the request params
    const requestURL = request.query.requestURL;
    console.log("Evaluating " + requestURL);

    let selector = request.query.selector;
    console.log("Evaluating " + selector);

    console.log("Evaluating " + request.originalUrl);

    // Launch a browser
    const browser = await puppeteer.launch({
        headless: true,
        args: ['--no-sandbox', '--disable-setuid-sandbox']
    });

    // Visit the page and get content
    const page = await browser.newPage();

    // Go to requested URL
    await page.goto(requestURL, { waitUntil: 'networkidle0' });
    console.log(checking("Evaluating " + requestURL));

    // find the css selector
    const content = await page.evaluate(() => {
        console.log(JSON.stringify(selector));

        let selectorCSS = document.querySelector(selector).innerText;
        console.log(selectorCSS);

        return selectorCSS;

    },);

    // Send the response
    response.json(content);

});

// Example URL of how request should look
// http://localhost:5000/scrapeme/us-central1/scraperSelector?requestURL=https://www.google.com&selector=#hplogo

I expect the output to resolve to a JSON response. I'm trying to grab a single item from a page, for example: { "result": "$18.41" }

However, I'm getting this output and error:

Evaluating https://www.google.com

Evaluating

Evaluating /scrapeme/us-central1/scraperSelector?requestURL=https://www.google.com&selector=

Evaluating https://www.google.com

! functions: Error: Evaluation failed: ReferenceError: selector is not defined at puppeteer_evaluation_script:2:36

2 Answers


You have to pass the selector variable to the evaluate function.

    //...
    let selector = request.query.selector;
    //...
    const content = await page.evaluate(selector => { // <-- accept `selector` as a parameter
        console.log(JSON.stringify(selector));

        let selectorCSS = document.querySelector(selector).innerText;
        console.log(selectorCSS);

        return selectorCSS;
    }, selector); // <-- pass the `selector` variable in as an argument

Read more in the Puppeteer docs for page.evaluate.
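To see why the extra argument is needed: Puppeteer sends the callback to the page as source text, so variables from the surrounding Node.js scope do not travel with it. The sketch below imitates that without Puppeteer. `runElsewhere` and `demo` are hypothetical names made up for this illustration, and rebuilding the function with `new Function` is only a stand-in for what happens in the browser context.

```javascript
// Hypothetical stand-in for page.evaluate: rebuild the callback from its
// source text only, so it loses access to the scope it was written in.
function runElsewhere(fn, ...args) {
    const rebuilt = new Function(`return (${fn.toString()})`)();
    return rebuilt(...args);
}

function demo() {
    const selector = '#hplogo';

    // Relying on the outer variable fails once the closure is gone.
    let withoutArg;
    try {
        withoutArg = runElsewhere(() => selector);
    } catch (err) {
        withoutArg = err.name;
    }

    // Passing the value as an argument works.
    const withArg = runElsewhere(sel => sel, selector);

    return { withoutArg, withArg };
}

const result = demo();
console.log(result);
```

The first call fails the same way the question's code does, while the second call receives the value explicitly.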

Yevhen Laichenkov

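The issue is that # is a special character in URLs. It marks the start of the "fragment", a string the web browser uses to target an anchor on a web page. The fragment is not sent to the server, so everything from the # onward never reaches your function.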
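As a quick check, the URL from the question can be parsed with the `URL` class built into Node.js. This is only a sketch to show where the `#hplogo` part ends up. The variable names are made up for the illustration.

```javascript
// Parse the URL from the question with the built-in WHATWG URL class.
const raw = 'http://localhost:5000/scrapeme/us-central1/scraperSelector' +
    '?requestURL=https://www.google.com&selector=#hplogo';
const parsed = new URL(raw);

// Everything from "#" onward is the fragment, not part of the query string.
const selectorParam = parsed.searchParams.get('selector');
const fragment = parsed.hash;

console.log(JSON.stringify(selectorParam)); // the selector parameter is an empty string
console.log(fragment);                      // the fragment holds the intended selector
```

This matches the log in the question, where the original URL ends in `selector=` with nothing after it.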

If you want to pass a parameter to a function via the query string of the URL, it should be URL-escaped with percent encoding. With that encoding, your URL parameter would be selector=%23hplogo.

Typically, you use a library to encode all the parameters you pass, so that they're all valid no matter what string they contain.
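One possible way to do that on the calling side is the `URLSearchParams` class built into Node.js and browsers, which percent-encodes each value when serialized. The helper name `buildScrapeUrl` is hypothetical, and the endpoint is the local emulator URL from the question.

```javascript
// Endpoint taken from the question (local emulator).
const endpoint = 'http://localhost:5000/scrapeme/us-central1/scraperSelector';

// Hypothetical helper: build the request URL with every value percent-encoded.
function buildScrapeUrl(requestURL, selector) {
    const params = new URLSearchParams({ requestURL, selector });
    return `${endpoint}?${params.toString()}`;
}

const url = buildScrapeUrl('https://www.google.com', '#hplogo');
console.log(url);

// Decoding on the way back restores the original selector.
const roundTrip = new URL(url).searchParams.get('selector');
console.log(roundTrip);
```

For a single value, `encodeURIComponent('#hplogo')` gives the same `%23hplogo` result. No separate decoding function should be needed in the cloud function, since `request.query` already holds the decoded values.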

Doug Stevenson
  • It works after manually encoding! Thanks! However, I'm not sure how I can automatically encode everything after selector=. Should I create another function for encoding and decoding? –  Sep 12 '19 at 17:18
  • You might want to do some searches to find ways to encode parameters passed in the query string. – Doug Stevenson Sep 12 '19 at 18:21
  • I figured it out: I was making the wrong request type and manually entering it into the URL bar. I used Postman this time and set it to POST with a JSON body. Then I actually got it working! Thanks. –  Sep 18 '19 at 17:43
  • I'm curious what it was about the other answer here that made it the accepted "correct" answer? If it is not correct, I don't want to leave it that way so that it confuses other people with the same issue. If you have your own solution, please add it as an answer here, and mark it as correct. – Doug Stevenson Sep 18 '19 at 20:55
  • Well, both answers are correct, since I was also missing the parameter in the evaluate function. It helps other users avoid the same mistake. When I entered the manually encoded URL into the URL bar, it returned the result, but not when it was non-encoded. I also found that when I made a POST request using Postman, it returned the string value every time, which is the desired result. –  Sep 21 '19 at 06:08
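The comments mention switching to a POST request with a JSON body, where # needs no escaping. The thread does not show the changed code, so the following is only a sketch of how a handler might accept both styles. `readScrapeParams` is a hypothetical helper, and it assumes the JSON body has already been parsed into an object on `request.body`.

```javascript
// Hypothetical helper: prefer the JSON body (POST), fall back to the
// query string (GET). Assumes request.body is already a parsed object.
function readScrapeParams(request) {
    const body = request.body || {};
    const query = request.query || {};

    const requestURL = body.requestURL || query.requestURL;
    const selector = body.selector || query.selector;

    if (!requestURL || !selector) {
        throw new Error('Both requestURL and selector are required');
    }
    return { requestURL, selector };
}

// Plain objects stand in for the request so the sketch runs without Firebase.
const fromPost = readScrapeParams({
    body: { requestURL: 'https://www.google.com', selector: '#hplogo' },
    query: {}
});
const fromGet = readScrapeParams({
    body: {},
    query: { requestURL: 'https://www.google.com', selector: '#hplogo' }
});
console.log(fromPost.selector, fromGet.selector);
```

Failing early on a missing parameter would also make the empty-selector case from the question easier to spot than a ReferenceError inside the page.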