
One response returns with the data that I want, and the other response (in the different environment, the Firebase servers) fails with an error telling me: 'You look like a robot' (I'm getting this error from my friends too, lol).

Overview:

I'm using Puppeteer, in JavaScript, to get results after clicking a button. The problem is that although the code works as expected on my local machine, the same code, when deployed to Firebase servers (Firebase Cloud Functions), gets a different result for the same request.

An overview of the process in the headless Chromium browser:

Clicking a button ----> a label changes with new data, which I collect. The button click triggers a request that returns a JSON response with the data I want, which is then stored in a label on the page.

The same request happens in both environments (local and the Firebase servers). I verified this using Puppeteer's 'on' method and printed it to the console log:

  page.on('requestfinished', request => console.info(` Finished request: ${request.url()}`));

I'm also listening for the responses I'm getting, like so:

 page.on('response', async response => {
   try {
     console.info(await response.json());
   } catch (error) {
     console.error(error);
   }
 });

As I said, the JSON result I'm getting back for the same request is different. In the local environment everything is in order, but on the Firebase servers I get this JSON error message:

{"success":false,"message":"You look like a robot."}

Any idea how/why this is happening, and how to solve it?

UPDATE #1

The code only works when I'm NOT deploying it to a server, even if it's my localhost. I'm getting the response I want only when I run the script directly (from VSCode, without a server function wrapped around it).

Meaning: only when the code is not encapsulated in a server function does it work. The website somehow 'knows' when I'm using a server. Why could this be, and is there a way to avoid it?
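For what it's worth, a site can tell automated browsers apart in several ways even before looking at the IP. Below is a minimal sketch of two common signals (the HeadlessChrome token in the User-Agent string, and the navigator.webdriver flag); the input values are made up for illustration, not captured from the real site:

```javascript
// Hypothetical sketch of two common bot signals a site might check.
// The inputs below are illustrative, not taken from the real site.

// Headless Chromium advertises itself in the User-Agent string:
function uaLooksHeadless(userAgent) {
  return /HeadlessChrome/.test(userAgent);
}

// Automated browsers expose navigator.webdriver === true:
function webdriverFlagSet(navigatorLike) {
  return navigatorLike.webdriver === true;
}

console.log(uaLooksHeadless(
  'Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/90.0.4430.212 Safari/537.36'
)); // true
console.log(webdriverFlagSet({ webdriver: true })); // true
console.log(webdriverFlagSet({})); // false
```

These particular signals would be identical locally and on Firebase, though, which is why an IP-based block (see the answer below about provider IP ranges) is the more likely culprit here.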

UPDATE #2

I managed to receive a response on my localhost using the puppeteer-page-proxy package, on each request, like so:

const useProxy = require('puppeteer-page-proxy');

page.on('request', async request => {
  await useProxy(request, 'https://116.80.49.253:3128');
});

BUT, when deploying it to the Firebase Cloud Function, I see a 'crash' entry in the Firebase logs (screenshot omitted).

What could be the reason for that? Is it possible that proxying is not enabled on Firebase Cloud Functions?
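One thing worth trying (a sketch, not verified on Cloud Functions): instead of proxying each request with puppeteer-page-proxy, Chromium can route the whole browser through a proxy via its --proxy-server flag, which avoids request interception entirely. The helper below just builds the launch options; the proxy address is the one from the question, and the extra sandbox flags are commonly needed in constrained server environments:

```javascript
// Sketch: launch options that send all browser traffic through a proxy
// using Chromium's --proxy-server flag, instead of per-request proxying.
// The proxy address is the one from the question; swap in your own.
function buildLaunchOptions(proxyUrl) {
  return {
    headless: true,
    args: [
      `--proxy-server=${proxyUrl}`,
      // Often required in constrained environments such as Cloud Functions:
      '--no-sandbox',
      '--disable-setuid-sandbox',
    ],
  };
}

const options = buildLaunchOptions('https://116.80.49.253:3128');
console.log(options.args[0]); // --proxy-server=https://116.80.49.253:3128
// Then: const browser = await puppeteer.launch(options);
```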

UPDATE #3

So, the 'quest' goes on in this episode...

Now it's getting a little weirder. As I was saying in update #2, I finally managed to get a proper response (the JSON that the button click triggers) ----> undetected as a 'Robot' (though we all are some kind of AI, LOL). ✅

This happens on my localhost ---> firebase serve --only functions

NOW, on the 'other hand' ----> firebase deploy --only functions, meaning when deploying it to the Firebase servers, I was able to overcome the 'crash' on the Firebase servers, AND ---> get a JSON response from it!

BUT, alas! The response I got was again ---> 'You look like a robot' ----> and I'm using the same code!

The ONLY difference is 'where' the code runs from:

Deploying it to my localhost (firebase serve --only functions) ---> working! ✅

Deploying it to the Firebase servers (firebase deploy --only functions) ---> NOT working! ❌

P.S. I'm trying the best I can to describe the problem.

Right now the problem is that the site I'm controlling with Puppeteer somehow detects that I'm using a driver when running from the Firebase servers, but not when running from my localhost. God knows how :)

**OK, here's the code** --> everything, for the sake of getting an answer. Hope this gives some sense of the problem I'm facing and maybe, just maybe, helps to solve it.

const functions = require("firebase-functions");
const admin = require("firebase-admin");
const puppeteer = require('puppeteer');
const useProxy = require('puppeteer-page-proxy');
admin.initializeApp();

exports.startPuppeteer = functions.runWith({ memory: '1GB' }).https.onRequest((request, response) => {

  console.log('hello world');

  (async () => {
    console.log("I'm in the function");
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();

    // Route requests through the proxy, and skip heavy resources we don't need:
    await page.setRequestInterception(true);
    page.on('request', async request => {
      try {
        if (['image', 'media', 'font'].indexOf(request.resourceType()) !== -1) {
          request.abort();
        } else {
          // Let puppeteer-page-proxy handle the request; calling
          // request.continue() first would mark it as already handled.
          await useProxy(request, 'https://116.80.49.253:3128');
        }
      } catch (e) {
        request.continue();
      }
    });

    // Print every response body that parses as JSON to the console:
    page.on('response', async response => {
      try {
        console.info(await response.json());
      } catch (error) {
        // Not JSON; ignore.
      }
    });

    // Go to the site that has the button I want clicked, in order to fetch
    // the data that follows the click:
    await page.goto('https://thesitetobecontrel.com');
    console.log('done?');

    // Set the viewport to proper dimensions in order to click the button:
    await page.setViewport({
      width: 1200,
      height: 800
    });

    await page.evaluate('window.scrollTo(0, 300)');

    try {
      // The text field that needs to be filled with the relevant data:
      const [textField] = await page.$x('/html/body/div[1]/div[1]/div[1]/div/div[1]/center/div/input');
      // The dropdown list, to pick the wanted option:
      const [listToSelectOptionFrom] = await page.$x('//*[@id="content"]/div[1]/div[1]/div/div[1]/center/div/div/select');
      // The button to click after filling the text field and picking an option:
      const [btn] = await page.$x('//*[@id="content"]/div[1]/div[1]/div/button');

      // Select the desired option, by text:
      await listToSelectOptionFrom.select('optionText');
      console.log('done?');
      // Fill the text field with the relevant info before clicking the button:
      await textField.type('someInfo');

      await page.waitForTimeout(2000);

      // Click the button with the mouse:
      const box = await btn.boundingBox();
      const x = box.x + (box.width / 2);
      const y = box.y + (box.height / 2);
      console.log(x, y);
      await page.mouse.move(x, y);
      await page.mouse.click(x, y);
      // Wait for the result of the click:
      await page.waitForTimeout(7000);
      console.info('waited 7 seconds...');

      // Fetch and send the result:
      const [result] = await page.$x('//*[@id="content"]/div[1]/div[1]/div/div[2]/div');
      const value = await page.evaluate(el => el.textContent, result);
      console.log(value);
      console.log('done?');
      await browser.close();
      response.send({ status: 200, data: { message: value } });
    } catch (err) {
      console.error(err);
      await browser.close();
      // Without this send, the function would hang until timeout:
      response.send({ status: 200, data: { message: 'Error' } });
    }
  })();
});
yehudshe

1 Answer


Some websites only want human users to access them. For this reason they might block IPs belonging to service providers such as Firebase or GCP. I think the website you are trying to scrape is the one returning the message:

{"success":false,"message":"You look like a robot."}

Even when using a proxy, a website can learn the original IP of the client from the X-Forwarded-For header, as explained in this answer.
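To illustrate: many proxies append each hop to that header as a comma-separated list, so the first entry is still the original client's IP. A sketch of how the target server might read it (the header value below is made up):

```javascript
// Sketch: how a server could recover the original client IP from an
// X-Forwarded-For header. The header value here is made up.
function originalClientIp(xForwardedFor) {
  if (!xForwardedFor) return null;
  // Format: "client, proxy1, proxy2, ..." - the first entry is the client.
  return xForwardedFor.split(',')[0].trim();
}

console.log(originalClientIp('35.199.27.1, 116.80.49.253')); // 35.199.27.1
console.log(originalClientIp(null)); // null
```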

In order to verify my hypothesis, I encourage you to contact the owners of that page and ask them whether they are blocking Google IPs.

As a side note, in order to know if an IP belongs to Google, they might have followed this guide.
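Mechanically, such a check usually amounts to testing the client IP against published CIDR ranges. A minimal sketch of that test (the range below is illustrative only, not a real Google block):

```javascript
// Sketch: checking whether an IPv4 address falls inside a CIDR range,
// the kind of test a site could run against a provider's published ranges.
// The range used below is illustrative, not a real Google block.
function ipToInt(ip) {
  return ip.split('.').reduce((acc, oct) => (acc << 8) + parseInt(oct, 10), 0) >>> 0;
}

function inCidr(ip, cidr) {
  const [base, bits] = cidr.split('/');
  // A /0 mask matches everything; otherwise keep the top `bits` bits.
  const mask = bits === '0' ? 0 : (~0 << (32 - Number(bits))) >>> 0;
  return (ipToInt(ip) & mask) === (ipToInt(base) & mask);
}

console.log(inCidr('35.199.27.1', '35.199.0.0/16'));  // true
console.log(inCidr('192.168.1.5', '35.199.0.0/16')); // false
```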

Lluís Muñoz