
Concept:

Using AWS Lambda functions with Python and Selenium, I want to create an undetectable headless Chrome scraper that passes a headless Chrome detection test. I check the undetectability of my scraper by opening the test page and taking a screenshot. I ran this test in a local IDE and on a Lambda server.


Implementation:

I am using a Python library called selenium-stealth and following its basic configuration:

stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        )

I implemented this configuration on a Local IDE as well as an AWS Lambda Server to compare the results.


Local IDE:

Below are the test results from running in a local IDE: [screenshot]


Lambda Server:

When I run this on a Lambda server, both the WebGL Vendor and Renderer are blank, as shown below:

[screenshot]
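
A screenshot only shows the end result. A minimal sketch of reading the two values directly, so the local and Lambda runs can be compared in logs, might look like this. The helper name `read_webgl_info` is hypothetical, and `driver` is assumed to be an already-created Selenium Chrome driver. If `getContext` returns `null` there is no WebGL context at all, in which case no page can call `getParameter` and an override of it never runs; whether that is what happens on Lambda is an assumption to verify, not something confirmed here.

```python
# JavaScript run inside the page. Selenium wraps it in a function body,
# so a top-level `return` hands the value back to Python.
WEBGL_PROBE_JS = """
const canvas = document.createElement('canvas');
const gl = canvas.getContext('webgl') || canvas.getContext('experimental-webgl');
if (!gl) { return {supported: false, vendor: null, renderer: null}; }
const ext = gl.getExtension('WEBGL_debug_renderer_info');
if (!ext) { return {supported: true, vendor: null, renderer: null}; }
return {
  supported: true,
  vendor: gl.getParameter(ext.UNMASKED_VENDOR_WEBGL),
  renderer: gl.getParameter(ext.UNMASKED_RENDERER_WEBGL),
};
"""


def read_webgl_info(driver):
    """Return a dict with 'supported', 'vendor' and 'renderer' keys.

    `driver` is assumed to be a Selenium WebDriver (hypothetical helper).
    """
    return driver.execute_script(WEBGL_PROBE_JS)
```

Calling `print(read_webgl_info(driver))` after `driver.get(...)` in both environments would show whether the values are empty or whether WebGL is unavailable altogether.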

I even tried to manually change the WebGL Vendor/Renderer using the following JavaScript command:

driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {"source": "WebGLRenderingContext.prototype.getParameter = function(parameter) {if (parameter === 37445) {return 'VENDOR_INPUT';}if (parameter === 37446) {return 'RENDERER_INPUT';}return getParameter(parameter);};"})

Then I thought something might be wrong with the parameter number, so I ran the command without the if statements, but the same thing happened: it worked in my local IDE but had no effect on the AWS Lambda server.
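
For reference, a hedged sketch of the same override that keeps a reference to the original `getParameter`, so every other parameter still resolves normally (a bare `getParameter(parameter)` call is not defined in page scope). The helper names `build_webgl_override` and `install_webgl_override` are hypothetical; `Page.addScriptToEvaluateOnNewDocument` and the constants 37445/37446 are the ones already used above. This is not claimed to fix the Lambda behaviour, since an override only matters if a WebGL context exists.

```python
import json

# JavaScript template; the two placeholders are replaced with quoted strings.
_OVERRIDE_TEMPLATE = """
(() => {
  const patch = (proto) => {
    const original = proto.getParameter;
    proto.getParameter = function (parameter) {
      if (parameter === 37445) { return __VENDOR__; }    // UNMASKED_VENDOR_WEBGL
      if (parameter === 37446) { return __RENDERER__; }  // UNMASKED_RENDERER_WEBGL
      return original.call(this, parameter);
    };
  };
  patch(WebGLRenderingContext.prototype);
  if (typeof WebGL2RenderingContext !== 'undefined') {
    patch(WebGL2RenderingContext.prototype);
  }
})();
"""


def build_webgl_override(vendor, renderer):
    """Return JavaScript source that spoofs the two WebGL strings."""
    # json.dumps yields a correctly quoted and escaped string literal.
    return (_OVERRIDE_TEMPLATE
            .replace("__VENDOR__", json.dumps(vendor))
            .replace("__RENDERER__", json.dumps(renderer)))


def install_webgl_override(driver, vendor, renderer):
    """Register the override so it runs before any page script."""
    source = build_webgl_override(vendor, renderer)
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": source})
```

It would be called once, before the first `driver.get(...)`, for example `install_webgl_override(driver, "Intel Inc.", "Intel Iris OpenGL Engine")`.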

Simply Put:

Is it possible to set the WebGL Vendor/Renderer on AWS Lambda? From my attempts so far, there seems to be no way. I have also submitted this issue to the selenium-stealth GitHub repository.

Luke Hamilton
  • What they are doing is client-side JavaScript. That's the way you should do it too... you're already doing that when you call Object.defineProperty. The browser does not understand Python. – pcalkins Dec 07 '21 at 20:13
  • @pcalkins Got it, what would the expression look like in terms of driver.execute_cdp_cmd(CLIENT-SIDE_JS) for editing WebGL Vendor and Renderer? – Luke Hamilton Dec 07 '21 at 20:16
  • Not sure... I would try building the script and script calls as a string and pass it in... like javascript_to_execute = "function yourfunction() {....} yourfunction();" and then driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {"source": javascript_to_execute}) Maybe? (I haven't used the cdp stuff before... but this is interesting if it's working. You'd avoid their proxy workaround...) Not sure that this would work on ajax calls though... let us know how it goes. – pcalkins Dec 07 '21 at 20:25
  • @pcalkins this is very helpful and makes a lot of sense! I made small edits to my post to reflect your findings. Essentially, I need a one-line JavaScript command that would change the WebGL Vendor/Renderer. – Luke Hamilton Dec 07 '21 at 20:30
  • What they do in that article is override/redefine the prototype for the getParameter method of WebGLRenderingContext. So you need all of that part... https://developer.mozilla.org/en-US/docs/Web/API/WebGLRenderingContext/getParameter So normally the call would be "gl.getParameter(gl.VERSION);" You have to assign a new getParameter function before that call is made. – pcalkins Dec 07 '21 at 21:17
  • @pcalkins I have made a progress update to my post. It seems like this is a Lambda-specific problem. I was able to figure out how to change WebGL Vendor/Renderer on my Local IDE, but still an issue on the Lambda side. – Luke Hamilton Dec 08 '21 at 21:12

2 Answers


WebGL

WebGL is a cross-platform, open web standard for a low-level 3D graphics API based on OpenGL ES, exposed to ECMAScript via the HTML5 Canvas element. WebGL at its core is a shader-based API using GLSL, with constructs that are semantically similar to those of the underlying OpenGL ES API. It follows the OpenGL ES specification closely, with some concessions made for what developers expect of memory-managed languages such as JavaScript. WebGL 1.0 exposes the OpenGL ES 2.0 feature set; WebGL 2.0 exposes the OpenGL ES 3.0 API.

With selenium-stealth available, building an undetectable scraper on a Selenium-driven, ChromeDriver-initiated browsing context has become much easier.


selenium-stealth

selenium-stealth is a Python package that helps prevent detection by making Selenium-driven browsing more stealthy. However, as of now selenium-stealth only supports Selenium with Chrome.

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium_stealth import stealth
    
    options = Options()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    s = Service('C:\\BrowserDrivers\\chromedriver.exe')
    driver = webdriver.Chrome(service=s, options=options)
    
    # Selenium Stealth settings
    stealth(driver,
          languages=["en-US", "en"],
          vendor="Google Inc.",
          platform="Win32",
          webgl_vendor="Intel Inc.",
          renderer="Intel Iris OpenGL Engine",
          fix_hairline=True,
      )
    
    driver.get("https://bot.sannysoft.com/")
    
  • Browser Screenshot:

[screenshot: bot_sannysoft]

You can find a detailed, relevant discussion in "Can a website detect when you are using Selenium with chromedriver?"


Changing WebGL Vendor/Renderer in AWS Lambda

AWS Lambda enables us to deliver compressed WebGL websites to end users. When requested webpage objects are compressed, the transfer size is reduced, leading to faster downloads, lower cloud storage fees, and lower data transfer fees. Improved load times also directly influence the viewer experience and retention, which helps improve website conversion and discoverability. With WebGL, websites are more immersive while still being accessible via a browser URL. With this technique, AWS Lambda can automatically compress the objects uploaded to S3.

[diagram: AWS Lambda real-time file processing]

Background on compression and WebGL

HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization. It is negotiated between the server and the client using HTTP headers, which can indicate that a resource being transferred, cached, or otherwise referenced is compressed. On the server side, AWS Lambda supports the Content-Encoding header.

On the client-side, most browsers today support brotli and gzip compression through HTTP headers (Accept-Encoding: deflate, br, gzip) and can handle server response headers. This means browsers will automatically download and decompress content from a web server at the client-side, before rendering webpages to the viewer.
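
The negotiation described above can be sketched with the Python standard library. This is a simplified illustration, not how any particular server or AWS service implements it: the function name `encode_response` is hypothetical, only gzip is handled (brotli is not in the standard library), and q-values in `Accept-Encoding` are ignored.

```python
import gzip


def encode_response(body, accept_encoding):
    """Pick a Content-Encoding the client advertised and encode the body.

    Returns a (headers, payload) tuple. Simplification: q-values such as
    "gzip;q=0" are ignored, and only gzip is supported.
    """
    # Turn "deflate, br, gzip;q=0.8" into {"deflate", "br", "gzip"}.
    offered = {part.split(";")[0].strip().lower()
               for part in accept_encoding.split(",") if part.strip()}
    if "gzip" in offered:
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    # No supported coding offered: send the body unchanged.
    return {}, body


# Server side: compress because the client advertised gzip.
headers, payload = encode_response(b"<html>hello</html>" * 50,
                                   "deflate, br, gzip")
# Client side: decompress according to the Content-Encoding header.
if headers.get("Content-Encoding") == "gzip":
    page = gzip.decompress(payload)
```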


Conclusion

Because of this constraint, you may not be able to change the WebGL Vendor/Renderer in AWS Lambda; doing so could directly affect how webpages are rendered for viewers and become a bottleneck in UX.


tl; dr

You can find a couple of relevant detailed discussions in:

undetected Selenium
  • I appreciate the response. As I explained in my progress update, I used selenium stealth and it worked to change my WebGL Vendor/Renderer on my local IDE but could not get it to change on AWS Lambda. To glean from your conclusion, you think there is no way to change WebGL Vendor/Renderer in AWS Lambda? I have implemented all other measures aside from that. – Luke Hamilton Dec 14 '21 at 23:35
0

A solution I found for the missing WebGL Vendor/Renderer was to use a Docker container instead of the normal Lambda layers when creating the function. Not only does the available storage increase by a factor of 40, but it also solves the WebGL Vendor/Renderer problem: [screenshot]

Luke Hamilton