
I'm new to Python and I don't even know if I'm asking this correctly, but I need to send a request to a site to log in, and the site returns a message telling me that I must enable JavaScript before using it.

I did it with Selenium and everything works fine (more stuff, not only the login), but now I want to do it without Selenium, or rather without any browser window. Is this even possible? I guess it is, but I need some help because I can't find a way to do it.

#!/usr/bin/python3
import requests

userEmail = "xxxxxxxxxxx@xxxxxxxxx.com" #using real data in the script
userPass = "xxxxxxxxxxxxx" #using real data in the script

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'X-Requested-With': 'XMLHttpRequest'
}

def main():
    r = requests.post('https://www.thedomain.com/en/customer/account/loginPost/',
                      data={'login[username]': userEmail, 'login[password]': userPass},
                      headers=headers)
    print(r.text)

if __name__ == "__main__":
    main()

And this is the message I get:

<html>
    <title>You are being redirected...</title>
    <noscript>Javascript is required. Please enable javascript before you are allowed to see this page.</noscript>
</html>
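Before deciding whether a browser is needed at all, it can help to detect programmatically that the server sent back this stub instead of the real page. A minimal heuristic sketch: `looks_like_js_challenge` is a hypothetical helper, and matching on a `<noscript>` element that mentions JavaScript is an assumption based on the response above, not something every site does.

```python
import re

# Hypothetical heuristic: the stub page above is essentially a <noscript>
# element warning that JavaScript is required.
_NOSCRIPT_JS_WARNING = re.compile(r"<noscript[^>]*>[^<]*javascript", re.IGNORECASE)


def looks_like_js_challenge(html: str) -> bool:
    """Return True if the HTML looks like a 'please enable JavaScript' stub."""
    return _NOSCRIPT_JS_WARNING.search(html) is not None


stub = """<html>
    <title>You are being redirected...</title>
    <noscript>Javascript is required. Please enable javascript before you are allowed to see this page.</noscript>
</html>"""

print(looks_like_js_challenge(stub))  # True
print(looks_like_js_challenge("<html><body><h1>My account</h1></body></html>"))  # False
```

In the script above, this would be called as `looks_like_js_challenge(r.text)` right after `requests.post`.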

Can I bypass this without Selenium?

Pizzaboy
  • I would say no. The page might be dynamically generated (with Angular or React, for instance), so if JS is disabled, there is no chance you can see the generated content. – Jeremy Thille Jul 06 '17 at 11:32
  • However, if your main requirement is to do this without a browser window, look into PhantomJS. You still use Selenium, but it's a "headless" browser, so no window appears, and it can still run the required JavaScript. – James Kent Jul 06 '17 at 11:35
  • Does it matter what response you get? How does the login logic work? Usually logging in somewhere ends up with you getting a cookie or something similar (with your session ID), which you send in all subsequent requests to show that you are the user who just logged in. – Metareven Jul 06 '17 at 12:45
  • That was my problem @Metareven, thanks :) – Pizzaboy Jul 10 '17 at 08:11
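Metareven's point is what solved the problem here: a login usually answers with a session cookie, and every later request has to send it back. `requests.Session` stores cookies and resends them automatically. The sketch below is self-contained, so it runs a tiny local server standing in for the real site; the `/login` and `/account` paths, the `session=abc123` cookie and the `FakeSiteHandler` class are all hypothetical, and a real site may also require CSRF tokens or other form fields.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class FakeSiteHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real site: login hands out a cookie."""

    def _reply(self, body: bytes, cookie: str = "") -> None:
        self.send_response(200)
        if cookie:
            self.send_header("Set-Cookie", cookie)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Consume the form body, then issue a session cookie.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self._reply(b"logged in", cookie="session=abc123; Path=/")

    def do_GET(self):
        # The account page is only shown when the session cookie comes back.
        logged_in = "session=abc123" in self.headers.get("Cookie", "")
        self._reply(b"welcome" if logged_in else b"please log in")

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch_account(base_url: str, reuse_session: bool) -> str:
    form = {"login[username]": "user@example.com", "login[password]": "secret"}
    login_client = requests.Session()
    # Reusing the same Session carries the login cookie to the next request;
    # a fresh Session starts with an empty cookie jar.
    account_client = login_client if reuse_session else requests.Session()
    for client in (login_client, account_client):
        client.trust_env = False  # ignore proxy env vars; the demo server is local
    try:
        login_client.post(base_url + "/login", data=form)
        return account_client.get(base_url + "/account").text
    finally:
        login_client.close()
        account_client.close()


def demo():
    server = HTTPServer(("127.0.0.1", 0), FakeSiteHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        return fetch_account(base_url, True), fetch_account(base_url, False)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())  # ('welcome', 'please log in')
```

Against the real site, the same pattern means doing the `requests.post` login and all later requests through one `requests.Session()` object instead of module-level `requests.post`/`requests.get` calls.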

1 Answer


Use a headless browser with Selenium.

Headless browsers run from the command line without opening a window. You need to run the site's JavaScript, all the more so if it is an SPA (single-page application) with no server-side rendering, because then the content only exists after the JavaScript has run.

To install PhantomJS with npm, as below, you need Node.js on your system.

The best-known headless browser is PhantomJS, but there are others. To install it:

sudo npm install -g phantomjs

After installing, set the driver for Selenium:

from selenium import webdriver
driver = webdriver.PhantomJS()  # Selenium 3.x only; PhantomJS support was removed in Selenium 4

And that's it: no window appears when it runs, so you can even run it on a server.

Cheers!

EDIT

Another solution is pyvirtualdisplay, which, as its name says, creates a virtual display. It achieves the same result, but it also lets you run a regular browser, for example Chrome, on a server. A quick example taken from here:

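from pyvirtualdisplay import Display
from selenium import webdriver

# Start an invisible display (pyvirtualdisplay uses Xvfb by default)
display = Display(visible=0, size=(1366, 768))
display.start()
browser = webdriver.Firefox()  # a normal Firefox, drawing to the virtual display
try:
    browser.get('http://www.vionblog.com/')
    browser.save_screenshot('vionblog.png')
finally:
    # Clean up even if loading the page fails
    browser.quit()
    display.stop()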
Willemoes
  • I wanted to avoid this because if the server is pretty busy and, for example, a CSS file fails to load, I guess this won't work... Anyway, I'm going to try it this way; it will be what I already have, but without a visible window :) Thanks for your answer. – Pizzaboy Jul 06 '17 at 13:18
  • You're welcome. Remember that premature optimization is the root of all evil. I've used a Selenium + PhantomJS scraper on a Medium EC2 instance that was also running other Celery services, and it worked like a charm. So be really sure that your server is going to be that busy, and if it is, there are other solutions, like using a server exclusively for the scraper. – Willemoes Jul 06 '17 at 13:47
  • There are some **experimental** alternatives, like `pyV8`, which is a wrapper around V8 (the JS engine): after you get the JS from the server, you could run it with that, but you'd have other problems, like how to run multiple JS files. As I've said, eventually you'll end up running the JS code somewhere. You could also try to convert the JS to Python, but that doesn't really make sense, and there's no guarantee it will behave like the JS. For the web specifically, I think the best option is a **headless browser**. – Willemoes Jul 06 '17 at 13:59
  • Thanks for your help, Willemoes – Pizzaboy Jul 10 '17 at 08:10