
I've been trying to scrape data from Fantasy Premier League (https://fantasy.premierleague.com), and when I try to log in through the requests module in Python, I get a 405 error.

To get the data I need, I first need to log in to the site. So I manually entered my username and password in a dictionary, using the ids I got from the webpage, and also included the hidden fields the form required. I created a Session object and sent a POST request to the site with this dictionary as the data parameter:

import requests

session = requests.Session()
data = {
    "loginUsername": "username",
    "loginPassword": "password",
    "app": "plfpl-web",
    "redirect_uri": "https://fantasy.premierleague.com/"
}

url = "https://fantasy.premierleague.com/"

login = session.post(url, data=data)

print(login.text)

And I get the following output:

<html>
<head><title>405 Not Allowed</title></head>
<body bgcolor="white">
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx/1.13.5</center>
</body>
</html>

I tried the same method on different sites, such as Twitter, and got either a 405 or a 403 error like the one above.

What can I change to make the request succeed? I know I can use Selenium, but I'm planning on making a small project and distributing it to others, and I want the data scraping to work without browser drivers.

  • It is good to set a User-Agent header from a real browser. Normally requests sends something like "python-requests/...", so it is easy to recognize a bot/script and block it. It is also good to execute `session.get(...)` at the start (like a human would) to get fresh cookies from the server. – furas Aug 07 '19 at 22:36
  • Different sites may use different methods to block scripts/bots. Sometimes they use so much JavaScript that it is much easier to use Selenium. – furas Aug 07 '19 at 22:37
  • And more important: send the correct FIELDS to the correct URL. In your code both are incorrect. Use `DevTools` in Chrome/Firefox to see all the requests sent from the browser to the server. – furas Aug 07 '19 at 22:45

1 Answer


Your problem is that you send the wrong FIELDS to the wrong URL.

Using DevTools in Chrome/Firefox, you can see that the browser sends the fields `login` and `password` (instead of `loginUsername` and `loginPassword`) to https://users.premierleague.com/accounts/login/

import requests

session = requests.Session()

#session.headers.update({'user-agent': 'Mozilla/5.0'})

data = {
    "login": "james.bond@mi6.com",
    "password": "007",
    "app": "plfpl-web",
    "redirect_uri": "https://fantasy.premierleague.com"
}

#url = "https://fantasy.premierleague.com"
#r = session.get(url)
#print(r.status_code)

url = "https://users.premierleague.com/accounts/login/"
r = session.post(url, data=data)
print(r.status_code) # 200
#print(r.text)

It is often good to use a User-Agent header from a real browser (or at least 'Mozilla/5.0'), and to GET the main page first to pick up fresh cookies. For this page it wasn't needed, but I kept the code in comments.
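As a minimal sketch of that pattern (the User-Agent string below is just an example; any realistic browser value works):

```python
import requests

session = requests.Session()

# By default requests announces itself as "python-requests/x.y.z",
# which makes scripts easy to identify and block.
print(session.headers["User-Agent"])

# Example desktop-browser User-Agent string (any realistic value works).
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0 Safari/537.36"
})

# A warm-up request such as session.get("https://fantasy.premierleague.com/")
# would then collect fresh cookies before the login POST, like a browser does.
```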


EDIT: (2020.07.10)

Code to log in.

BTW: After a correct login the server redirects to a different URL, so I use this fact to check if I'm logged in.

import requests
from bs4 import BeautifulSoup

session = requests.Session()
#session.headers.update({'user-agent': 'Mozilla/5.0'})

login_url = "https://users.premierleague.com/accounts/login/"

# GET page with form
r = session.get(login_url)
soup = BeautifulSoup(r.content, "html.parser")

data = {
    "login": "your_login",
    "password": "your_password",
}

# get values from form (except empty places for login and password)
for item in soup.find_all('input'):
    key = item.get('name')     # some inputs may have no name attribute
    value = item.get('value')  # .get('value') gives None (instead of an error) when there is no value, e.g. for login and password
    if key and value:
        data[key] = value
    print(key, '=', value)
    
# POST form data to login
r = session.post(login_url, data=data)

# check if url is different
print(r.url)
print(r.url != login_url)
furas
  • Hey, thanks. This worked. And yeah, I noticed that I had absent-mindedly used the id of the field instead of the name. May I know where you got the actual URL from? Usually, to use the site, I use the URL I specified, so I thought it'd be the same. – gautham_ram_p Aug 07 '19 at 22:58
  • Never mind, I just found it in the `action` of the form tag. Thanks anyway for pointing out where I went wrong. Since you mentioned Selenium, is there any way to scrape a user's data using their credentials without them knowing that the data is being scraped, i.e. without the web driver window opening on their screen? – gautham_ram_p Aug 07 '19 at 23:07
  • I used DevTools, which is built into Chrome/Firefox. The "Network" tab shows all requests sent from the client to the server, and the responses. – furas Aug 07 '19 at 23:08
  • For `url = ('https://fantasy.premierleague.com/api/my-team/{teamid}/')` I can't access the JSON; it says `credentials are not provided`. Does that mean I have not logged in yet? But I used the above code and got response 200. How can I solve it? – Manan Jul 10 '20 at 06:21
  • Response 200 means only that the server recognized the URL and knew how to send an answer; it doesn't mean you are logged in. You should instead check for some element in the HTML which exists only when you are logged in, e.g. your name or login. – furas Jul 10 '20 at 10:07
  • BTW: a few days ago there was a question [Table Web Scraping Issues with Python](https://stackoverflow.com/questions/62521126/table-web-scraping-issues-with-python/) about the same page. – furas Jul 10 '20 at 10:48
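The logged-in check suggested in the comments can be sketched like this; the logout-link selector is an assumption, so inspect the real page for an element that only appears when you are authenticated:

```python
from bs4 import BeautifulSoup

def is_logged_in(html):
    """Return True if the page contains an element that only appears
    for authenticated users (the logout link here is an assumption)."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("a", href="/accounts/logout/") is not None

# Simulated responses, for demonstration only:
anonymous_page = '<html><body><a href="/accounts/login/">Log in</a></body></html>'
logged_in_page = '<html><body><a href="/accounts/logout/">Log out</a></body></html>'

print(is_logged_in(anonymous_page))   # False
print(is_logged_in(logged_in_page))   # True
```

In the scraping code above, you would pass `r.text` from the post-login response to such a check instead of relying on the status code alone.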