
I am trying to connect to this website, https://operations.daxko.com/Login, via Python in order to scrape a bunch of financial transactions for a non-profit I'm doing some work for. I cannot for the life of me figure out how to get past the login page, though. I have checked the following threads:

How can I login to a website with Python?

Python - Login and download specific file from website

Python: Login to ajax website using request

Here's my code:

# from urllib.request import urlopen
# from urllib.error import HTTPError
from bs4 import BeautifulSoup
# import pandas as pd
from pyquery import PyQuery
import requests
# from twill.commands import *

url = "https://operations.daxko.com/Login"
user = 'my_username'
password = 'my_password'
payload = {'username': f'{user}', 'password': f'{password}'}

result = requests.get(url, auth=(user, password))
s = requests.Session()
s.get(url)
s.post(url, data = payload)
explore_url = 'https://operations.daxko.com/the-financials-i-want'
page1 = s.get(explore_url)
c = page1.content
soup = BeautifulSoup(c,'lxml')

But 'soup' is still the login page.

OS: Windows 10

Python 3.6

Pype

2 Answers


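Your first request, requests.get(url, auth=(user, password)), is made outside the session, so nothing it does carries over to the s = requests.Session() you create afterwards. Note also that auth= sends HTTP Basic credentials; it does not fill in the login form.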

Try this (untested):

from bs4 import BeautifulSoup
import requests

url = "https://operations.daxko.com/Login"
user = 'my_username'
password = 'my_password'
payload = {'username': user, 'password': password}

with requests.Session() as s:
    # Load the login form first so the session receives its cookies.
    soup = BeautifulSoup(s.get(url).content, 'lxml')
    # Copy the hidden verification token from the form into the payload.
    payload['__RequestVerificationToken'] = soup.find("input", {"name": "__RequestVerificationToken"})['value']
    # Submit the credentials together with the token.
    s.post(url, data=payload)
    # Reuse the same session (and its cookies) for the page to scrape.
    explore_url = 'https://operations.daxko.com/the-financials-i-want'
    page1 = s.get(explore_url)
    soup = BeautifulSoup(page1.content, 'lxml')

EDIT:

After inspecting that website, I see that your form data is incomplete. You need to pass a verification token in your payload. See edited answer.
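Since the symptom in the question is that the scraped page is still the login page, it may help to test for that explicitly instead of eyeballing the HTML. The following is only a rough heuristic, not something this site documents: it assumes the login page is the only page with a password field. The helper name looks_like_login_page is made up for illustration, and it uses the standard library's html.parser so it runs without extra installs.

```python
from html.parser import HTMLParser


class _PasswordFieldFinder(HTMLParser):
    """Sets .found to True if the page contains <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # A missing type attribute yields None, so fall back to "".
        if tag == "input" and (attr_map.get("type") or "").lower() == "password":
            self.found = True


def looks_like_login_page(html):
    """Heuristic (assumption): a page with a password field is still the login form."""
    finder = _PasswordFieldFinder()
    finder.feed(html)
    return finder.found
```

With the code above you could call looks_like_login_page(page1.text) after the final s.get(explore_url) and raise an error if it returns True.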

drec4s

The form on this page submits three input elements:

  • username
  • password
  • __RequestVerificationToken

You will have to include the token for your request to be accepted.
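Rather than hard-coding the field names, one option is to collect every named input from the form and then overwrite the credentials. This is a minimal sketch using only the standard library; SAMPLE_LOGIN_HTML is a made-up stand-in for the real page (the token value abc123 is invented), and form_fields is a hypothetical helper name.

```python
from html.parser import HTMLParser


class _InputCollector(HTMLParser):
    """Collects name -> value for every <input> that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name:
            # Inputs without a value attribute are stored as empty strings.
            self.fields[name] = attr_map.get("value") or ""


def form_fields(html):
    """Return a dict of all named input fields found in the HTML."""
    collector = _InputCollector()
    collector.feed(html)
    return collector.fields


# Hypothetical login form, only loosely modelled on the three fields listed above.
SAMPLE_LOGIN_HTML = """
<form action="/Login" method="post">
  <input name="__RequestVerificationToken" type="hidden" value="abc123">
  <input name="username" type="text">
  <input name="password" type="password">
  <input type="submit" value="Log in">
</form>
"""

payload = form_fields(SAMPLE_LOGIN_HTML)
payload.update({"username": "my_username", "password": "my_password"})
```

The advantage of this approach is that any other hidden field the form might add later is carried along automatically; the drawback is that it will also pick up inputs from other forms on the same page, if there are any.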

  • I do not know much about internet security. I'm assuming the token changes, so is there a way to ensure I always have one? – Pype Mar 26 '18 at 19:36
  • You will have to scrape it from the form. –  Mar 26 '18 at 19:53
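To make sure the token is always current, one approach is to scrape it again on every login rather than storing it. The sketch below assumes the token sits in an input named __RequestVerificationToken, as listed in the answer; the function name login is made up, and session can be a requests.Session or anything else with compatible get and post methods.

```python
from html.parser import HTMLParser

TOKEN_FIELD = "__RequestVerificationToken"


class _TokenFinder(HTMLParser):
    """Remembers the value of the verification-token input, if present."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("name") == TOKEN_FIELD:
            self.token = attr_map.get("value")


def login(session, login_url, username, password):
    """Fetch the login page, scrape the current token, and post it with the credentials."""
    finder = _TokenFinder()
    # The GET and the POST go through the same session so cookies are shared.
    finder.feed(session.get(login_url).text)
    if finder.token is None:
        raise ValueError("no verification token found on the login page")
    payload = {"username": username, "password": password, TOKEN_FIELD: finder.token}
    return session.post(login_url, data=payload)
```

Usage would look something like: with requests.Session() as s: login(s, url, user, password), followed by s.get(...) calls for the pages you want. Because the token is read from a fresh copy of the form each time, it does not matter how often the site changes it.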