4

I'm trying to log in to a website from this URL: "https://pollev.com/login". Since I'm using a school email, the site redirects to the school's login portal and uses that portal to authenticate the login. The redirect happens when you type in a uw.edu email (example: myname@uw.edu). After logging in, UW sends a POST request callback to https://www.polleverywhere.com/auth/washington/callback that carries a SAMLResponse. I think I need to simulate the GET request from pollev's login page and then send the login data to the UW login page, but what I'm doing right now isn't working.

Here's my code:

import requests

with requests.session() as s:
    header_data = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
        'referer': 'https://pollev.com/login'
    }
    login_data = {
        'j_username': 'username',
        'j_password': 'password',
        '_eventId_proceed': 'Sign in'
    }

    r = s.get('https://idp.u.washington.edu/idp/profile/SAML2/Redirect/SSO?execution=e2s1',
              headers=header_data, data=login_data)
    print(r.text)

Right now, r.text shows a NoSuchFlowExecutionException HTML page. What am I missing? Logging in to the website normally requires a login, password, Referer, and X-CSRF token, which I was able to handle, but I don't know how to navigate a redirect for authentication.

Daniel Q
  • Without knowing much about the architecture of the systems you're trying to access, my best guess would be that you're not simulating a proper SAML request (signed XML exchange). Identity Providers which work with SAML SSO usually require a more complicated authentication flow than a simple GET request. – bitnahian Oct 03 '18 at 01:41

2 Answers

5

This is an old question, but I had nearly identical needs and kept at it until I solved it. In my case, which may also be the OP's, I have the required credentials. I am certain this could be made more efficient or Pythonic and would greatly appreciate any tips or corrections.

import re
import requests

# start HTTP request session
s = requests.Session()

# Prepare for first request - This is the ultimate target URL
url1 = '/URL/needing/shibbolethSAML/authentication'
header_data = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'}

# Make first request
r1 = s.get(url1, headers = header_data)

# Prepare for second request - extract URL action for next POST from response, append header, and add login credentials
ss1 = re.search('action="', r1.text)
ss2 = re.search('" autocomplete', r1.text)
url2 = 'https://idp.u.washington.edu' + r1.text[ss1.span(0)[1]:ss2.span(0)[0]]
header_data.update({'Accept-Encoding': 'gzip, deflate, br', 'Content-Type': 'application/x-www-form-urlencoded'})
cred = {'j_username': 'username', 'j_password':'password', '_eventId_proceed' : 'Sign in'}

# Make second request
r2 = s.post(url2, data = cred)

# Prepare for third request - format and extract URL, RelayState, and SAMLResponse
ss3 = re.search('<form action="',r2.text) # expect only one instance of this pattern in string
ss4 = re.search('" method="post">',r2.text) # expect only one instance of this pattern in string
url3 = r2.text[ss3.span(0)[1]:ss4.span(0)[0]].replace('&#x3a;',':').replace('&#x2f;','/')

ss4 = re.search('name="RelayState" value="', r2.text) # expect only one instance of this pattern in string
ss5 = re.search('"/>', r2.text)
relaystate_value = r2.text[ss4.span(0)[1]:ss5.span(0)[0]].replace('&#x3a;',':')

ss6 = re.search('name="SAMLResponse" value="', r2.text)
ss7 = [m.span() for m in re.finditer('"/>', r2.text)] # expect multiple matches with the second match being desired
saml_value = r2.text[ss6.span(0)[1]:ss7[1][0]]

data = {'RelayState': relaystate_value, 'SAMLResponse': [saml_value, 'Continue']}
header_data.update({'Host': 'training.ehs.washington.edu', 'Referer': 'https://idp.u.washington.edu/', 'Connection': 'keep-alive'})

# Make third request
r3 = s.post(url3, headers=header_data, data = data)

# You should now be at the intended URL
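
The string slicing above depends on the exact attribute order and spacing in the IdP's HTML. As a possibly more robust sketch, assuming the response page contains a single auto-submit form with hidden inputs, the form action and field values could be collected with the standard library's html.parser, which also decodes character references such as &#x3a; in attribute values. The names FormExtractor and extract_form and the sample HTML are hypothetical, made up for illustration; the real IdP page may be laid out differently.

```python
from html.parser import HTMLParser


class FormExtractor(HTMLParser):
    """Collect the first form's action URL and the values of its named inputs."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs with character references decoded
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value", "")


def extract_form(html_text):
    """Return (action_url, {field_name: value}) for the form in html_text."""
    parser = FormExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.action, parser.fields


# Hypothetical sample, shaped like a SAML auto-submit page (not real IdP output)
sample = (
    '<form action="https&#x3a;&#x2f;&#x2f;sp.example.org&#x2f;callback" method="post">'
    '<input type="hidden" name="RelayState" value="abc&#x3a;123"/>'
    '<input type="hidden" name="SAMLResponse" value="PHNhbWw+"/>'
    '<input type="submit" value="Continue"/>'
    '</form>'
)
action, fields = extract_form(sample)
print(action)
print(fields)
```

With a live response, the result could then be submitted with something like s.post(action, data=fields), in place of the hand-built url3 and data above.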

ajschauer

3

You're not going to be successful faking out SAML2 SSO. The identity provider (IdP) at UW is looking to support an authentication request from the service provider (SP), polleverywhere.com. Part of that is verifying that the request actually originated from polleverywhere. This could be as simple as requiring an SSL connection from polleverywhere, or as complicated as requiring an encrypted and signed authentication request. Since you don't have those credentials, the resulting response isn't going to be readable. SPs are registered with IdPs.
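
To make the SP's authentication request more concrete: the IdP URL in the question (.../SAML2/Redirect/SSO) points to the SAML HTTP-Redirect binding, in which the request is DEFLATE-compressed, base64-encoded, and passed as a SAMLRequest query parameter. The following is only a sketch of that encoding. The hosts, the sample request, and the helper names are hypothetical and were not captured from polleverywhere or UW, and a real request may additionally be signed.

```python
import base64
import zlib
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, quote, urlsplit

SAMLP = "urn:oasis:names:tc:SAML:2.0:protocol"
SAML = "urn:oasis:names:tc:SAML:2.0:assertion"


def encode_saml_request(xml_text):
    """Encode XML the way the HTTP-Redirect binding does (hypothetical helper)."""
    # wbits=-15 produces raw DEFLATE output without a zlib header
    compressor = zlib.compressobj(wbits=-15)
    deflated = compressor.compress(xml_text.encode("utf-8")) + compressor.flush()
    return quote(base64.b64encode(deflated).decode("ascii"), safe="")


def decode_saml_request(redirect_url):
    """Return the XML carried in a redirect URL's SAMLRequest parameter."""
    query = parse_qs(urlsplit(redirect_url).query)
    raw = base64.b64decode(query["SAMLRequest"][0])
    return zlib.decompress(raw, -15).decode("utf-8")


# Hypothetical, unsigned request, for illustration only
sample_xml = (
    f'<samlp:AuthnRequest xmlns:samlp="{SAMLP}" xmlns:saml="{SAML}" '
    'ID="_example" Version="2.0">'
    '<saml:Issuer>https://sp.example.org/metadata</saml:Issuer>'
    '</samlp:AuthnRequest>'
)
url = ("https://idp.example.org/idp/profile/SAML2/Redirect/SSO?SAMLRequest="
       + encode_saml_request(sample_xml))

decoded = decode_saml_request(url)
issuer = ET.fromstring(decoded).find(f"{{{SAML}}}Issuer").text
print(issuer)
```

Decoding the SAMLRequest from a real redirect in the same way shows which SP the IdP believes it is talking to, which is why starting at the IdP URL directly, without a request issued by the SP, tends to fail.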

Now, there may be a different way to sign into polleverywhere -- a different URL which will not trigger an SSO request, but that might be network restricted or require other difficult authentication.

pbuck