
I'm trying to access some JSON data from a web service that uses Microsoft authentication.

I have a username and password that I can log in with in the browser.

If I pass the login data as auth, the response is a wall of illegible HTML and JavaScript:

s = requests.Session()
login_data = {'login': username, 'loginfmt': username, 'passwd': pw}
r = s.post(login_url, auth=login_data)
r = s.get(json_url)
print(r.text)

I have also tried copying the request data, cookies and headers from the browser's network tab when I log in there, but with that method I also just get a wall of illegible HTML and JavaScript:

cookies = {
    'x-ms-gateway-slice': 'estsfd',
    'stsservicecookie': 'estsfd',
    'AADSSO': 'NA|NoExtension',
    'buid': '...',
    'fpc': '...',
    'esctx': '...',
    'brcap': '0',
    'clrc': '...',
    'wlidperf': '...',
}

headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'sec-ch-ua': '"Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92"',
    'sec-ch-ua-mobile': '?0',
    'Upgrade-Insecure-Requests': '1',
    'Origin': 'https://login.microsoftonline.com',
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-User': '?1',
    'Sec-Fetch-Dest': 'document',
    'Referer': 'https://login.microsoftonline.com/.../oauth2/authorize?client_id=...&redirect_uri=...&response_type=id_token&scope=...&x-client-ver=6.8.0.0&sso_reload=true',
    'Accept-Language': 'en-US,en;q=0.9',
}

data = {
  'i13': '0',
  'login': '',
  'loginfmt': '',
  'type': '11',
  'LoginOptions': '3',
  'lrt': '',
  'lrtPartition': '',
  'hisRegion': '',
  'hisScaleUnit': '',
  'passwd': '',
  'ps': '2',
  'psRNGCDefaultType': '',
  'psRNGCEntropy': '',
  'psRNGCSLK': '',
  'canary': '...',
  'ctx': '...',
  'hpgrequestid': '...',
  'flowToken': '...',
  'PPSX': '',
  'NewUser': '1',
  'FoundMSAs': '',
  'fspost': '0',
  'i21': '0',
  'CookieDisclosure': '0',
  'IsFidoSupported': '1',
  'isSignupPost': '0',
  'i2': '1',
  'i17': '',
  'i18': '',
  'i19': '...'
}
s = requests.Session()
r = s.post(login_url, headers=headers, cookies=cookies, data=data)
r = s.get(json_url)
print(r.text)

Some of this data, namely canary, ctx, hpgrequestid and flowToken, changes from POST to POST.

The only thing that works is to copy the cookies the browser holds after authentication:

cookies = {
    'ARRAffinity': '...',
    'ARRAffinitySameSite': '...',
    '.AspNetCore.AzureADCookie': 'chunks-2',
    '.AspNetCore.AzureADCookieC1': '...',
    '.AspNetCore.AzureADCookieC2': '...',
}
s = requests.Session()
r = s.get(json_url, cookies=cookies)
print(r.text)

But the cookie expires after a while, and it's not sustainable to copy the cookie into the script manually every time.
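One way to notice that the cookie has expired is to check whether the response body is still JSON, since an expired session tends to return the HTML login page instead. This is a minimal sketch; `try_parse_json` is a hypothetical helper name, and it assumes the service only ever returns JSON when the session is valid.

```python
import json


def try_parse_json(body):
    """Return (True, data) if body is valid JSON, else (False, None).

    A (False, None) result suggests the server sent something else,
    for example the HTML login page after the cookie expired.
    """
    try:
        return True, json.loads(body)
    except json.JSONDecodeError:
        return False, None
```

With the code above this could be used as `ok, data = try_parse_json(r.text)`, refreshing the cookies whenever `ok` is false.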

I've tried to read up on the Kerberos and msal modules, but I cannot find anything on retrieving data from a web service that uses Microsoft authentication, only how to set up Microsoft authentication for your own web service.

DennisBo

1 Answer


With requests you need a bunch of values that the login page's HTML provides somewhere. I should have picked up on this, since canary, ctx and hpgrequestid change with each request. This question and its answers are very much in the same vein: Login to Facebook using python requests
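For anyone who wants to stay with requests, pulling per-request values out of a login page could look roughly like the sketch below. It assumes the values appear as hidden `<input>` elements, which I have not verified for the Microsoft page (it may embed them in a script instead, which would need a different extraction). `HiddenInputParser` and `extract_hidden_fields` are hypothetical names.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Attributes without a value are reported as None, so guard with "or".
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of the hidden form fields found in the HTML text."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields
```

The resulting dict could then be merged with the username and password and sent as the `data` of the login POST, using the same session that fetched the page.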

What I ended up doing was using selenium to login and grab the json.

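from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import json
import time

browser = webdriver.Firefox()
# Requesting the protected URL redirects to the login page.
browser.get(json_url)
# find_element(By.NAME, ...) replaces find_element_by_name, which newer Selenium releases removed.
elem = browser.find_element(By.NAME, 'loginfmt')
elem.send_keys(username + Keys.RETURN)

time.sleep(1)  # crude pause while the password page loads
elem = browser.find_element(By.NAME, 'passwd')
elem.send_keys(pw + Keys.RETURN)

time.sleep(1)  # crude pause while the login completes
browser.get(json_url)

# Read the JSON payload from the element with id 'json' in the rendered page.
elem = browser.find_element(By.ID, 'json')
json_data_rettid = json.loads(elem.get_attribute('innerHTML'))

browser.quit()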

(I know I should use the built-in Selenium wait function, but I didn't at the time of writing this code.)
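To illustrate what such a wait does compared with a fixed `time.sleep(1)`, here is a small standard-library sketch of a polling wait. `wait_until` is a hypothetical helper of mine, not part of Selenium; it only shows the idea of retrying a check until it succeeds or a timeout passes.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.25, ignored=()):
    """Call condition() repeatedly until it returns a truthy value.

    Exceptions listed in `ignored` are treated as "not ready yet".
    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
        except ignored:
            result = None
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll)
```

Selenium ships its own version of this as `WebDriverWait(browser, 10).until(...)`, typically combined with `expected_conditions.presence_of_element_located((By.NAME, 'passwd'))`, and that is probably the better choice inside a Selenium script.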

Note: I found it really difficult to get the Chrome driver working, so I recommend using the Firefox driver.
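If the goal is to keep using requests for the actual data fetch, the cookies from the logged-in Selenium session could be handed over instead of being copied by hand. A minimal sketch, assuming the cookie list has the shape returned by Selenium's `browser.get_cookies()` (a list of dicts with `name` and `value` keys); `cookies_for_requests` is a hypothetical helper name.

```python
def cookies_for_requests(selenium_cookies):
    """Convert Selenium-style cookie dicts into a {name: value} mapping.

    The result has the shape that the `cookies=` argument of
    requests.get() accepts.
    """
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}
```

It would be called before `browser.quit()`, for example `requests.get(json_url, cookies=cookies_for_requests(browser.get_cookies()))`. Note that `get_cookies()` only returns cookies visible to the page currently loaded, so the browser should be on the web service's domain at that point.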

DennisBo