I've been trying to scrape some raw XML data from an internal company site (URL excluded for security purposes). I am currently using Selenium and BeautifulSoup to do so (but am open to any other options). When accessing the site manually, I am prompted with a JavaScript browser alert for a username and password (see picture). My attempt to supply the credentials automatically is below (it does not pass authentication):

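from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()  # browser window controlled by chromedriver

def main():
    # Gets the specified list of direct reports.
    # Credentials are embedded in the URL as username:password@host;
    # {username} and {password} are placeholders for the real values.
    url = "http://{username}:{password}@myURL.com"
    driver.get(url)
    html = driver.page_source
    soup = BeautifulSoup(html, "lxml")
    # parsing logic follows ...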

However, when the script runs I still have to manually enter the username and password in the browser window controlled by chromedriver, and then the rest of the program runs as expected.

Is there a way to avoid this manual entry? I've also tried solutions based on driver.alert and sending keys and credentials to the browser, to no avail. (I know this may be difficult because the site is not accessible outside of the network; any insight is appreciated!)

Edit: I should mention this method was working a couple of weeks ago, but it no longer does following a Chrome update.

Authentication pop-up

Funsaized
  • What if you tackle the problem from a different angle? Why scrape an internal site? Wouldn't it be possible to get the data elsewhere from the source of the information? – MattR Feb 05 '18 at 14:54
  • I wish :P Unfortunately we are in a time crunch and don't have access to the APIs... some things here are made more difficult than they should be. – Funsaized Feb 05 '18 at 15:04
  • Either use an external tool like AutoIt (if on Windows), use a proxy to inject the credentials, use an extension to set the credentials, launch a Chrome profile with the credentials already set, or execute an [XMLHttpRequest](https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/open) in the page to set the credentials. – Florent B. Feb 05 '18 at 15:16
  • Possible duplicate of [Python Windows Authentication username and password is not working](https://stackoverflow.com/questions/45328654/python-windows-authentication-username-and-password-is-not-working) – undetected Selenium Feb 05 '18 at 18:00

1 Answer

Your login process is likely returning an access token of some kind: either a value in the response body or a header carrying a token, possibly an Authorization header or a Set-Cookie header.

In most cases, you will need to send that token with every request, either as an authorization header, a body parameter, or whatever the page expects.

Your job is to find that token by inspecting the response from the server when you authenticate, store it somewhere, and send it back each time you make a page request to the server.
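
As a rough illustration of the "find and store" step, the sketch below checks the three places a token might turn up: an Authorization header, a Set-Cookie header, or the response body. The function name `find_token`, the cookie name `session`, and the JSON field `access_token` are all assumptions made for the example; the real names depend entirely on what your server returns, and the body may not be JSON at all.

```python
import json
from http.cookies import SimpleCookie


def find_token(headers, body, cookie_name="session", body_field="access_token"):
    """Return (location, token) for the first token found, or None.

    headers: dict mapping response header names to values
    body: response body as text
    cookie_name and body_field are hypothetical; adjust them to the real server.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # 1. A token handed back directly in an Authorization header.
    if "authorization" in lowered:
        return ("header", lowered["authorization"])

    # 2. A session cookie set by the server.
    if "set-cookie" in lowered:
        cookie = SimpleCookie()
        cookie.load(lowered["set-cookie"])
        if cookie_name in cookie:
            return ("cookie", cookie[cookie_name].value)

    # 3. A token field in a JSON response body.
    try:
        data = json.loads(body)
    except ValueError:
        return None
    if isinstance(data, dict) and body_field in data:
        return ("body", data[body_field])
    return None
```

You would call it with the headers and body of the authentication response, then keep the returned value around for later requests.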

How you send it back is dictated by the requirements of the server in question. It may want a request body parameter or a header; those are the two most likely cases.
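
To make those two cases concrete, here is a minimal sketch using only the standard library. It builds a request carrying a previously stored token either as a header or as a form-encoded body parameter. The helper name `build_request`, the `Bearer` scheme, and the parameter name `token` are assumptions for illustration, not something the server in the question is known to require.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, token, how="header", field="token"):
    """Build (but do not send) a request that carries the stored token."""
    if how == "header":
        # Token travels in the Authorization header.
        # The "Bearer" scheme is an assumption; the server may expect another.
        return Request(url, headers={"Authorization": f"Bearer {token}"})
    if how == "body":
        # Token travels as a form-encoded body parameter, which makes this a POST.
        data = urlencode({field: token}).encode("ascii")
        return Request(url, data=data)
    raise ValueError(f"unknown how: {how!r}")
```

Passing the result to `urllib.request.urlopen` would send it. If the server relies on a cookie instead, the same idea applies with a `Cookie` header.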

Matt Morgan