I'm trying to scrape a website from WITHIN a secure network. Security is already tight and I have a username and password, but if you open the site that my program is trying to reach, you aren't prompted to log in (because you're inside the network). I'm having trouble with the authentication here...

import requests

url = "http://theinternalsiteimtryingtoaccess.com"
r = requests.get(url, auth=('myusername', 'mypass'))
print(r.status_code)

>>>401

I've tried HTTPBasicAuth, but that didn't work either. Is there any way to get around this with requests? One more note: 'urlopen' opens this site without requiring any authentication. Please help! Thanks!

EDIT: After finding this question (How to scrape URL data from intranet site using python?), I tried the following:

import requests
from requests_ntlm import HttpNtlmAuth
r = requests.get("http://theinternalsiteimtryingtoaccess.aspx", auth=HttpNtlmAuth('NEED DOMAIN HERE\\usr', 'pass'))
print(r.status_code)

>>>401 #still >:/

RESOLVED: If you're having this problem while trying to access an internal site, make sure your code specifies your particular domain. I was trying to log in, but the computer didn't know which domain to log me into. You can find the domain you're on by going to Control Panel > System, where the domain should be listed. Thank you!
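
For anyone landing here later, the fix described above might look roughly like the sketch below. It assumes the site uses NTLM (as in the EDIT) and that the requests and requests_ntlm packages are installed. The helper names `ntlm_username` and `fetch_with_ntlm`, and the domain "CORP" in the usage note, are made up for illustration and are not from my actual setup.

```python
def ntlm_username(domain: str, user: str) -> str:
    """Return the 'DOMAIN\\user' form that NTLM authentication expects."""
    return f"{domain}\\{user}"


def fetch_with_ntlm(url: str, domain: str, user: str, password: str):
    """GET an intranet URL with NTLM credentials qualified by a Windows domain."""
    # Third-party packages (pip install requests requests_ntlm), imported here
    # so that ntlm_username() stays usable without them.
    import requests
    from requests_ntlm import HttpNtlmAuth

    auth = HttpNtlmAuth(ntlm_username(domain, user), password)
    return requests.get(url, auth=auth)
```

Calling `fetch_with_ntlm(url, "CORP", "myusername", "mypass")` and printing `.status_code` on the result is how I would check it, with "CORP" replaced by whatever domain is listed under Control Panel > System.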

Vince
  • try the oauth library – sameera sy Jun 22 '16 at 12:16
  • @sameerasy I followed the OAuth1Session documentation to a tee and this is what Python came back with: Token request failed with code 401, response was '401 UNAUTHORIZED'. – Vince Jun 22 '16 at 13:40
  • I found a link to another, similar question (http://stackoverflow.com/questions/24805432/how-to-scrape-url-data-from-intranet-site-using-python). This is a company intranet website... I don't have to log in when I access it. – Vince Jun 22 '16 at 14:18

1 Answer

It's unlikely we can give you the exact solution without knowing your network, but I would guess the intranet sits behind some sort of corporate proxy. If so, your requests need to be directed to the proxy rather than sent as if they were hitting an external public site.

For more information on this check out the official docs.

http://docs.python-requests.org/en/master/user/advanced/#proxies
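
As a rough sketch of what routing a request through a proxy could look like with requests: the `proxies` keyword argument is the documented mechanism, but the helper names `build_proxies` and `fetch_via_proxy`, and the proxy host and port in the usage note, are placeholders I made up, since I don't know your network's actual proxy address.

```python
def build_proxies(proxy_host: str, proxy_port: int) -> dict:
    """Build the scheme-to-proxy mapping that requests accepts as `proxies`."""
    proxy_url = f"http://{proxy_host}:{proxy_port}"
    # Send both plain HTTP and HTTPS traffic through the same proxy.
    return {"http": proxy_url, "https": proxy_url}


def fetch_via_proxy(url: str, proxy_host: str, proxy_port: int):
    """GET a URL through the given proxy instead of connecting directly."""
    # Third-party package (pip install requests), imported here so that
    # build_proxies() stays usable without it.
    import requests

    proxies = build_proxies(proxy_host, proxy_port)
    return requests.get(url, proxies=proxies, timeout=10)
```

Something like `fetch_via_proxy(url, "proxy.example.local", 8080)` would then send the request through that proxy. Your network admin should be able to tell you the real host and port, if a proxy is involved at all.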

Chris Hawkes