
I need a Python warrior to help me (I'm a noob)! I'm trying to scrape certain data from an intranet site using the urllib module. However, since it is my company's website, which only employees can view and the public cannot, I think that is why I get this error:

IOError: ('http error', 401, 'Unauthorized', )

How do I get around this? It won't even let me read the site with htmlfile.read().
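A 401 response normally carries a `WWW-Authenticate` header naming the scheme the server expects (for example `Basic`, `NTLM`, or `Negotiate`), which tells you what kind of login to send. Below is a minimal Python 3 sketch using only the standard library (in Python 3, `urlopen` lives in `urllib.request`). The function names are hypothetical helpers, and the parsing assumes one challenge per header line.

```python
import urllib.error
import urllib.request


def auth_schemes(header_values):
    """Return the scheme name (e.g. 'NTLM', 'Basic') from each WWW-Authenticate value."""
    schemes = []
    for value in header_values:
        # A challenge begins with the scheme name, optionally followed by
        # parameters such as realm="...". This assumes one challenge per header
        # line; comma-separated challenges on one line are not split apart.
        name = value.strip().split(" ", 1)[0].rstrip(",")
        if name:
            schemes.append(name)
    return schemes


def probe(url):
    """Fetch url; on a 401, return the status and the schemes the server offers."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, []
    except urllib.error.HTTPError as err:
        if err.code != 401:
            raise
        challenges = err.headers.get_all("WWW-Authenticate") or []
        return err.code, auth_schemes(challenges)
```

Calling `probe()` with your intranet URL shows which schemes the server lists; `NTLM` or `Negotiate` means it wants Windows authentication rather than a plain username/password form.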

Sample code that works for a public site:

import urllib
import re

# Download the page (Python 2: urllib.urlopen).
htmlfile = urllib.urlopen("http://finance.yahoo.com/q?s=AAPL")
htmltext = htmlfile.read()

# Capture the text inside the span that holds the price.
regex = '<span id="yfs_l84_aapl">(.+?)</span>'
pattern = re.compile(regex)
price = re.findall(pattern, htmltext)

print price
Asked by Adan De Leon (edited by Artjom B.)
  • Please don't parse html with regex – heinst Jul 17 '14 at 13:56
  • @heinst Yes. Beautiful Soup is a much easier way to parse HTML. https://pypi.python.org/pypi/beautifulsoup4/ – David Greydanus Jul 17 '14 at 13:59
  • Well, I did come across Beautiful Soup, but I was avoiding the install since my company restricts a lot of what I can download :( but I'm sure I can convince some people. Thanks for the feedback! – Adan De Leon Jul 17 '14 at 14:28
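If installing Beautiful Soup is not an option, Python 3's standard library ships `html.parser`, which copes with attribute reordering and nested tags that would break the regex in the question. A minimal sketch follows; `TextById` and `text_by_id` are hypothetical helpers, not a library API, and the depth tracking assumes non-void elements are properly closed, which malformed pages may not do.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not change nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the first element whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # how deep we are inside the target (0 = outside it)
        self.found = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_ELEMENTS:
            return
        if self.depth:
            self.depth += 1  # a child element nested inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1
            self.found = True

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_ELEMENTS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_id(html, target_id):
    """Return the stripped text of the element with the given id, or None if absent."""
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.found else None
```

For the question's page this would be `text_by_id(htmltext, "yfs_l84_aapl")`, where in Python 3 `htmltext` must first be decoded from bytes to `str`.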

1 Answer


Try requests with requests_ntlm. A 401 from a company intranet often means the server expects Windows (NTLM) authentication:

import requests
from requests_ntlm import HttpNtlmAuth

# The username is given as domain\username (the backslash is escaped in the string).
r = requests.get("http://ntlm_protected_site.com",
                 auth=HttpNtlmAuth('domain\\username', 'password'))

print(r.text)

If you need help with any specifics of this library and can't find it in the docs, leave a comment.
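The NTLM route needs third-party packages. If your intranet server instead lists `Basic` in its `WWW-Authenticate` header, Python 3's standard library can send credentials without installing anything. Note that this handler does not speak NTLM, which is also why a plain `auth=(username, password)` tuple in requests (Basic auth) can fail against Windows-authenticated sites. A minimal sketch; the function name and URL are placeholders:

```python
import urllib.request


def build_basic_auth_opener(url, username, password):
    """Return an opener that answers HTTP Basic challenges for url (not NTLM)."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for any realm under this URL".
    password_mgr.add_password(None, url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)
```

Usage would look like `build_basic_auth_opener(url, user, pw).open(url).read()`.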

Answered by David Greydanus
  • If you install pip, you can just run "pip install requests_ntlm" (without the quotes) and it will install requests_ntlm for you. https://pip.pypa.io/en/latest/installing.html – David Greydanus Jul 17 '14 at 15:13
  • YOU ARE A GENIUS!!! IT FINALLY WORKED! Thank you so much for your help! I really do appreciate your knowledge on this! – Adan De Leon Jul 17 '14 at 18:32
  • Any idea how to avoid putting my password in clear text in the source code? It will be shared across colleagues. – sparkle Oct 02 '17 at 21:03
  • Thanks, +1. I used to rely on r = requests.get(i, auth=(username, password)) but found it didn't work this time, and your code fixed the issue. – uniquegino Apr 01 '20 at 20:02