
So I'm writing a script that retrieves data from a web page and parses it with BeautifulSoup, but the main page's data is only available after you log in. How do I log in and then retrieve the page the site sends me to, so I can parse it with BeautifulSoup?

David Greydanus
Crazy Clyde

2 Answers


If the site uses NTLM (Windows) authentication, you can use requests_ntlm.

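import requests
from bs4 import BeautifulSoup
from requests_ntlm import HttpNtlmAuth

# NTLM credentials are given as 'domain\\username' plus the password
r = requests.get("http://protected_site.com", auth=HttpNtlmAuth('domain\\username', 'password'))
r.raise_for_status()  # stop here if the login was rejected (e.g. 401)

soup = BeautifulSoup(r.text, "html.parser")
print(soup.prettify())  # or whatever bs4 stuff you want to do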
  1. Replace protected_site.com with the address of the site you want to get info from.
  2. Replace "domain" and "username" with your own values, keeping the \\ between them (in a Python string literal, \\ is a single backslash).
  3. Change the final print line to whatever wonderful bs4 task you have in mind.
David Greydanus
  • Traceback (most recent call last):
      File "LoginTest.py", line 7, in <module>
        print soup
      File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
        return codecs.charmap_encode(input,errors,encoding_map)
    UnicodeEncodeError: 'charmap' codec can't encode characters in position 18105-18106: character maps to <undefined>
    – Crazy Clyde May 09 '15 at 04:22
  • @CrazyClyde is that from printing r.text or using beautiful soup? – David Greydanus May 09 '15 at 04:38
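The UnicodeEncodeError in the comment above is raised when Python 2 prints the page to the Windows console: the console uses the cp437 code page, which has no mapping for some characters in the page. A minimal sketch of two workarounds, assuming you only need to look at the HTML; `console_safe` and `save_page` are hypothetical helper names, not part of requests or bs4:

```python
import io


def console_safe(text, encoding="cp437"):
    # Replace every character the console's code page cannot represent
    # with "?", so printing the result no longer raises UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)


def save_page(text, path):
    # Write the whole page as UTF-8 so no characters are lost; open the
    # file in a browser or editor to inspect it.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Usage with the response object r from the answer above:
# print(console_safe(r.text))
# save_page(r.text, "page.html")
```

Saving to a file keeps the page intact, while `console_safe` trades unprintable characters for `?` so you can still skim the output in the console.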

Before you can log in, the website requires cookies, and the server also requires a User-Agent header on the login request, so I think sending both will help your Python program log in to the web page.
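A minimal sketch of that approach with `requests.Session`, assuming the site uses an ordinary HTML login form. The URLs, the form field names `username`/`password`, the User-Agent string, and the helpers `make_session`/`fetch_after_login` are all hypothetical placeholders; check the real form's `<input name="...">` attributes in the page source:

```python
import requests

# Hypothetical values: replace with the real login URL, the page you want
# to scrape, and whatever browser User-Agent you want to send.
LOGIN_URL = "http://example.com/login"
DATA_URL = "http://example.com/members"
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def make_session(user_agent=USER_AGENT):
    # A Session stores cookies from every response and sends them back on
    # later requests, which is what keeps you logged in.
    session = requests.Session()
    # Send a browser-like User-Agent on every request from this session.
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_after_login(username, password):
    session = make_session()
    # Visit the login page first so the server can set its cookies.
    session.get(LOGIN_URL).raise_for_status()
    # Assumed field names; they must match the form's input names.
    resp = session.post(LOGIN_URL, data={"username": username, "password": password})
    resp.raise_for_status()
    # The same session, now holding the login cookies, fetches the data page.
    page = session.get(DATA_URL)
    page.raise_for_status()
    return page.text
```

Usage: `html = fetch_after_login("me", "secret")`, then `soup = BeautifulSoup(html, "html.parser")`. A Session is used instead of separate `requests.get` calls because separate calls would not carry the login cookies from one request to the next.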

Community
P_O_I_S_O_N