I'm writing a script that uses BeautifulSoup to pull data from a web page, but the page with the data is only available after logging in. How do I log in and then retrieve the page the site sends me to, so I can parse it with BeautifulSoup?
- Where are you trying to get this info from? – David Greydanus May 09 '15 at 03:42
- You're probably looking for something like this: http://stackoverflow.com/a/24805764/2487476 – David Greydanus May 09 '15 at 03:47
2 Answers
You can use requests_ntlm.
import requests
from requests_ntlm import HttpNtlmAuth

# Fetch the protected page using NTLM authentication
r = requests.get("http://protected_site.com", auth=HttpNtlmAuth('domain\\username', 'password'))
soup = r.text  # raw HTML as a string; hand this to BeautifulSoup for parsing
print(soup)
# print(BeautifulSoup(soup, "html.parser").prettify()) or whatever bs4 stuff you want to do

- Replace protected_site.com with the domain of the site you want to get info from.
- Replace "domain" and "username" with the appropriate values while keeping the \\ in between them.
- Change print(soup) to whatever wonderful bs4 task you have in mind.
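As a rough sketch of that last step, the snippet below parses an HTML string with BeautifulSoup instead of printing it. The sample markup and the helper name `extract_links` are made up for illustration; in the real script you would pass in `r.text` from the request above, and the tags you look for will depend on the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for r.text; the real page's markup will differ.
SAMPLE_HTML = """
<html><head><title>Members area</title></head>
<body>
  <a href="/profile">Profile</a>
  <a href="/logout">Log out</a>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for every link in the page."""
    # "html.parser" is the parser bundled with Python, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    print(soup.title.string)
    print(extract_links(SAMPLE_HTML))
```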

David Greydanus
- Traceback (most recent call last): File "LoginTest.py", line 7, in <module> print soup File "C:\Python27\lib\encodings\cp437.py", line 12, in encode return codecs.charmap_encode(input,errors,encoding_map) UnicodeEncodeError: 'charmap' codec can't encode characters in position 18105-18106: character maps to <undefined> – Crazy Clyde May 09 '15 at 04:22
- @CrazyClyde is that from printing r.text or using beautiful soup? – David Greydanus May 09 '15 at 04:38
Before you can log in, the website requires cookies, and the server requires a User-Agent header. I think sending both will help the Python program log into the web page.
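A minimal sketch of that idea, using a `requests.Session` so cookies carry over between requests. The login URL, the User-Agent string, and the form field names (`username`, `password`) are all assumptions here; check the site's actual login form for the real ones, and note that some sites also expect a hidden token field.

```python
import requests

# Hypothetical values: replace with the real login URL for your site.
LOGIN_URL = "http://protected_site.com/login"
USER_AGENT = "Mozilla/5.0 (compatible; my-scraper/0.1)"


def make_session(user_agent=USER_AGENT):
    """Create a session that stores cookies and sends a User-Agent header."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def log_in(session, username, password, login_url=LOGIN_URL):
    """Fetch the login page to collect cookies, then submit the login form."""
    session.get(login_url)  # cookies set by the server are kept on the session
    payload = {"username": username, "password": password}  # assumed field names
    response = session.post(login_url, data=payload)
    response.raise_for_status()
    return response.text  # HTML of the page the site sends you to after login


if __name__ == "__main__":
    session = make_session()
    html = log_in(session, "my_user", "my_password")
    print(html[:200])
```

The returned HTML can then go straight into BeautifulSoup, and later `session.get(...)` calls reuse the same cookies.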

Community

P_O_I_S_O_N