Web scraping after logging in

Question

I want to pull some data from https://gps24.juwentus.pl, but that requires logging in. I don't know how to authenticate and then fetch the data. Of course, I have a login and password. The login page is https://gps24.juwentus.pl/login.

After inspecting the page, I found that the login field is named "login" and the password field is named "pass":

<input class="loginInput" type="text" name="login" value="" placeholder="Login" id="log">
<input class="loginInput" type="password" name="pass" value="" placeholder="Hasło" id="pwd">

I think the login request goes to "https://gps24.juwentus.pl/openid/examples/consumer/try_auth.php", based on this form:

<form method="get" action="/openid/examples/consumer/try_auth.php">
              <input type="hidden" name="action" value="verify">
              <input type="hidden" name="openid_identifier" value="https://juweid.juwentus.pl:9443/openid/">
              <input type="submit" id="submitloginOpenid" value="Zaloguj przez OpenID" style="padding-left: 30px; white-space: normal; padding-right: 30px;" class="login">
</form>

(I also tried using https://juweid.juwentus.pl:9443/openid/ as the action, in several different ways.)
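One way to check this guess is to read the form's method, action and fields programmatically. The sketch below parses the form markup quoted above using only the standard library; `FormInspector` is a hypothetical helper name, and treating https://gps24.juwentus.pl/login as the page the form sits on is an assumption. As quoted, the form is sent with GET and carries only the two hidden OpenID fields, with no "login" or "pass" input inside it, so the credentials are probably submitted somewhere else.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

# The form markup quoted in the question (style attribute omitted for brevity).
FORM_HTML = """
<form method="get" action="/openid/examples/consumer/try_auth.php">
  <input type="hidden" name="action" value="verify">
  <input type="hidden" name="openid_identifier" value="https://juweid.juwentus.pl:9443/openid/">
  <input type="submit" id="submitloginOpenid" value="Zaloguj przez OpenID" class="login">
</form>
"""


class FormInspector(HTMLParser):
    """Collects the method, action and named inputs of the first form seen."""

    def __init__(self):
        super().__init__()
        self.method = None
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.method = (attrs.get("method") or "get").lower()
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Inputs without a name (the submit button here) are not sent.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


inspector = FormInspector()
inspector.feed(FORM_HTML)

# Resolve the relative action against the page the form was found on (assumed).
target = urljoin("https://gps24.juwentus.pl/login", inspector.action)

print("method:", inspector.method)
print("target:", target)
print("query: ", urlencode(inspector.fields))
```

Running the same inspection on the full login page source would show which form actually contains the "login" and "pass" inputs and where that form is submitted.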

I tried requests with a session, but I still get the "not logged in" page (based on How to "log in" to a website using Python's Requests module?):

import requests

payload = {'login': 'good_login',
           'pass': 'good_password'}

with requests.session() as c:
    c.post('https://gps24.juwentus.pl/openid/examples/consumer/try_auth.php', data=payload)
    response = c.get('https://gps24.juwentus.pl')
    print(response.text)

I also tried to reuse the cookies from a logged-in browser session, but nothing happened either (I don't want to post them here because I don't know whether that is safe).
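For reference, reusing browser cookies usually means copying the Cookie request header from the browser's developer tools and sending it unchanged. The sketch below shows that idea with the standard library only; the cookie names and values (`PHPSESSID`, `lang`) are placeholders, not the site's real cookies, and both helper functions are hypothetical. Session cookies act as credentials for as long as the session lives, so keeping them out of a public post is sensible.

```python
from http.cookies import SimpleCookie
import urllib.request


def cookie_header_to_dict(raw_header):
    """Parse a 'name=value; name2=value2' Cookie header into a dict."""
    parsed = SimpleCookie()
    parsed.load(raw_header)
    return {name: morsel.value for name, morsel in parsed.items()}


def build_request(url, cookies):
    """Build (but do not send) a GET request that carries the given cookies."""
    header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(url, headers={"Cookie": header})


# Placeholder header; a real one would be copied from a logged-in browser tab.
cookies = cookie_header_to_dict("PHPSESSID=abc123; lang=pl")
request = build_request("https://gps24.juwentus.pl", cookies)

print(cookies)
print(request.get_header("Cookie"))
# To actually fetch the page: urllib.request.urlopen(request).read()
```

If the server also ties the session to other request headers, copying the cookies alone may not be enough; that part is a guess and would need checking against the real traffic.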

I also tried http.cookiejar, urllib.request and urllib.parse, following other posts, but I couldn't work out what goes where. Many of the posts I found seem to be outdated. Any advice on where I am going wrong? Or does this page have security that is too strong for this approach?
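As a rough map of how those three modules fit together: http.cookiejar holds the cookies, urllib.request sends the requests through an opener that uses the jar, and urllib.parse encodes the form fields. The sketch below is a minimal version under stated assumptions; the function names are hypothetical, and the login URL is left as a parameter because the endpoint the browser really posts to is not known here.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_form(fields):
    """URL-encode form fields into the bytes body of a POST request."""
    return urllib.parse.urlencode(fields).encode("utf-8")


def log_in_and_fetch(opener, login_url, fields, data_url):
    """POST the credentials, then GET a page with the same opener (same cookies)."""
    with opener.open(login_url, data=encode_form(fields), timeout=30) as response:
        response.read()
    with opener.open(data_url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


opener, jar = make_opener()
body = encode_form({"login": "good_login", "pass": "good_password"})
print(body)      # the bytes that would be sent as the POST body
print(len(jar))  # no cookies are stored until a request has been made
```

A call would look like `log_in_and_fetch(opener, login_url, {"login": "...", "pass": "..."}, "https://gps24.juwentus.pl")`, with `login_url` taken from the browser's network tab while logging in manually. Any hidden fields that the real login form sends would have to be included in `fields` as well.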

EDIT: I got Selenium running in headless mode, but it is still very slow. Does anyone know how to make it faster?

import os

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.binary_location = r"C:\my_path\chrome.exe"
driver = webdriver.Chrome(executable_path=os.path.abspath("chromedriver"), options=chrome_options)
driver.get("https://gps24.juwentus.pl/")
driver.find_element_by_class_name('loginInput').send_keys('***')
# send_keys() returns None, so the two calls cannot be chained
password_field = driver.find_element_by_name('pass')
password_field.send_keys('***')
password_field.send_keys(Keys.ENTER)
print(driver.find_element_by_name('something'))

Maybe somebody knows how to scrape a page that is already open and logged in? That way the data would surely be fetched much faster.

With Selenium I can open the browser, enter the login and password and press Enter, but I want to take data from that page **without opening a browser**, because I need to fetch the data many times within a short period and I don't want a browser to open each time. Is it possible to do this with **Selenium without the browser opening?** — jigsaw, May 26 '20 at 11:24
Yes, run the browser in headless mode. The browser window won't open. — xaander1, May 26 '20 at 11:29
I tried headless mode in Chrome, but it is as slow as without it; the only difference is that the browser window does not open. Just getting the HTML text of the page after logging in takes about 20 seconds, and that is not a complicated action. I read that Google Chrome Canary may help, but there is no ChromeDriver for version 85.0 of Canary and I can't find older versions. Is there any chance to make this faster? — jigsaw, May 27 '20 at 11:04
Or did I do something wrong? `from selenium import webdriver from selenium.webdriver.chrome.options import Options chrome_options = Options() chrome_options.add_argument("--headless") chrome_options.binary_location = r"C:\my_path\chrome.exe" driver = webdriver.Chrome(executable_path=os.path.abspath("chromedriver"),options=chrome_options) driver.get("https://gps24.juwentus.pl/") driver.find_element_by_class_name('loginInput').send_keys('***') driver.find_element_by_name('pass').send_keys('***').send_keys(Keys.ENTER) print(driver.find_element_by_name('something'))` — jigsaw, May 27 '20 at 11:10
I am forced to add time.sleep() here and there to wait for the page to load (even though no browser window opens); otherwise I get an error. — jigsaw, May 27 '20 at 11:52
`time.sleep()` is not recommended and should only be used as a last resort; use either a `selenium explicit wait` or a `selenium implicit wait`: https://selenium-python.readthedocs.io/waits.htm This Stack Overflow answer on using Selenium might also be useful: https://stackoverflow.com/questions/61563931/how-to-scroll-to-the-end-of-a-page-slowly-using-selenium-so-that-i-can-get-dynam — xaander1, May 27 '20 at 12:19
I applied it, but the time is the same, give or take 1 second, so I still don't know whether there is any way to make it faster. — jigsaw, May 27 '20 at 12:43
I guess you could increase processor speed and internet bandwidth. Otherwise it is a bit slower, but hey, you can scrape all types of websites. — xaander1, May 27 '20 at 13:07
I need to find a way to scrape this data from a page that is already open and logged in, so that Python uses the browser that is already open and logged in. Scraping the data that way would be much faster. — jigsaw, May 29 '20 at 11:32
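On the time.sleep() versus explicit wait exchange in the comments: an explicit wait polls a condition and returns as soon as it holds, so it only saves time compared with a fixed sleep that is longer than necessary. It cannot make the page itself load faster, which would be consistent with the timing staying about the same. The sketch below shows the polling idea in plain Python; `wait_until` and `page_ready` are hypothetical names standing in for what Selenium's `WebDriverWait(driver, timeout).until(...)` does against a real page.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


start = time.monotonic()


def page_ready():
    """Stand-in for 'the element is present': true after roughly 0.3 s."""
    return time.monotonic() - start > 0.3


wait_until(page_ready, timeout=5.0, poll=0.05)
elapsed = time.monotonic() - start
print(f"waited {elapsed:.2f} s; a fixed time.sleep(5) would always take 5 s")
```

If most of the 20 seconds is spent starting the browser and loading the login page, the remaining gains would have to come from keeping one browser session alive across fetches rather than from the wait strategy.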
