Accessing a URL opens two pages (in two tabs); I want to be able to select one of the two

Question

I'm using Selenium to scrape some product pages, but lately I've been getting only the login page instead of the product page I wanted. When I loaded the page in my regular browser, it turned out that accessing any product URL opens two tabs: one for login and one for the product itself. So I don't need to log in; I just need to be able to scrape from one of the two pages that open each time I access the URL.

I have a dataframe with the URLs and blank columns for the fields I need to scrape, so I pass each URL as "myurl" to this function:

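from bs4 import BeautifulSoup

item_id = myurl[20:-5]  # 'https://item.jd.com/4487398.html' -> '4487398'
browser.get(myurl)
html = browser.page_source
soup = BeautifulSoup(html, 'lxml')
try:
    titulo = soup.find('div', {'class': 'sku-name'}).get_text(strip=True)
except AttributeError:
    # find() returned None: no 'sku-name' div on this page (e.g. the login page)
    titulo = ""

and then I read each remaining field from the soup in the same way. I'm using chromedriver in Python.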

Any help is greatly appreciated!

Can you share some more code or the URL? So that we can see and suggest. By looking at your description it's hard to get what went wrong. — Swaroop Humane, Aug 01 '20 at 02:07
Sure, this is one of the URLs: https://item.jd.com/4487398.html — Gustavo Moreno, Aug 01 '20 at 02:16
And this is the login URL that opens at the same time as the item: https://passport.jd.com/uc/login?ReturnUrl=http%3A%2F%2Fitem.jd.com%2F4858599.html — Gustavo Moreno, Aug 01 '20 at 02:18
https://stackoverflow.com/questions/58431693/how-to-find-the-list-of-tab-names-in-actively-opened-browser-using-python-script — Swaroop Humane, Aug 01 '20 at 05:21

Swaroop Humane · Answer 1 · 2020-08-01T05:16:15.997

Sorry if I've misunderstood your requirement, but the code below works for me and opens each product page one by one.

from selenium import webdriver
import time

driver = webdriver.Chrome()

# You can build a list of product IDs and insert each one into the base URL
# to get that product's page. If the IDs form a range, the loop below works.
# Save the page source in a variable and process it with BeautifulSoup.

for i in range(4487300, 4487401):
    driver.get(f'https://item.jd.com/{i}.html')
    time.sleep(5)
    product_page_source = driver.page_source
    print(product_page_source)

Note: many sites give you only limited direct access to their product pages; after some threshold they redirect you to their login page for authentication. The same may be happening in your case, except that two tabs are opening. You can use driver.window_handles to identify the target tab and driver.switch_to.window(handle) to switch to it.
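
As a rough sketch of the driver.window_handles approach: the helper below walks over every open tab and stops on the first one whose hostname matches. The function name switch_to_tab_on_host and its host parameter are made up for illustration. It compares the parsed hostname rather than searching the URL for a substring, because the login URL carries the product address inside its ReturnUrl parameter and would otherwise match too. Whether a second tab actually opens under Selenium is an assumption here; the thread only confirms it for a regular Chrome session.

```python
from urllib.parse import urlsplit


def switch_to_tab_on_host(driver, host):
    """Switch `driver` to the first open tab whose URL is served by `host`.

    Returns the matching window handle, or None if no tab matched, in
    which case the tab that was focused at call time is restored.
    Only standard Selenium WebDriver attributes are used:
    current_window_handle, window_handles, switch_to.window, current_url.
    """
    original = driver.current_window_handle
    for handle in driver.window_handles:
        driver.switch_to.window(handle)
        if urlsplit(driver.current_url).hostname == host:
            return handle
    driver.switch_to.window(original)
    return None


# Possible usage with a real driver (not executed here):
#   driver.get(myurl)
#   if switch_to_tab_on_host(driver, "item.jd.com"):
#       html = driver.page_source
```

If the helper returns None, only the login page is open, and switching tabs will not help.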

Let me know if it is helpful.

Thanks, I gave the code a try but it's still redirecting me to the login page. I would just assume that I've been blocked for a while, but when I checked the same behavior in Chrome (outside Python and Selenium), opening the URL opened two windows, so I thought there might be a way to switch to one or the other and bypass the login page. — Gustavo Moreno, Aug 01 '20 at 02:51
Yes, you can switch tabs using Selenium. Please see the link below for more information: https://stackoverflow.com/questions/9588827/how-to-switch-to-the-new-browser-window-which-opens-after-click-on-the-button — Swaroop Humane, Aug 01 '20 at 02:55
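
To tell whether the driver has landed on the login page, as described in the comments, one option is to inspect the current URL. The sketch below uses only the standard library. The helper names is_login_page and requested_url are hypothetical, and the host passport.jd.com and the ReturnUrl parameter are taken from the login URL quoted in the question's comments; the site may use other hosts or parameters.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: the login host seen in the URL quoted in the comments.
LOGIN_HOST = "passport.jd.com"


def is_login_page(url, login_host=LOGIN_HOST):
    """Return True if `url` is served by the login host."""
    return urlsplit(url).hostname == login_host


def requested_url(url):
    """Return the decoded ReturnUrl query value, or None if it is absent."""
    values = parse_qs(urlsplit(url).query).get("ReturnUrl")
    return values[0] if values else None


# Possible usage with a real driver (not executed here):
#   if is_login_page(driver.current_url):
#       print("Redirected to login; wanted", requested_url(driver.current_url))
```

A check like this lets the scraping loop record a blocked item explicitly instead of silently storing empty fields parsed from the login page.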
