I'm a beginner learning web scraping with Selenium. I recently ran into a problem: some button elements do not have an "href" attribute containing the link they lead to. To obtain the link, or useful information from it, I have to click the button and read the URL of the new window with the "current_url" property. However, this does not always work when the new URL is not valid. I'm asking for help with a solution.

To give an example, say I want to obtain the Spotify link to the song listed on https://www.what-song.com/Tvshow/100242/BoJack-Horseman/e/116712. After clicking the Spotify button, instead of being directed to the Spotify web player, I see a new window pop up with the URL "spotify:track:6ta5yavnnEfCE4faU0jebM". It's not a valid web URL, probably due to an error on the website's side, but the identifier "6ta5yavnnEfCE4faU0jebM" is still useful, so I want to obtain it. However, when I read "current_url", it gives me the original link "https://www.what-song.com/Tvshow/100242/BoJack-Horseman/e/116712" instead of the invalid URL. My code is attached below. Note that I already have a time.sleep. Specs: macOS 12.6, Chrome and WebDriver version 106.something, Python 3.

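import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait

s = Service('/web_scraping/chromedriver')
driver = webdriver.Chrome(service=s)
wait = WebDriverWait(driver, 3)
driver.get('https://www.what-song.com/Tvshow/100242/BoJack-Horseman/e/116712')
spotify_button_element = driver.find_element("xpath",'/html/body/div/div[2]/main/div[2]/div/div[1]/div[5]/div[1]/div[2]/div/div/div[2]/div/div[1]/button[3]')
# Click the Spotify button via JavaScript, then give the new window time to open
driver.execute_script("arguments[0].click();", spotify_button_element)
time.sleep(3)
print(driver.current_url)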

Any idea why this happens and how to fix it? Huge thanks in advance!

  • Because current_url is focused on the tab that you originally had. You can get the other tabs' URLs with https://stackoverflow.com/questions/46416852/get-urls-of-all-open-tabs-using-python – Andrew Ryan Oct 25 '22 at 15:06
  • Hi Andrew, thanks for the comment. I doubt that applies to my case, though. First, after I click the button, I'm taken to the new tab that pops up with the invalid URL, instead of staying on the previous page (you can try it with the code and see). Second, the proposed method doesn't work, because once I run the code that switches window handles, the invalid URL turns into "about:blank"... – iim7b5-v7-im7 Oct 25 '22 at 15:29

1 Answer

Instead of finding the button, clicking it, and opening a new tab, you could read the ID directly from the data embedded in the page:

import json

# The page stores its data as JSON in a script tag with id = '__NEXT_DATA__'
spotify_data_request = driver.find_element("id", '__NEXT_DATA__')
# Convert the JSON string into a dict-like object
temp = json.loads(spotify_data_request.get_attribute('innerHTML'))
# Get the ID you want directly, instead of clicking the Spotify button and retrieving it from the URL
print(temp['props']['pageProps']['episode']['songs'][0]['song']['spotifyId'])
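
If the IDs of all songs on the page are needed rather than only the first one, the same JSON can be walked entry by entry. The following is only a minimal sketch, assuming every song follows the props / pageProps / episode / songs / song / spotifyId layout used in the snippet above. The helper names extract_spotify_ids and to_spotify_links and the SAMPLE payload are hypothetical and made up for illustration; in real use the JSON string would come from get_attribute('innerHTML') on the '__NEXT_DATA__' element.

```python
import json
from typing import List, Tuple


def extract_spotify_ids(next_data_json: str) -> List[str]:
    """Return every non-empty spotifyId in the episode's song list.

    Assumes the layout props -> pageProps -> episode -> songs -> song,
    as in the snippet above; missing keys simply yield no IDs.
    """
    data = json.loads(next_data_json)
    songs = (
        data.get("props", {})
        .get("pageProps", {})
        .get("episode", {})
        .get("songs", [])
    )
    ids = []
    for entry in songs:
        # Some songs may have no Spotify ID at all (assumption); skip those.
        spotify_id = (entry.get("song") or {}).get("spotifyId")
        if spotify_id:
            ids.append(spotify_id)
    return ids


def to_spotify_links(track_id: str) -> Tuple[str, str]:
    """Build the app URI and the web player URL for a track ID."""
    return (
        f"spotify:track:{track_id}",
        f"https://open.spotify.com/track/{track_id}",
    )


# Hypothetical stand-in for the page's __NEXT_DATA__ content.
SAMPLE = json.dumps(
    {
        "props": {
            "pageProps": {
                "episode": {
                    "songs": [
                        {"song": {"spotifyId": "6ta5yavnnEfCE4faU0jebM"}},
                        {"song": {"spotifyId": None}},
                    ]
                }
            }
        }
    }
)

track_ids = extract_spotify_ids(SAMPLE)
links = [to_spotify_links(track_id) for track_id in track_ids]
print(track_ids)
print(links)
```

The first element of each pair is the same "spotify:track:..." form that the button tries to open, and the second is the usual web player address, so no click or window switching is needed.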
Andrew Ryan
  • Thank you so much, Andrew! Although I still need to learn a lot more about the basic concepts of JavaScript and HTML to thoroughly figure out by myself why this works, this is clearly the solution I'm looking for. Unfortunately, I don't have enough reputation to give you an upvote, so I'm commenting here instead. – iim7b5-v7-im7 Oct 25 '22 at 20:23