
I'm attempting to scrape URLs from Google search results. However, the URL I get back contains an arrow and seems to be a Google redirect page:

'uk.linkedin.com › pauljgarner'

What I want is the direct link:

https://www.linkedin.com/in/pauljgarner/?originalSubdomain=uk

Here is my code. I'm not sure how to modify it to get the direct link, and I would greatly appreciate your help.

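from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

import parameters  # assumed: my own local config module that defines search_query (not shown)

driver = webdriver.Chrome(service=Service('/Users/yu/Downloads/chromedriver'))
driver.get('https://www.google.com')

## inputting google search ##
search_query = driver.find_element(By.NAME, 'q')
search_query.send_keys(parameters.search_query)
search_query.send_keys(Keys.RETURN)

# .text returns the displayed text, e.g. 'uk.linkedin.com › pauljgarner'
linkedin_urls = driver.find_elements(By.XPATH, ".//div[@class='TbwUpd NJjxre']")
linkedin_urls = [url.text for url in linkedin_urls]

for linkedin_url in linkedin_urls:
    driver.get(linkedin_url)
    ## getting an error on this line (likely because the url is a redirect)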


I've seen a few posts about converting the redirect link into a direct link with add-ons (Greasemonkey), but I haven't figured out how to use them. If possible, I'd prefer an answer that modifies the code above. Thanks!

Yus Ra
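As a plain-Python alternative to a Greasemonkey add-on, a genuine Google redirect link can be unwrapped with the standard library. This is a minimal sketch under an assumption: that redirects look like https://www.google.com/url?q=<target>&... (or a relative /url?q=...), sometimes carrying the target in a url parameter instead of q. The helper name unwrap_google_redirect is hypothetical, and any link that does not match that shape is returned unchanged.

```python
from urllib.parse import parse_qs, urlparse


def unwrap_google_redirect(href):
    """Return the target URL of a Google '/url' redirect, or href unchanged.

    Assumption: redirects look like https://www.google.com/url?q=<target>&...
    (or relative /url?q=...), sometimes with 'url' instead of 'q'.
    """
    parsed = urlparse(href)
    host = parsed.netloc.lower()
    # Relative '/url?...' links have no host; absolute ones must be a google.* host.
    looks_like_google = host == "" or host.startswith(("google.", "www.google."))
    if parsed.path != "/url" or not looks_like_google:
        return href
    params = parse_qs(parsed.query)  # parse_qs also percent-decodes the values
    for key in ("q", "url"):
        values = params.get(key)
        if values and values[0].startswith(("http://", "https://")):
            return values[0]
    return href
```

For example, after collecting hrefs with get_attribute('href'), [unwrap_google_redirect(h) for h in hrefs] leaves direct links alone and unwraps only the ones that match the assumed redirect format.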

1 Answer


After locating the result elements, use get_attribute('href') to read each link's actual target instead of its visible text. If you only want the first URL:

from selenium.webdriver.common.by import By
linkedin_urls = driver.find_elements(By.XPATH, '//*[@id="rso"]/div[1]/div/div/a')
linkedin_urls = [url.get_attribute('href') for url in linkedin_urls]
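To see why reading the href gives the direct link while .text gave the breadcrumb, here is an offline sketch that uses xml.etree.ElementTree as a stand-in for Selenium. The markup is hypothetical and simplified: it assumes the breadcrumb div sits inside the result's <a> element, nested to match the XPath above. Google's real HTML differs, changes over time, and is not well-formed XML, so this only illustrates the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for one result (an assumption: the real
# Google HTML differs, changes over time, and is not well-formed XML).
RESULT_PAGE = """
<body>
  <div id="rso">
    <div><div><div>
      <a href="https://www.linkedin.com/in/pauljgarner/?originalSubdomain=uk">
        <h3>Result title</h3>
        <div class="TbwUpd NJjxre"><cite>uk.linkedin.com › pauljgarner</cite></div>
      </a>
    </div></div></div>
  </div>
</body>
"""

root = ET.fromstring(RESULT_PAGE)

# The question's approach: read the visible text of the breadcrumb div
# (itertext() approximates what Selenium's .text returns).
crumb = root.find(".//div[@class='TbwUpd NJjxre']")
visible = "".join(crumb.itertext())

# The answer's approach: read the href attribute of the result's <a> element.
anchor = root.find(".//*[@id='rso']/div[1]/div/div/a")
link = anchor.get("href")

print(visible)  # uk.linkedin.com › pauljgarner
print(link)     # https://www.linkedin.com/in/pauljgarner/?originalSubdomain=uk
```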

If you want all the URLs:

from selenium.webdriver.common.by import By
linkedin_urls = driver.find_elements(By.XPATH, '//*[@id="rso"]/div/div/div/a')
linkedin_urls = [url.get_attribute('href') for url in linkedin_urls]
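The two XPaths differ only in the [1] on the first div under #rso: div[1] keeps just the first result container, while a bare div matches every one. A minimal offline sketch of that difference, again using xml.etree.ElementTree (which supports a small XPath subset including [n] and [@attr='value']) on hypothetical two-result markup with placeholder example.com links:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page with two results (an assumption, not Google's
# real markup). ElementTree stands in for Selenium so this runs offline.
PAGE = """
<body>
  <div id="rso">
    <div><div><div><a href="https://example.com/first">First result</a></div></div></div>
    <div><div><div><a href="https://example.com/second">Second result</a></div></div></div>
  </div>
</body>
"""

root = ET.fromstring(PAGE)

# Element.findall() rejects paths starting with '/', so './/' plays the role of '//'.
# div[1]: only the first <div> child of #rso, i.e. the first result.
first_only = [a.get("href") for a in root.findall(".//*[@id='rso']/div[1]/div/div/a")]
# div (no index): every <div> child of #rso, i.e. all results.
all_results = [a.get("href") for a in root.findall(".//*[@id='rso']/div/div/div/a")]

print(first_only)   # ['https://example.com/first']
print(all_results)  # ['https://example.com/first', 'https://example.com/second']
```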

See also: get_attribute() in Python

chaha0s
  • This code worked for me a month ago, but when I ran it again I received the following error: `Message: no such element: Unable to locate element: {"method":"xpath","selector":"div/div/div[1]/a"}` Any idea why this might be? – Yus Ra Apr 02 '20 at 04:22
  • I suspect I don't fully understand where these XPaths come from. The XPath in the first line is clear to me, but I'm unsure about the second one.
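A likely cause of the "no such element" error above is that fixed-depth XPaths such as div/div/div/a stop matching as soon as Google's result markup gains or loses a wrapper element. A less position-dependent option is to select every link under #rso that contains an h3 title, e.g. driver.find_elements(By.XPATH, '//*[@id="rso"]//a[h3]'); this assumes result links wrap an <h3> title, which has been true of Google's markup at times but is not guaranteed. The offline sketch below compares the two selectors with xml.etree.ElementTree on hypothetical markup with placeholder example.com links.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup (assumption): the second result has an extra <section>
# wrapper, the kind of layout shift that breaks a fixed-depth XPath, and there
# is also a non-result link with no <h3> title.
PAGE = """
<body>
  <div id="rso">
    <div><div><div><a href="https://example.com/a"><h3>A</h3></a></div></div></div>
    <div><section><div><div><a href="https://example.com/b"><h3>B</h3></a></div></div></section></div>
    <div><a href="https://example.com/more">More results</a></div>
  </div>
</body>
"""

root = ET.fromstring(PAGE)

# Fixed-depth path from the answer: misses the re-nested second result.
positional = [a.get("href") for a in root.findall(".//*[@id='rso']/div/div/div/a")]

# Depth-independent: any <a> under #rso that has an <h3> child (assumed result title).
by_title = [a.get("href") for a in root.findall(".//*[@id='rso']//a[h3]")]

print(positional)  # ['https://example.com/a']
print(by_title)    # ['https://example.com/a', 'https://example.com/b']
```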