https://www.google.co.in/search?q=black+sabbath+%E2%80%93+iron+man&oq=black+sabbath+%E2%80%93+iron+man&aqs=chrome..69i57.461j0j4&sourceid=chrome&es_sm=122&ie=UTF-8

In the link above, the very first result is a link to a YouTube video. I want to get that link. How can I do that in Python?

EDIT: My input will be the string that I type into the Google search box, in this case "black sabbath iron man".

Prasanna Kumar

1 Answer

Scraping HTML is fragile. Yes, you can do it with beautifulsoup4, e.g.:

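import bs4

# html_string is assumed to already hold the HTML of the results page
soup = bs4.BeautifulSoup(html_string, 'html.parser')
# the first <h3> on the page wraps the link of the first result
href = soup.find('h3').find('a').get('href')
print(href)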

This will print /url?q=http://www.youtube.com/watch%3Fv%3D9LjbMVXj0F8&sa=U&ei=ESCPVPD6NcT3yQS-04C4DA&ved=0CBQQtwIwAA&usg=AFQjCNGV1u7FshGW4K_Ffu0zLzwaW7sCzw or the like. However, the slightest cosmetic change to the search results page might break your application.
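The href shown above is a redirect whose `q` parameter carries the percent-encoded video URL, so one more step is needed to get a usable link. A minimal sketch using only the standard library; `extract_target_url` is a hypothetical helper name, and it assumes the redirect keeps the `/url?q=...` shape shown above:

```python
from urllib.parse import urlsplit, parse_qs


def extract_target_url(href):
    """Return the decoded 'q' parameter of a /url?q=... redirect.

    Falls back to returning href unchanged when there is no 'q' parameter.
    """
    query = parse_qs(urlsplit(href).query)  # parse_qs also percent-decodes values
    targets = query.get("q")
    return targets[0] if targets else href


href = ("/url?q=http://www.youtube.com/watch%3Fv%3D9LjbMVXj0F8&sa=U"
        "&ei=ESCPVPD6NcT3yQS-04C4DA&ved=0CBQQtwIwAA"
        "&usg=AFQjCNGV1u7FshGW4K_Ffu0zLzwaW7sCzw")
print(extract_target_url(href))  # http://www.youtube.com/watch?v=9LjbMVXj0F8
```

This only cleans up the extracted value; it does not make the scraping itself any less fragile.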

It is better to register your app with Google and use the provided YouTube API, as per Google's own docs. The Python client library nicely supports App Engine; see https://developers.google.com/youtube/v3/code_samples/python_appengine for an example.
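For the query string from the question, the API route could look roughly like the sketch below. It calls the YouTube Data API v3 `search` endpoint over plain HTTPS with the standard library instead of the Google client library. The function names are hypothetical, and `api_key` is assumed to be a key obtained by registering the app with Google:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, api_key):
    """Build a search request URL asking for the single best video match."""
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",   # skip channels and playlists
        "maxResults": 1,
        "key": api_key,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def first_video_url(response):
    """Return the watch URL of the first item in a search response, or None."""
    items = response.get("items", [])
    if not items:
        return None
    return "https://www.youtube.com/watch?v=" + items[0]["id"]["videoId"]


def search_first_video(query, api_key):
    """Run the search over the network and return the first video's URL."""
    with urlopen(build_search_url(query, api_key)) as resp:
        return first_video_url(json.load(resp))


# Usage (needs a real key and network access):
# print(search_first_video("black sabbath iron man", "YOUR_API_KEY"))
```

Unlike the scraping approach, this depends on a documented JSON response rather than on page markup.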

Alex Martelli
  • BeautifulSoup's CSS support makes this a little more readable and robust with `soup.select('h3 a[href]')[0]['href']`. Not sure if the user was after just YouTube videos or Google search results though, there is no good API for the latter, right? (And welcome back to answering! :-)) – Martijn Pieters Dec 16 '14 at 00:42
  • Hi Martijn, yes you can scrape a little more robustly, but it will still be very fragile. You're right that the Google Search API was turned off recently after 3+ years of deprecation -- and scraping search results violates the TOS, as mentioned at http://stackoverflow.com/questions/22657548/is-it-ok-to-scrape-data-from-google-results . – Alex Martelli Dec 16 '14 at 01:19