I'm using the Python requests library to query Google, but it's not working. Before posting here I found another post on Stack Overflow, but its approach didn't work either. I think something has changed in the way you can make queries through the URL on Google, but I'm so new that I don't know what it is. Here's my code:

import requests
from bs4 import BeautifulSoup
from django.shortcuts import render


def index(request):
    url = 'https://www.google.com/webhp?hl=en#hl=en&q=stackoverflow'
    google = requests.get(url)
    bs = BeautifulSoup(google.content, 'html.parser')
    d = bs.title.string

    # Collect the text and href of every anchor on the page.
    links = []
    for link in bs.find_all('a'):
        links.append((
            link.text,
            link.get('href'),
        ))

    context = {
        "links": links,
    }
    return render(request, 'index.html', context)

And in my template:

 {% for l in links %}
   {{l}}<br>
 {% endfor %}

This is the output:

    ('https://maps.google.com/maps?hl=en&tab=wl',)
    ('https://play.google.com/?hl=en&tab=w8',)
    ('https://www.youtube.com/?hl=en&tab=w1',)
    ('https://news.google.com/nwshp?hl=en&tab=wn',)
    ('https://mail.google.com/mail/?tab=wm',)
    ('https://drive.google.com/?tab=wo',)
    ('https://www.google.com/intl/en/options/',)
    ('http://www.google.com/history/optout?hl=en',)
    ('/preferences?hl=en',)
    ('https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.com/webhp%3Fhl%3Den',)
    ('/search?site=webhp&ie=UTF-8&q=Jane+Jacobs&oi=ddle&ct=jane-jacobss-100th-birthday-5122456077467648-hp&hl=en&sa=X&ved=0ahUKEwjinsHMgMHMAhVKPz4KHVX_CLsQNggD',)
    ('/advanced_search?hl=en&authuser=0',)
    ('/language_tools?hl=en&authuser=0',)
    ('/intl/en/ads/',)
    ('/services/',)
    ('https://plus.google.com/116899029375914044550',)
    ('/intl/en/about.html',)
    ('/intl/en/policies/privacy/',)
    ('/intl/en/policies/terms/',)

This seems to be the Google homepage, but it does not match what I queried. I should be getting a list of results related to stackoverflow. How can I fix this? To be clear, I want to query Google with a query of my choice, scrape the links, and display them in my template.

nothingness
    Google is pretty protective of its search results: it actively detects programmatic access and blocks it (if it gets suspicious, it will start showing you captchas and eventually blocklist your IP). If you really want to go that route, you would probably need a more powerful HTTP client that can follow redirects, maintain cookies, and potentially run JS, i.e. full-featured browser emulation. – serg May 04 '16 at 18:28

1 Answer

1) Have you looked at the source of the page you're trying to scrape? I don't see the links in the generated HTML.
2) You probably have to use Selenium or something similar. For one thing, you're not setting a user agent. Google has designed its page to thwart this sort of effort.
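
One more thing that may explain why the homepage comes back: in the URL from the question, the search terms sit after the `#`, and that fragment part is kept by the client and not sent to the server. Below is a rough sketch, not a guaranteed fix. It shows how the URL splits, then builds a `/search` URL with the terms in the query string. `build_search_url` and `fetch_links` are hypothetical helper names, the User-Agent value is an assumption, and Google may still answer with a captcha or with markup that does not contain the links you expect.

```python
from urllib.parse import urlencode, urlsplit

# The URL from the question: everything after '#' is a fragment, which
# the client keeps to itself and does not send to the server.
original = "https://www.google.com/webhp?hl=en#hl=en&q=stackoverflow"
parts = urlsplit(original)
print(parts.path)      # /webhp
print(parts.query)     # hl=en
print(parts.fragment)  # hl=en&q=stackoverflow


def build_search_url(query, lang="en"):
    """Put the search terms in the query string, where the server sees them."""
    return "https://www.google.com/search?" + urlencode({"hl": lang, "q": query})


def fetch_links(query):
    """Hypothetical helper: fetch a results page and collect (text, href) pairs.

    Assumes requests and beautifulsoup4 are installed; they are imported
    here so the URL helpers above run without them.
    """
    import requests
    from bs4 import BeautifulSoup

    # Assumed User-Agent string; any realistic browser value could be used.
    headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}
    response = requests.get(build_search_url(query), headers=headers, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return [(a.get_text(strip=True), a.get("href")) for a in soup.find_all("a")]


print(build_search_url("stackoverflow"))
# https://www.google.com/search?hl=en&q=stackoverflow
```

In the Django view you could then pass `fetch_links('stackoverflow')` as `links` in the context. Whether the returned anchors include actual results depends on what Google serves to a non-browser client, so Selenium may still be needed.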