0

I am trying to get results of pizza from the Bing Search, here is my code:

import requests

search = input("Search for:")
Params = {"q": search}
r = requests.get("http://www.bing.com/search", params=Params)
f = open("test.html", "w+", encoding='utf-8')
print(r.url)
print(r.status_code)
f.write(r.text)

It runs without any error

Search for:pizza
http://www.bing.com/search?q=pizza
200

Process finished with exit code 0

But when I open the file in which the program saved the text I see no results

Here is the picture of html file:

Here is the Pic of html file

Dmitriy Zub
  • 1,398
  • 8
  • 35
  • https://stackoverflow.com/questions/45448994/wait-page-to-load-before-getting-data-with-requests-get-in-python-3 – Gareth Ma Apr 01 '20 at 14:52
  • What you saved (and what beautifulsoup sees) is the bare-bones html template that the server serves to you when you make this request. Normally, if a browser makes a request to this html resource, it renders this html template in the browser and then the DOM is populated with the actual results using scripts. Beautifulsoup only looks at the html. – Paul M. Apr 01 '20 at 15:04
  • 1
    @GarethMa Hey again and gratz on the 500+ rep – Chris Doyle Apr 01 '20 at 15:17
  • @ChrisDoyle lmaO hey :) got +50+25 reps from a single question haha – Gareth Ma Apr 01 '20 at 15:17

2 Answers2

1

While the comments in the question are correct, they dont apply in this case. the simple check for JS loaded events that modifies the dom is to just turn off javascript and reload the page in your browser. If that produces the same as you got from requests then you know the content is being populate by a dom event after the page source and requests wont see any dom changes made to the content.

However in this case if i load the search url in the browser with JS disabled i still see a whole host of results. Yes a few elements are missing. But it doesnt explain why in your code you dont see any of the search results at all.

so the other thing to consider is the fact that when we make webrequests as part of the http headers we send a user-agent string that identifies what type of http client we are. Now some sites may filter or block certain user agents. or only provide results to ones that are real web browsers. That seems to be the root of your issue.

If you update your code and set the http headers with a user agent string like a web broswer you should see you get a whole host of results back and an html that closly matches the same as you would see in a browser. Minus any elements in the broswer that are loaded post source code via JS based events.

import requests

search = input("Search for:")
Params = {"q": search}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = requests.get("http://www.bing.com/search", params=Params, headers=headers)
f = open("test.html", "w+", encoding='utf-8')
print(r.url)
print(r.status_code)
f.write(r.text)

Now while that does work and will give you more of what you want the question is now that of a moral one. Bing have chosen to not return the results to your requests.get call. you can of course lie and tell bing your not coming from requests but from a web browser. Or you could possible respect their choice that they dont want their service accessed like this.

If you are interested specifically in the search results bing does provide a search API which responds in JSON format and some people have already written python modules to utilise it.

https://pypi.org/search/?q=bing+api&o=

Chris Doyle
  • 10,703
  • 2
  • 23
  • 42
0

The possible problem as already mentioned by Chris Doyle is that If no user-agent is being passed into request headers to act as a "real" user visit while using requests library, it defaults to python-requests, so Bing or other search engines (websites) understands that it's a bot/script, and might block a request (or do something different to prevent from scraping the website). Check what's your user-agent. List of user-agents.


Alternatively, you can achieve this by using Bing Organic Results API from SerpApi. It's a paid API with a free plan.

The difference in your case is that you don't have to spend time thinking about the things you don't want to think about. Or bypass blocks from Bing or other search engines. Instead, focus on the data that needs to be extracted from the structured JSON. Check out the playground.

Example code to integrate:

from serpapi import GoogleSearch

params = {
  "api_key": "YOUR_API_KEY",
  "engine": "bing",
  "q": "buy a rocket"
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results["organic_results"]:
    print(result["title"], sep="\n") 

    # prints all titles from the first page

Disclaimer, I work for SerpApi.

Dmitriy Zub
  • 1,398
  • 8
  • 35