newbie programmer and lurker here, hoping for some sensible advice. :)
Using a combination of Python, BeautifulSoup, and the Bing API, I was able to find what I wanted with the following code:
import urllib
import urllib2
from BeautifulSoup import BeautifulStoneSoup

Appid = "..."   # my AppId
query = "..."   # my query

# URL-encode the query so spaces and special characters don't break the URL
url = ("http://api.search.live.net/xml.aspx?Appid=" + Appid
       + "&query=" + urllib.quote_plus(query) + "&sources=web")
soup = BeautifulStoneSoup(urllib2.urlopen(url))
totalResults = soup.find('web:total').text
So I'd like to do this across a few thousand search terms, and I was wondering:
- would making this request a few thousand times be construed as hammering the server,
- what steps should I take to avoid hammering said servers (what are the best practices?), and
- is there a cheaper (data-wise) way to do this using any of the major search engine APIs?
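For what it's worth, here's the sort of throttled loop I had in mind (the `throttled` helper and the one-second delay are just my guesses at a polite rate, not anything from Bing's documentation):

```python
import time

def throttled(iterable, delay=1.0):
    """Yield items no faster than one per `delay` seconds.

    The 1-second default is an assumption about politeness,
    not an official rate limit.
    """
    last = 0.0
    for item in iterable:
        wait = delay - (time.time() - last)
        if wait > 0:
            time.sleep(wait)
        last = time.time()
        yield item

# Intended usage (fetch_total would be the request/parse code above):
# for term in throttled(search_terms):
#     counts[term] = fetch_total(term)
```

Is something like this enough, or do the APIs expect explicit rate-limit handling (backing off on error responses, etc.)?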
It just seems unnecessarily expensive to download all that data just to extract one number per keyword, and I was wondering if I'd missed anything.
FWIW, I did some homework and tried the Google Search API (deprecated) and Yahoo's BOSS API (soon to be deprecated and replaced with a paid service) before settling on the Bing API. I understand that directly scraping a results page is considered poor form, so I'll pass on scraping search engines directly.