345

Google Web Search API has been deprecated and replaced with Custom Search API (see http://code.google.com/apis/websearch/).

I wanted to search the whole web but it looks like with the new API only custom sites can be searched.

Is there a way to search the whole web programmatically? I was able to query the old API using JSON from a Java program.

i alarmed alien
Dan
  • I've been using an alternative [google search api](https://goog.io). It is super easy to use. –  Jun 13 '20 at 16:21
  • 3
    There is also [SerpApi](https://serpapi.com). That's a solid solution for Google search and other engines. – Hartator Jun 09 '21 at 16:00
  • 1
    https://www.serphouse.com is a great solution to get data using API. – Mehul V. Jun 17 '22 at 09:16

10 Answers

501

Yes, Google Custom Search has now replaced the old Search API, but you can still use Google Custom Search to search the entire web, although the steps are not obvious from the Custom Search setup.

To create a Google Custom Search engine that searches the entire web:

  1. From the Google Custom Search homepage ( http://www.google.com/cse/ ), click Create a Custom Search Engine.
  2. Type a name and description for your search engine.
  3. Under Define your search engine, in the Sites to Search box, enter at least one valid URL. (For now, just put www.anyurl.com to get past this screen; it gets deleted again in step 10.)
  4. Select the CSE edition you want and accept the Terms of Service, then click Next. Select the layout option you want, and then click Next.
  5. Click any of the links under the Next steps section to navigate to your Control panel.
  6. In the left-hand menu, under Control Panel, click Basics.
  7. In the Search Preferences section, select Search the entire web but emphasize included sites.
  8. Click Save Changes.
  9. In the left-hand menu, under Control Panel, click Sites.
  10. Delete the site you entered during the initial setup process.

Now your custom search engine will search the entire web.
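
Once the engine exists, it can be queried over HTTP and answers with JSON. The following is a minimal sketch in Java (11 or later), not an official client: it assumes you have an API key and the engine's ID (the `cx` value) from your own account, and the class and method names are made up for illustration. It builds a request against the Custom Search JSON API endpoint with the `key`, `cx`, and `q` parameters and returns the raw response body.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Illustrative helper; the class and method names are hypothetical.
class CustomSearchQuery {
    // Endpoint of the Custom Search JSON API.
    static final String ENDPOINT = "https://www.googleapis.com/customsearch/v1";

    /** Builds the request URL. apiKey and engineId (the "cx" value) come from your own account. */
    static String buildUrl(String apiKey, String engineId, String query) {
        return ENDPOINT
                + "?key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8)
                + "&cx=" + URLEncoder.encode(engineId, StandardCharsets.UTF_8)
                + "&q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    /** Sends the GET request and returns the raw JSON body, or throws on a non-200 status. */
    static String search(String apiKey, String engineId, String query) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(buildUrl(apiKey, engineId, query)))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("HTTP " + response.statusCode() + ": " + response.body());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Real credentials must be supplied on the command line; none are hard-coded here.
        if (args.length < 3) {
            System.err.println("usage: CustomSearchQuery <apiKey> <engineId> <query>");
            return;
        }
        System.out.println(search(args[0], args[1], args[2]));
    }
}
```

The URL building is kept separate from the network call so it can be checked without spending any of the daily query quota. Parsing the JSON is left to whichever JSON library you already use.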

Pricing

  • Google Custom Search gives you 100 queries per day for free.
  • After that you pay $5 per 1000 queries.
  • There is a maximum of 10,000 queries per day.

Source: https://developers.google.com/custom-search/json-api/v1/overview#Pricing
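
To make the numbers above concrete, here is a small worked calculation in Java. It is only a sketch: it assumes the paid rate is prorated per query beyond the free 100 (the figures above only state the price per 1000), and the class and method names are invented for the example.

```java
// Illustrative cost estimate based on the figures quoted above.
class CustomSearchCost {
    static final int FREE_QUERIES_PER_DAY = 100;
    static final int MAX_QUERIES_PER_DAY = 10_000;
    static final double USD_PER_1000_QUERIES = 5.0;

    /** Estimated cost in USD for one day, assuming linear proration beyond the free tier. */
    static double dailyCostUsd(int queries) {
        if (queries < 0 || queries > MAX_QUERIES_PER_DAY) {
            throw new IllegalArgumentException("queries must be between 0 and " + MAX_QUERIES_PER_DAY);
        }
        int billable = Math.max(0, queries - FREE_QUERIES_PER_DAY);
        return billable * USD_PER_1000_QUERIES / 1000.0;
    }

    public static void main(String[] args) {
        for (int q : new int[] {100, 1_100, 10_000}) {
            System.out.printf("%d queries/day: $%.2f%n", q, dailyCostUsd(q));
        }
    }
}
```

Under that assumption, a day at the 10,000-query cap has 9,900 billable queries, which comes to $49.50.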


  • The search quality is much lower than normal Google search (no synonym matching, no "intelligence", etc.)
  • It seems that Google is even planning to shut down this service completely.
Bangkokian
  • 27
    Thanks for this. Hopefully this is a valid procedure and not a loophole waiting to be plugged by Google! – Mazatec Jul 07 '12 at 14:02
  • 2
    Confirmed to be working. Results are slightly different than a live search though. Any ideas on that? Bing's API has the same problem. – Farzher Nov 26 '12 at 19:08
  • 9
    Thank you! This is possibly the only answer on the Internet that addressed my question. It's mind boggling why Google would end direct API support for their core service. – spamguy Jan 24 '13 at 04:16
  • 7
    but how to use it with json ? – nawara Apr 19 '13 at 19:32
  • 2
    The results are a little different because of personalized and local search results. – Praesagus May 15 '13 at 21:13
  • 1
    Well, that's great, but the thing I hesitate at is: IS IT PAID? – Deepanshu Goyal Dec 06 '13 at 07:59
  • @Deepanshu You only get 100 queries per day for free ([docs](https://developers.google.com/custom-search/json-api/v1/overview#pricing)). – Rob W Dec 17 '13 at 17:50
  • 2
    This is why Google claims that the search results are different https://support.google.com/customsearch/answer/141877?hl=en Mainly: Using specified sites (does not apply here), no social or personalized or real time results – MFARID Apr 27 '14 at 17:41
  • Any chance you can update this question to reflect the new layout? Can't seem to find half the stuff in your question. – Rippo Jun 30 '14 at 14:21
  • Rippo -- I haven't been back in a while... but even if they've changed the layout the methodology is probably still sound: Create a search engine to search a specific site PLUS the entire web. Then delete that specific site. What you're left with should be a generic web search. They may have closed the loophole afterall... but if it's still do-able, this general advice may help. Good luck. – Bangkokian Jun 30 '14 at 15:29
  • 2
    And.. if they have closed the loophole and now force you to search at least 'one' site. You might try creating a URL/site with zero content. Just a blank index.html page. The results should then be the same as a generic web search. 'Just a thought... – Bangkokian Jun 30 '14 at 15:35
  • I tried it but it doesn't work now. I asked to look in the entire web for suunto ambit watch, but I got no results (I searched in the public URL that I got) – Dejell Feb 11 '15 at 20:09
  • 2
    Note this only works for the free version https://support.google.com/customsearch/answer/2631040 –  Mar 05 '15 at 22:53
  • 3
    @MFARID It does not only miss social/live/etc data. It does not allow a search based on synonyms and it is completely missing intelligence. e.g. "john doe northpole" will not return a result if "john doe" is now living at the "southpole" and has changed this information on his website or removed the word "northpole" or he or you made a typo like "nortpole". In my eyes the custom search is nearly useless. – mgutt Mar 24 '15 at 23:16
  • 70
    WARNING: we did development using the free version, but to upgrade to the paid version (to do more than 100 searches), google forces you to turn off the "search the entire web but emphasize included sites" – Bryan Larsen Aug 11 '15 at 14:50
  • @BryanLarsen, It's still possible to use the old API that doesn't have the paltry 100/day limit right? – Pacerier Sep 18 '15 at 04:21
  • @Bangkokian, Why is there a hard limit of 10k queries/day? Assuming you can pay, How do you get above 10k queries/day then? Do you create multiple keys? – Pacerier Sep 18 '15 at 04:22
  • I'm not sure how it was before, but now you have to set up a billing account regardless of whether you use the free or paid tier. Bummer. – thdoan May 13 '16 at 03:56
  • entireweb.com has discontinued the service as seen here http://www.entireweb.com/services – Wessam El Mahdy Jul 24 '16 at 16:43
  • This still works. – Jake 1986 Dec 01 '16 at 21:22
  • 5
    "On April 1, 2017, Google will discontinue sales of the Google Site Search. All new purchases and renewals must take place before this date. The product will be completely shut down by April 1, 2018." – Gajus Mar 07 '17 at 15:52
  • Google custom search for the entire web works, but it won't give you more than 100 results per search query *even if you are a paying customer*. – nurettin Jul 24 '17 at 10:54
  • The Google Custom Search homepage ( http://www.google.com/cse/ ) always returns 500 err... Is anyone facing the same problem? – Tina Lee Jul 25 '17 at 04:34
  • It's worth adding that besides such a low limit it also permits only 10 results per query – Oleg Kuralenko Nov 01 '17 at 20:01
  • After we create the custom search engine, how do we invoke the API ? – Nathan B Oct 10 '19 at 12:47
  • @TinaLee the correct URL is https://cse.google.com/cse/ – Justin Skiles Feb 18 '20 at 16:09
  • There exists a third-party API called [SerpApi](https://serpapi.com/). It has a [Google Search Engine API](https://serpapi.com/search-api) which returns a raw JSON. It has a free plan of 100 searches/month (to test out). There're plans of $50/$130/$250/enterprise for 5.000/15.000/30.000/100.000+ searches per month accordingly, with a throughput of 20% of plan searches per hour. It has been regularly updated (for new Google layouts) for the last 5 years, and [has 10 API wrappers](https://serpapi.com/integrations). [Playground to mess around](https://serpapi.com/playground) – Dmitriy Zub Oct 31 '22 at 09:09
59

Google Custom Search (as advocated in the top-rated answers) works well but is very expensive compared to its competitors (below) or to other Google APIs. It has a small free tier (100 queries/day) and a very high price of $5 per 1,000 queries.

They offer the option to upgrade to Site Search, which has slightly better prices, but that is meant for searching one site (your own), so it is really something quite different - not an upgrade.

The main alternatives seem to be:

Bing Search API
https://datamarket.azure.com/dataset/5BA839F1-12CE-4CCE-BF57-A49D98D29A44
It has a free tier of 5,000 queries/month, prices starting at 5 queries per penny, and no hard limit.

UPDATE: At the end of 2016 this API was shut down in favour of its Azure counterpart, "Cognitive Services Bing Search API":
https://azure.microsoft.com/en-us/services/cognitive-services/search/

See here for a pricing chart, which starts at US$3/m for 1,000 transactions. Unless I'm missing something it is quite expensive.

Yahoo BOSS Search API
http://developer.yahoo.com/boss/search/
Prices started at about 12 queries/penny for whole-web searches.
UPDATE: It was discontinued on March 31, 2016.

And some I haven't heard of before:

http://www.gigablast.com/searchfeed.html

http://www.faroo.com/hp/api/api.html

http://www.commoncrawl.org/

http://www.entireweb.com/search_api/implementation/
[discontinued - as pointed out below]

There is a bit of discussion of some of these on this SO post.
[got closed for being off-topic and is now gone]

Community
Tom
  • 2
    Bing Search API version 5 now allows up to 1,000 transactions per month across all Bing Search APIs (Web, Images, Video, News Search) - https://www.microsoft.com/cognitive-services/en-us/pricing . I put together some samples - http://mvark.blogspot.in/2016/06/how-to-use-bing-search-v5-api-with.html – mvark Jun 21 '16 at 04:42
  • entireweb.com has discontinued the service as seen here http://www.entireweb.com/services – Wessam El Mahdy Jul 24 '16 at 16:42
  • 1
    on Dec 15, 2016 Bing Web Search API will move under Cognitive Services by Azure Marketplace (https://azure.microsoft.com/en-us/services/cognitive-services/search/), which require a phone + credit card verification for a subscription (even a free one). – gilad905 Nov 08 '16 at 10:33
  • From Bing API: "DataMarket and Data Services are being retired and will stop accepting new orders after 12/31/2016. Existing subscriptions will be retired and cancelled starting 3/31/2017. Please reach out to your service provider for options if you want to continue service." – Dmitri Zaitsev Apr 19 '17 at 13:40
  • Thanks for pointing out the change - I've updated answer accordingly. – Tom Apr 19 '17 at 14:24
  • 1
    Looks like Bing's moved their service again - now it's on the Azure Marketplace https://learn.microsoft.com/en-us/bing/search-apis/bing-web-search/create-bing-search-service-resource – Kyle Aug 26 '21 at 04:39
  • There's also a third-party API from [SerpApi](https://serpapi.com/) which has [Google](https://serpapi.com/search-api), [Bing](https://serpapi.com/bing-search-api), [Yahoo](https://serpapi.com/yahoo-search-api) and 20+ more search engine APIs. It's also a paid API (with a free plan) but maintained/updated on regular basis for the last 5 years. [Roadmap](https://github.com/serpapi/public-roadmap/issues). [Playground](https://serpapi.com/playground) – Dmitriy Zub Oct 31 '22 at 08:59
41

You could just send the queries the way a browser does and then parse the HTML. That is what I have always done, even for things like YouTube.
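
As a rough illustration of the parsing half of that approach, here is a minimal Java sketch that pulls link targets out of an HTML string using only the standard library. It is an assumption-laden example, not what I actually run: the class name is invented, a regular expression is a fragile substitute for a real HTML parser, and the markup of any real results page differs from the sample and changes without notice. Whether fetching a given page this way is permitted is a separate matter, discussed in the comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: extracts href values from <a> tags in an HTML string.
class LinkExtractor {
    // Matches href="..." or href='...' inside an <a> tag; deliberately simple and easy to break.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    /** Returns the href values in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        // Static sample markup, so the sketch runs without any network access.
        String html = "<p><a href=\"http://example.com/\">A</a> <a class='x' href='/b'>B</a></p>";
        extractLinks(html).forEach(System.out::println);
    }
}
```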

Zimm3r
  • I really need a proper API call as I'm intending making many calls. – Dan Nov 02 '10 at 23:24
  • 66
    i'm told that googles terms of service forbid spidering... – Steven A. Lowe Nov 21 '10 at 05:06
  • 56
    From the [TOS](http://www.google.com/accounts/TOS): "You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers)..." – ændrük Mar 06 '11 at 17:53
  • 1
    Shabby, wouldn't on any large scale. Maybe if the program is for personal use... – Peter Kazazes Jan 05 '12 at 15:23
  • 13
    @Zimm3r Read the tooltip on the "downvote" button; that's why. Also, because the suggestion isn't allowed by google's TOS. – WhyNotHugo Jul 23 '12 at 19:41
  • 4
    @Hugo the answer is useful because it does what was asked and I AM STILL getting downvoted for an answer that was accepted, that works, that is useful, and it is the asker's responsibility to decide on Google's TOS, not mine. – Zimm3r Dec 16 '12 at 20:54
  • 5
    @Zimm3r If the answer is useful or not is subjective. I did not find it useful having the same question as the op, since it's neither a clean solution, or something that the TOS allows. – WhyNotHugo Dec 17 '12 at 00:50
  • 2
    @Hugo no it isn't subjective or at least not in such a degree you suggest, it is useful if it answers the question in a viable way, TOS violations are something to be weighted but not something that makes something wholly useless. – Zimm3r Dec 27 '12 at 07:55
  • 19
    "Violate the terms of service with a service provider" is never a good advice. Parsing webpages is something that breaks from one day to the next without warning, this is awful advice - that's the reason it was downvotes more that it was upvoted. – WhyNotHugo Dec 27 '12 at 16:35
  • 4
    I don't recall telling them to break the TOS, I gave them a valid answer that was accepted as the best and it is their choice to do what they want with that information. – Zimm3r Dec 31 '12 at 05:21
  • 164
    Yes it breaks the terms of service but personally I wouldn't worry about that. Google can handle a little bit of scraping, after all they have made a fortune scraping other peoples sites. – jimbo2087 Feb 13 '13 at 00:29
  • 119
    Come on people. Don't be so naive. Google cannot force that ToS down your throat. In order to violate a ToS you must first agree with it (in writing, or by clicking a button like 'Yes, I accept the terms'). Think about this: I put a ToS on my web page saying that every person who visits that page has to give me $10000. Can I enforce this ToS on my visitors? Will they have to pay me immediately? – Gabriel Nov 25 '13 at 22:13
  • 12
    @Altar they can still block your IP ;) Ever seen a captcha in Google search? Some people have. – WGH Feb 02 '14 at 19:14
  • @Altar This's simply untrue. If your program is running on dedicated server, it certainly has a static IP. Besides, having dynamic address still means that you have to reconnect manually to obtain a new one. – WGH Feb 03 '14 at 22:39
  • 1
    @Altar Just saying "come on" doesn't magically dispel all barriers. You have to stay within the limits of the law. – Asad Saeeduddin Jun 04 '14 at 13:42
  • @WGH-most router today have an option to retrieve a new IP at midnight. – Gabriel Jun 05 '14 at 14:30
  • @Altar You are right, but finally you could infringe a law like https://en.wikipedia.org/wiki/Sui_generis_database_right or in Germany like http://dejure.org/gesetze/StGB/303b.html So it depends on your country and laws and of course it depends on the laws of the country where google is located at. But finally its much easier for google to ban ips. And of course you could reconnect and obtain a new ip as often as you want, but it could be possible that google uses geo databases to block your region much more (e.g. if you search 10x times in 5 minutes) often than others. – mgutt Mar 24 '15 at 23:35
  • 4
    No, you can't enforce a ToS against random web surfers. However, creating a program to scrape a web page shows clear intent and the skill required to do so would put you in a higher class of "reasonable person". You might not lose a criminal lawsuit but probably would lose a civil lawsuit. IANAL. Ref: Aaron Swartz. – Bryan Larsen Jun 02 '15 at 19:52
  • 4
    -1 @Zimm3r, you said you provided a "valid answer", but I disagree. I don't consider it a valid answer when it requires the use of a web service, while specifically breaking their T.O.S. Your solution cannot be used without violating Google's Terms of Use, therefore is not really a valid answer in my opinion. It's like someone telling you they need money for groceries, and you suggesting they rob the bank. Sure, technically it is an option, but not one that is likely to work. – Sherwin Flight Sep 28 '15 at 05:16
  • The usefulness of the answer does not mean 'always applicable'. The Google Terms of Service could change - they have already after all. If you need a small amount of files, you are not hurting big G. – Titou Apr 06 '17 at 13:49
  • 3
    Scraping the webpage has these disadvantages: (1) Google doesn't like it - you might face IP ban, captchas and other obstacles. (2) The HTML code of the webpage changes frequently - you will end up fixing your code again and again in your long-term projects. (3) The API can possibly give you more metadata about the search results than the webpage. I downvoted this answer. But I'm not any kind of law nazi. This approach is simply not good for the reasons above. – Jeyekomon Jan 29 '18 at 13:46
  • 1
    @ændrük that part about **automated means** is gone from their TOS since March 2012. – rustyx Mar 14 '18 at 18:51
  • @rustyx it still break the terms: "don’t interfere with our Services or **try to access them using a method other than the interface and the instructions that we provide.**" – Andre Figueiredo Jan 21 '19 at 18:55
  • 2
    BTW: The reason why Google is so adamant about preventing scraping is not for the reasons you think: It is not because it might cost bandwidth—which is cheap. It is because one of Google's most valuable assets, is that its query log is one of the most potent insights into the collective consciousness. Being polluted by mechanized queries would make it worthless, so they are investing all their efforts to dithering scraping done in a way to pollute that data set. – Jérémie Mar 15 '19 at 22:34
  • @AndreFigueiredo `don't [...] access [our Services] using a method other than the interface and the instructions that we provide` => a web crawler **is using** the interface and the instructions that they provide. It just does so by automated means instead of manually, so a web crawler is absolutely compliant with these ToS (at least, with this sentence you quoted). – Jivan Jul 11 '20 at 21:18
  • @Jivan that's a fair point, I'm not savvy about laws and web crawlers, but my guess is that bots and raw HTTP requests would not be compliant to an accepted interface, versus Selenium for example :P. And, instructions they provide to access their services would not include automated requests - scrapping. Correct me if I'm wrong. – Andre Figueiredo Jul 12 '20 at 03:16
  • That said, they have changed their entire TOS, [new says](https://policies.google.com/terms#suspending-access): `we reasonably [??] believe that your conduct causes harm or liability [??] to a user, third party, or Google — for example, by [...] scraping content that doesn’t belong to you`. I honestly don't know what it exactly means to our case here.. We are doing no harm :P – Andre Figueiredo Jul 12 '20 at 03:18
  • @mopsyd while you are not compelled to "agree with" (whatever that means) the ToS, you _are_ compelled to comply insofar as Google as a private entity can choose not to provide service to you, and obviously they are likely to do so if you are violating their ToS. Further, they will be able to recoup damages in a civil setting. "Opting out" doesn't make sense; no one is forcing you to use their services. And declaring that they can "suck it" definitely doesn't do anything for you. – Ezekiel Victor Feb 07 '21 at 19:00
  • @jungle_mole Google is not using your services so your hypothetical terms to them don't matter. So they are not breaking your terms. And even if they somehow were, you still wouldn't be justified in breaking theirs; that's not how contracts work. It doesn't really matter anyway because you _are_ using their services in this case and you definitely have no particular right guaranteeing you access since as a private entity they have no obligation to serve you in the first place. – Ezekiel Victor Feb 07 '21 at 19:06
  • @Ezikiel being able and doing are entirely different concepts. If you want to take the pedantic stance you can say someone has a rule somewhere about the thing. You can also take a practical stance that weighs the risk of a company retaliating or cutting off service, the likelihood that they care enough about a trivial infraction to waste time and money on an aggressive civil action (they don’t, unless your abuse is egregious), and decide whether or not tangential concerns like a ToS matter to your use case. I am certain that to one prone to pedantry and condescending emojis it probably does. – mopsyd Feb 07 '21 at 22:56
  • @EzekielVictor as a fact, they are using my services as "targetable ad-watcher" or human for of clicker bot. We have barter: from my side time and cognitive function, from their side -- search window on my desktop. But it's them, who setting rules.. Nope, I have my guesses too. Since they are closed for discussion, I'll just do my way and if they don't agree, they are free to refuse to continue acquiring my services, as you said: no obligations. Anyway, it's valid answer. If it was LawOverflow here, answer could be considered arguable.(sry i would be not justified from whose point of view) – jungle_mole Feb 10 '21 at 06:31
  • @jungle_mole, when you refer to "free to refuse to continue acquiring" your services, you're referring to being IP banned. This thread has jumped the shark. – Ezekiel Victor Mar 21 '21 at 10:04
  • @EzekielVictor yep, that's what i'm saying: any sanctions they see fit and able to impose. all the more so, they are not going astray from this "warpath" since forever, why succumb? when counterparty feels ok with their moral rights to take their advantage and utilize me, my hands are untied to make use of our reciprocity in any manner i see fit. they don't disclose their ways, neither they negotiate them, so why should i? every party seeks maximum benefit, but one with all its might tries to confine the other. preemptively, mind you. what's left is try and exploit the usurper, it won't starve – jungle_mole Mar 29 '21 at 23:53
26

There is an option at the bottom of the Custom Search Control Panel, under "Sites to search": you can choose "Search the entire web but emphasize included sites".

Custom Search Control Panel - Sites to search

Yishu Fang
14

Faroo has a free Web Search API

Jack
7

I have just come across this from Common Crawl.

http://www.commoncrawl.org/

Might be the answer we are all looking for!!

Sean Adkinson
Dan
  • 3
    It has a limited index, refreshed about once a year. And it is finally quite expensive, as you have to plug into Amazon S3. – Guillaume Lebourgeois Oct 23 '13 at 11:25
  • @GuillaumeLebourgeois, Expensive? I don't think that's true. It's a nonprofit. The entire 102 TB of data is free for download. – Pacerier Aug 23 '16 at 19:01
  • The cost is for connecting to AWS where you can access this. If you are a student, you are eligible for their free tier, but there could still be transfer costs etc; and if you are not in the free tier, there are running costs. – tripleee May 16 '18 at 09:07
  • 1
    Looks like common crawl is updated monthly now – Jack Mar 18 '19 at 19:56
  • At least currently (february 2022) the data can be downloaded from S3 for free. HTTP-links can be found on the commoncrawl website. – x-ray Feb 20 '22 at 11:48
4

There's a note on top of the docs:

Note: The Google Web Search API has been officially deprecated as of November 1, 2010. It will continue to work as per our deprecation policy, but the number of requests you may make per day will be limited. Therefore, we encourage you to move to the new Custom Search API.

The deprecation policy says that they will continue to run the API for 3 years. So if you already have an application that uses the old API, you don't have to rush to change things just yet. If you're writing a new application, use the Custom Search API. See my answer here for how to do this in Python, but the idea's the same for any language.
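
One practical detail when moving to the Custom Search API, going by comments elsewhere on this page, is that each request returns at most 10 results and a single query yields at most 100 in total, so anything beyond the first page has to be fetched with the `start` parameter. Below is a minimal Java sketch of computing those `start` values. The class and method names are invented, and the two limits are taken from those comments rather than verified here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative paging helper; the limits are assumptions taken from comments on this page.
class CustomSearchPaging {
    static final int PAGE_SIZE = 10;    // results returned per request
    static final int MAX_RESULTS = 100; // total results reachable for one query

    /** Returns the 1-based "start" values needed to collect up to `wanted` results. */
    static List<Integer> startOffsets(int wanted) {
        int capped = Math.min(Math.max(wanted, 0), MAX_RESULTS);
        List<Integer> starts = new ArrayList<>();
        for (int start = 1; start <= capped; start += PAGE_SIZE) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Each printed value would be sent as the "start" parameter of one request.
        System.out.println(startOffsets(25));
    }
}
```

Each value corresponds to one billable request, so asking for 25 results costs three queries against the daily quota.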

Community
moinudin
3

There's a free Java API called JFreeWebSearch which uses the already mentioned Faroo: http://www.ke.tu-darmstadt.de/resources/jfreewebsearch

Heiko
  • There's also a [`google-search-results-java`](https://github.com/serpapi/google-search-results-java) which is a [SerpApi](https://serpapi.com/) wrapper for Java. – Dmitriy Zub Oct 31 '22 at 08:48
1

Gigablast offers a cheap web search API: http://www.gigablast.com/searchfeed.html

jman
0

You can create an "everywhere" custom search engine right from the Google Custom Search homepage (http://www.google.com/cse/). Just click 'advanced' while adding a new engine. There you can provide a Schema.org site type. 'Thing' is the most generic type, which covers the whole web.

testlnord