
I am using Eclipse Ganymede and trying to create a servlet that takes a query from the user and outputs the Google search results for that query. I would like to parse the response I get from Google, but at the moment I am not even able to get a response.

I know this might be because of the way Google accepts requests, so is there any way I can achieve this? I would like to avoid using the Google Custom Search API, as it has its own complications, but if there is no other way, please let me know.

EDIT: Bing search works after setting up a proxy, but I have had no luck with Google search. Is it because of HTTPS?

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet; 
import javax.servlet.http.HttpServletRequest; 
import javax.servlet.http.HttpServletResponse; 
import java.io.PrintWriter;

import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

public class HelloWorld extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { 
        // reading the user input
        String query = request.getParameter("query");
        PrintWriter out = response.getWriter(); 
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("proxy.address", 8080));

        URL urldemo = new URL("https://www.google.co.in/search?q="+query);
        //urldemo = new URL("http://www.bing.com/search?q="+query);

        URLConnection yc = urldemo.openConnection(proxy);
        BufferedReader in = new BufferedReader(new InputStreamReader(
                yc.getInputStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null)
        {
            out.println(inputLine);
            System.out.println(inputLine);
        }

        in.close();
    } 
}

Stack Trace:

java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.google.co.in/search?q=easd
    sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1625)
    sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
    HelloWorld.doGet(HelloWorld.java:30)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
kira_111
  • I am assuming that your server might not have a connection to the internet and it is failing to serve the request. Even if the parameter "query" does not exist, the URL `https://www.google.co.in/#q=null` will still work. – hfontanez Mar 24 '15 at 04:04
  • I am working behind a proxy, could that cause an issue, if so then how should I fix it? – kira_111 Mar 24 '15 at 04:22
  • Thanks a lot @hfontanez . I added the proxy part to the code and it is working for BING but not for Google, what am I missing? – kira_111 Mar 24 '15 at 05:26
  • It is most likely because of your HTTPS connection. That said, I'm sure Google's T&C's also frown upon non human generated search traffic so after a few queries you'll probably be blocked anyway as you'll not be able to complete the 'captcha'. – radimpe Mar 24 '15 at 05:35
  • I would try just `HTTP`, rather than `HTTPS` as radimpe suggested. – hfontanez Mar 24 '15 at 10:55
  • HTTP is not working either. – kira_111 Mar 24 '15 at 17:18

1 Answer


I am doing something similar in my own code. The situation is that you can access this URL in a browser but not from a program. This has nothing to do with HTTPS; it is related to Google's restrictions on automated queries. See the thread "How to send automated query to Google", and note the highest-voted answer and the one following it.

The actual cause of the 403 is the missing User-Agent header. You can indeed use the Google Custom Search API instead (see "How to get and use API key and server key"), but the limitation is, of course, the number of queries allowed per day.

The alternative is to simulate a browser's User-Agent, which I am trying myself...
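
As a rough illustration of the User-Agent approach, here is a minimal sketch using only the standard `java.net` classes from the question. The class name `GoogleFetchSketch`, the helper names, and the specific User-Agent string are assumptions made up for this example; any browser-like value could be substituted. It also URL-encodes the query, which the original code does not do. Google may still answer with a CAPTCHA page or block repeated automated requests, so treat this as an experiment rather than a reliable solution.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;

// Hypothetical helper class; not part of the original servlet.
class GoogleFetchSketch {
    // Illustrative browser-like value (an assumption, not a required string).
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.1; rv:36.0) Gecko/20100101 Firefox/36.0";

    // Builds the search URL with the query safely encoded (spaces, '&', etc.).
    static String buildSearchUrl(String query) {
        try {
            return "https://www.google.co.in/search?q="
                    + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    // Opens a connection through the given proxy and sets the User-Agent
    // header. No network traffic happens until getInputStream() is called.
    static URLConnection openWithUserAgent(String url, Proxy proxy)
            throws IOException {
        URLConnection conn = new URL(url).openConnection(proxy);
        conn.setRequestProperty("User-Agent", USER_AGENT);
        return conn;
    }
}
```

In the servlet's `doGet`, the line `URLConnection yc = urldemo.openConnection(proxy);` could then become `URLConnection yc = GoogleFetchSketch.openWithUserAgent(GoogleFetchSketch.buildSearchUrl(query), proxy);`, with the rest of the reading loop unchanged.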

Community
OrlandoL