I am trying to retrieve the html of a Google search query result using Java. That is, if I do a search in Google.com for a particular phrase, I would like to retrieve the html of the resulting web page (the page containing the links to possible matches along with their descriptions, URLs, ect…).
I tried doing this using the following code that I found in a related post:
import java.io.*;
import java.net.*;
import java.util.*;
public class Main {
public static void main (String args[]) {
URL url;
InputStream is = null;
DataInputStream dis;
String line;
try {
url = new URL("https://www.google.com/#hl=en&output=search&sclient=psy-ab&q=UCF&oq=UCF&aq=f&aqi=g4&aql=&gs_l=hp.3..0l4.1066.1471.0.1862.3.3.0.0.0.0.382.1028.2-1j2.3.0...0.0.OxbV2LOXcaY&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&fp=579625c09319dd01&biw=944&bih=951");
is = url.openStream(); // throws an IOException
dis = new DataInputStream(new BufferedInputStream(is));
while ((line = dis.readLine()) != null) {
System.out.println(line);
}
} catch (MalformedURLException mue) {
mue.printStackTrace();
} catch (IOException ioe) {
ioe.printStackTrace();
} finally {
try {
is.close();
} catch (IOException ioe ) {
// nothing to see here
}
}
}
}
From: How do you Programmatically Download a Webpage in Java
The URL used in this code was obtained by doing a Google search query from the Google homepage. For some reason I do not understand, if I write the phrase that I want to search for in the URL bar of my web browser and then use the URL of the resulting search result page in the code I get a 403 error.
This code, however, did not return the html of the search query result page. Instead, it returned the source code of the Google homepage.
After doing further research I noticed that if you view the source code of a Google search query result (by right clicking on the background of the search result page and selecting “View page source”) and compare it with the source code of the Google homepage, they are both identical.
If instead of viewing the source code of the search result page I save the html of the search result page (by pressing ctrl+s), I can get the html that I am looking for.
Is there a way to retrieve the html of the search result page using Java?
Thank you!