I am trying to scrape data from a web page in a Java servlet, but I found out that the page is compressed (gzip). When I open a URLConnection to it, I get the zipped file instead of the HTML.
Can anyone help me with this? I will be visiting thousands of pages like this one, parsing the table data with DOM, and populating a database so that I can query it for certain words and display the results. I was wondering whether the compression would make the process too slow.
Is there a way to do this without downloading the file? Any suggestions would be greatly appreciated. Thanks.
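import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

StringBuilder temp = new StringBuilder();
try {
    // Placeholder address; a real URL needs a scheme such as http://
    URL url = new URL("example.html.gz");
    URLConnection conn = url.openConnection();
    //FileInputStream instream = new FileInputStream(???What do I enter???);
    //GZIPInputStream ginstream = new GZIPInputStream(instream);
    conn.setAllowUserInteraction(false);
    // Read from the connection configured above; url.openStream() would open a second one
    InputStream urlStream = conn.getInputStream();
    BufferedReader buffer = new BufferedReader(new InputStreamReader(urlStream));
    String t = buffer.readLine();
    while (t != null) {
        temp.append(t);
        t = buffer.readLine();
    }
} catch (IOException e) {
    e.printStackTrace();
}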
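For reference, here is a minimal sketch of what I think the decompression step could look like. java.util.zip.GZIPInputStream can wrap any InputStream, so the connection's stream could probably be decompressed as it is read, with no intermediate file. The class name GzipPageReader, the helper readGzipText, and the UTF-8 charset are assumptions for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class GzipPageReader {
    // Hypothetical helper: decompresses a gzip stream on the fly and returns its text.
    // Assumes the page is UTF-8 encoded; lines are re-joined with '\n'.
    static String readGzipText(InputStream compressed) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            StringBuilder text = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
            return text.toString();
        }
    }
}
```

If this is right, the call would be something like readGzipText(conn.getInputStream()), instead of reading the raw stream line by line.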