
Requirement:

Read the HTML from any website, say "http://www.twitter.com".

Print the retrieved HTML.

Save it to a text file on the local machine.

Code:

import java.net.*;
import java.io.*;

public class oddless {
    public static void main(String[] args) throws Exception {

        URL oracle = new URL("http://www.fetagracollege.org");
        BufferedReader in = new BufferedReader(new InputStreamReader(oracle.openStream()));

        OutputStream os = new FileOutputStream("/Users/Rohan/new_sourcee.txt");

        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            System.out.println(inputLine);
            os.write((inputLine + "\n").getBytes()); // write each line to the file as well
        }
        in.close(); // note: os is never flushed or closed
    }
}

The code above retrieves the data, prints it to the console, and saves it to a text file, but it mostly retrieves only half the code (apparently because of a blank line in the HTML source). It does not save the code beyond that point.

Questions:

How can I save the full HTML code?

Are there any other alternatives?

  • Don't close the InputStream until you're finished reading it. Make sure you flush (if required) and close the OutputStream when you're done with it. All of this should be done within a try-catch-finally block (see the sketch after these comments) – MadProgrammer Mar 22 '14 at 07:32
  • Try Apache's Commons IO; it's great for copying entire streams and has been well tested. I've used the library in ~70% of my Android and Java SE projects and it has worked great. You can find it here: http://commons.apache.org/proper/commons-io/ – lucian.pantelimon Mar 22 '14 at 07:40
  • @gstack Have you reviewed the answers? – Leos Literak Mar 23 '14 at 10:19
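
Putting the two comments above together, a minimal sketch of the suggested fix might look like this. The class name SaveHtml is mine, it assumes the commons-io jar is on the classpath, and it reuses the URL and file path from the question:

import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import org.apache.commons.io.IOUtils;

public class SaveHtml {
    public static void main(String[] args) throws Exception {
        InputStream in = null;
        OutputStream out = null;
        try {
            in = new URL("http://www.fetagracollege.org").openStream();
            out = new FileOutputStream("/Users/Rohan/new_sourcee.txt");
            IOUtils.copy(in, out); // copies the entire stream, not line by line
            out.flush();           // flush before closing, as suggested above
        } finally {
            IOUtils.closeQuietly(in);  // close the input only after reading everything
            IOUtils.closeQuietly(out); // close the output once writing is done
        }
    }
}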

3 Answers


I used a different approach but received the same output as you. Isn't there a problem on the server side of this URL?

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpGet = new HttpGet("http://www.fetagracollege.org");
CloseableHttpResponse response1 = httpclient.execute(httpGet);
try {
    System.out.println(response1.getStatusLine());
    HttpEntity entity1 = response1.getEntity();
    String content = EntityUtils.toString(entity1); // read the whole entity into a String
    System.out.println(content);
} finally {
    response1.close();
}

It finishes with:

    </table>
    <p><br>

UPDATE: This Faculty of Engineering and Technology does not have a well-formed home page. The content is complete and your code works fine. But the commenters are right: you should use a try/catch/finally block.
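
Applied to the snippet above, that advice could look like the following sketch (assuming HttpClient 4.3+, and also closing the client itself):

CloseableHttpClient httpclient = HttpClients.createDefault();
try {
    HttpGet httpGet = new HttpGet("http://www.fetagracollege.org");
    CloseableHttpResponse response = httpclient.execute(httpGet);
    try {
        // read the whole body before closing the response
        System.out.println(EntityUtils.toString(response.getEntity()));
    } finally {
        response.close();
    }
} finally {
    httpclient.close();
}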

Leos Literak

I use this code whenever I connect to a website through Java:

import java.io.*;
import java.net.*;

public class Connection
{
    public static void main(String[] args) throws Exception
    {
        new Connection();
    }
    public Connection() throws Exception
    {
        URL url = new URL("http://www.fetagracollege.org"); //The URL
        HttpURLConnection huc = connect(url); //Builds and configures the connection
        huc.connect(); //Opens the connection
        String str = readBody(huc); //Reads the response
        huc.disconnect(); //Closes
        System.out.println(str); //Prints all output to the console
    }

    private String readBody(HttpURLConnection huc) throws Exception //Reads the response
    {
        InputStream is = huc.getInputStream(); //Inputstream
        BufferedReader rd = new BufferedReader(new InputStreamReader(is)); //BufferedReader
        String line;
        StringBuffer response = new StringBuffer();
        while ((line = rd.readLine()) != null)
        {
            response.append(line); //Append the line
            response.append('\n'); //and a new line
        }
        rd.close();
        return response.toString();
    }

    private HttpURLConnection connect(URL url) throws Exception //Connect to the URL
    {
        HttpURLConnection huc = (HttpURLConnection) url.openConnection(); //Creates the connection object (no network I/O yet)
        huc.setReadTimeout(15000); //Read timeout - 15 seconds
        huc.setConnectTimeout(15000); //Connecting timeout - 15 seconds
        huc.setUseCaches(false); //Don't use cache
        HttpURLConnection.setFollowRedirects(true); //Follow redirects if there are any
        huc.addRequestProperty("Host", "www.fetagracollege.org"); //www.fetagracollege.org is the host
        huc.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36"); //Chrome user agent
        return huc;
    }
}

The website's response ended with this, so I think the problem is server-side, as other websites work with this code (tested with Twitter and Google):

                            </font>&copy; fetaca 2011 </td>
                    </tr>
            </table>
    <p><br>
Bobby-Z

For reading content from a URL, you can use jsoup and then write the content out using standard file handling (OutputStream out = ...). To read it with jsoup:

String url = "URL"; // the URL to read
Document doc = Jsoup.connect(url).get(); // fetch and parse the page into a Document
String content = doc.toString(); // the content as an HTML String

Now that you have the content in a String, you can easily flush it into a file.
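
For example, a minimal sketch of that writing step (the class name JsoupSave and the output file name are mine):

import java.io.FileWriter;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupSave {
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("http://www.fetagracollege.org").get(); // fetch and parse the page
        FileWriter out = new FileWriter("page.html"); // hypothetical output file
        out.write(doc.toString()); // write the serialized HTML
        out.flush();
        out.close();
    }
}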

For this, you will need the jsoup jars and three imports: import org.jsoup.Jsoup; import org.jsoup.nodes.Document; import org.jsoup.select.Elements;

Soumya Sarkar