Saving web pages from a list of sites

Question

Please help me build the following. We have a text file containing many links to different websites (each link is on its own line, written in the form http://test.com). I need a Java program that visits every link and saves the pages of these sites as HTML files in the folder C:/test, with each file named after the text in the page's <title> tag.

Or point me to references that describe how to do this, although if you could write the code for me I would be very grateful. - Eric Scot, Nov 19 '12 at 12:47

Victor · Accepted Answer · 2012-11-19T15:49:13.327

This code reads URLs from a .txt file and writes the content of each page to a separate file, as you describe in your question.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class UrlListDownloader {

    public static void main(String[] args) {
        // Read urlList.txt one URL per line; try-with-resources closes the file.
        try (BufferedReader reader = new BufferedReader(new FileReader("urlList.txt"))) {
            String url;
            int i = 0;
            while ((url = reader.readLine()) != null) {
                url = url.trim();
                if (url.isEmpty()) {
                    continue; // skip blank lines
                }
                try {
                    getContent(url, i);
                } catch (IOException io) {
                    // Report the failing URL and carry on with the next one.
                    System.out.println(url + ": " + io);
                }
                i++;
            }
        } catch (IOException io) {
            System.out.println(io);
        }
    }

    // Downloads one page and copies it line by line into file_content_<index>.html.
    private static void getContent(String url, int index)
            throws MalformedURLException, IOException {
        URLConnection conn = new URL(url).openConnection();
        conn.connect();

        String htmlFileName = "file_content_" + index + ".html";
        // Both the network stream and the output file are closed, even on errors.
        try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(conn.getInputStream()));
             BufferedWriter bWriter = new BufferedWriter(new FileWriter(htmlFileName))) {
            String urlData;
            while ((urlData = reader.readLine()) != null) {
                bWriter.write(urlData);
                bWriter.newLine();
            }
        }
    }
}
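The question also asks for the files to go into C:/test and to be named after each page's title, which the code above does not do. Below is a minimal sketch of that part, assuming the name should come from the first <title> element, that a simple regular expression is good enough for typical pages (it is not a real HTML parser), and that decoding as UTF-8 is acceptable just for reading the title. The class and method names (TitleSaver, extractTitle, toFileName, savePage) are hypothetical, and readAllBytes needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleSaver {

    // Case-insensitive and DOTALL, so a title spread over several lines still matches.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed text of the first <title> element, or null if there is none.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    // Replaces characters Windows forbids in file names; falls back to page_<index>.
    static String toFileName(String title, int index) {
        String base = (title == null) ? "" : title.replaceAll("[\\\\/:*?\"<>|]", "_").trim();
        if (base.isEmpty()) {
            base = "page_" + index;
        }
        return base + ".html";
    }

    // Downloads one URL and writes the raw bytes into dir under its title-based name.
    static Path savePage(String url, int index, Path dir) throws IOException {
        byte[] body;
        try (InputStream in = URI.create(url).toURL().openStream()) {
            body = in.readAllBytes(); // Java 9+
        }
        // Assumption: UTF-8 is only used to find the title; the file keeps the original bytes.
        String html = new String(body, StandardCharsets.UTF_8);
        Files.createDirectories(dir);
        Path target = dir.resolve(toFileName(extractTitle(html), index));
        Files.write(target, body);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("C:/test");
        int index = 0;
        for (String line : Files.readAllLines(Paths.get("urlList.txt"), StandardCharsets.UTF_8)) {
            String url = line.trim();
            if (url.isEmpty()) {
                continue; // skip blank lines
            }
            try {
                System.out.println(url + " -> " + savePage(url, index, dir));
            } catch (IOException e) {
                System.out.println(url + ": " + e);
            }
            index++;
        }
    }
}
```

Writing the raw bytes rather than re-encoded lines keeps each page in its original encoding. Two pages with the same title would overwrite each other, so you may want to append the index to every name.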

Thanks, but what I'm asking for is a program that takes the links from the file and saves all the pages in their own format. - Eric Scot, Nov 19 '12 at 13:22
I pointed out that I need to take all the links from a text file and save the content they point to. Could you give a more detailed answer? As a beginner, this is not very clear to me. - Eric Scot, Nov 19 '12 at 13:51
Try looking up how to read a file with BufferedReader. Then you can read each line and fetch the page with my code snippet, and write the output with code similar to the reading code. - Victor, Nov 19 '12 at 14:25

score 0 · Answer 2 · edited Nov 19 '12 at 15:52

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class URLReader {
    public static void main(String[] args) {
        try {
            URL pageUrl = new URL("https://www.google.ru/");
            URLConnection conn = pageUrl.openConnection();
            conn.connect();

            String htmlFileName = "C:\\hello.html";
            // Copy the page line by line; try-with-resources closes both streams.
            try (BufferedReader reader = new BufferedReader(
                         new InputStreamReader(conn.getInputStream()));
                 BufferedWriter bWriter = new BufferedWriter(new FileWriter(htmlFileName))) {
                String urlData;
                while ((urlData = reader.readLine()) != null) {
                    bWriter.write(urlData);
                    bWriter.newLine();
                }
            }
        } catch (IOException io) {
            System.out.println(io);
        }
    }
}

@Victor Here's a start. Could you please improve the code so that everything works as I described in the question?

score 0 · Answer 3 · edited May 23 '17 at 11:56

I asked a similar question some time ago: Reading website's contents into string

Instead of reading it into a string, you can copy it to a FileOutputStream. Apache Commons IO has a convenient method for that in IOUtils:

copy(InputStream input, OutputStream output) 
Copy bytes from an InputStream to an OutputStream.

http://commons.apache.org/io/api-release/org/apache/commons/io/IOUtils.html
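If you would rather not add the Commons IO dependency, the JDK (Java 7 and later) has java.nio.file.Files.copy(InputStream, Path, CopyOption...), which performs the same stream-to-file copy. A minimal sketch; the class and method names (StreamToFile, saveStream, download) and the example URL and path are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class StreamToFile {

    // Copies every byte of the stream into target, overwriting an existing file.
    // Returns the number of bytes written.
    static long saveStream(InputStream in, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent); // no-op if the folder already exists
        }
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Opens the URL and streams the raw response body to disk without decoding it.
    static long download(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return saveStream(in, target);
        }
    }

    public static void main(String[] args) throws IOException {
        long n = download("http://test.com", Paths.get("C:/test/test.html"));
        System.out.println(n + " bytes saved");
    }
}
```

Because the bytes are copied unchanged, the saved file keeps whatever encoding the server sent, which reading line by line into a Writer does not guarantee.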

If you also want to download the images and other files referenced by your pages, you are better off using a library.

Of course, you can implement that yourself if you are learning. Regular expressions can be useful for finding links to images in HTML files.
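A rough sketch of the regular-expression approach, assuming images appear as <img> tags with a quoted src attribute. It misses many real-world cases (unquoted attributes, srcset, CSS backgrounds), which is why a proper HTML parser library is the safer choice. Relative links are resolved against the page's URL with java.net.URI.resolve; the class and method names (ImageLinks, findImageUrls) are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImageLinks {

    // Matches <img ... src="..."> or src='...'; group 1 is the quote, group 2 the value.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns absolute image URLs found in html, resolved against the page's own URL.
    // URI.resolve throws IllegalArgumentException for values with illegal characters.
    static List<String> findImageUrls(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        List<String> result = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            result.add(base.resolve(m.group(2).trim()).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p><img src=\"/logo.png\"> <IMG alt='x' SRC='pics/a.jpg'></p>";
        for (String url : findImageUrls(html, "http://test.com/dir/page.html")) {
            System.out.println(url);
        }
    }
}
```

Each resulting URL could then be saved with the same stream-copy technique used for the pages themselves.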
