1

I am learning Java and have come across a challenge to write a web address extractor. The program does nothing but sweep through the page given to it and find the external links in it. I have an idea of how to do that, but I am having trouble using the Socket class. I need to create a socket that connects to the server hosting the HTML page on port 80. Next, I need to read the complete HTML of that page from the socket's input stream so that I can process it and extract the links.

To sum up, I need clarification on the following:

  1. Get the HTML of the page into the input stream of the socket.
  2. Print the input stream to the console.

EDIT: Sorry, my bad. I confused the output stream with the input stream.

Ching Ling
  • You need an InputStream, not an OutputStream, to read from a socket. What's your exact problem, by the way? – mkrakhin Feb 11 '15 at 08:59
  • Look up how the HTTP protocol works. You connect to www.stackoverflow.com port 80, then HTTP tells you what to send and receive... alternatively, if you don't have to use sockets, use `URL` and `URLConnection`. – user253751 Feb 11 '15 at 08:59

3 Answers

0

Rather than doing that with a socket, try using the URLConnection class.

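    // Needs imports: java.io.BufferedReader, java.io.InputStreamReader,
    // java.net.URL, java.net.URLConnection
    URL connection = new URL("http://lums.edu.pk");
    URLConnection yc = connection.openConnection();
    BufferedReader in = new BufferedReader(new InputStreamReader(
                                yc.getInputStream()));
    String inputLine;
    // Print the response body line by line until the stream ends
    while ((inputLine = in.readLine()) != null)
        System.out.println(inputLine);
    in.close();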

If you want to do it with a socket, you need to understand what the HTTP protocol is and how to use it to retrieve data from servers. In the end, what gets sent over the socket is a GET request. Check this question to see how that works.
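For the socket route, here is a minimal sketch of sending a GET request by hand and printing whatever comes back. It is an illustration under some assumptions, not a complete HTTP client: the class name `SocketHttpGet`, the helper names, and the default host `example.com` are placeholders that do not come from the question. It speaks HTTP/1.0 so the server should not reply with chunked encoding, it does not follow redirects, and it cannot talk to sites that only serve HTTPS on port 443.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class SocketHttpGet {

    // Builds a minimal GET request. Every line ends with CRLF,
    // and an empty line marks the end of the request headers.
    static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Connects to port 80, writes the request to the socket's output stream,
    // then reads the reply from the socket's input stream and prints it.
    static void fetchAndPrint(String host, String path) throws IOException {
        try (Socket socket = new Socket(host, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildGetRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Assumes the page is UTF-8; a real client would check the Content-Type header.
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) {
        // Placeholder host; pass another host name as the first argument.
        String host = args.length > 0 ? args[0] : "example.com";
        try {
            fetchAndPrint(host, "/");
        } catch (IOException e) {
            System.err.println("Request failed: " + e.getMessage());
        }
    }
}
```

The printed output starts with the status line and the response headers; the HTML begins after the first empty line. This also shows why both streams matter: the request goes out through the output stream, and the page comes back through the input stream.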

Koitoer
  • I need to do this using sockets. I know how to do this using the URLConnection class. It's for learning purposes, and I want to be able to do it using sockets as well because I know it is possible. I will look into how HTTP works. Thanks. – Ching Ling Feb 11 '15 at 09:05
  • Your solution is by far the most helpful yet. Could you help me better understand the HTTP protocol? The link you shared doesn't explain how to use Java to talk in the HTTP protocol (or does so only very briefly). It's not elaborate enough. – Ching Ling Feb 11 '15 at 09:16
  • Basically, HTTP is a protocol with some keywords that you need to use to request resources, whereas a socket is a protocol-agnostic channel for exchanging data over the network. You need to build a request like this: http://www.studytonight.com/servlet/images/get-request-method.jpg. There is a lot to explain about all the possible options, but in the end you need to talk in a way that the server understands. Don't forget to accept my answer. – Koitoer Feb 11 '15 at 17:57
-1

Since you don't need to use sockets, it's much easier to use a library (in this case, one included in Java) that handles the HTTP request for you, and just gives you a plain stream with the page content:

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class UrlExtractor {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.stackoverflow.com/");
        URLConnection conn = url.openConnection();
        InputStream in = conn.getInputStream();

        // read the page content (usually HTML) from in here

        in.close();
    }
}
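Once the page content has been read into a string, the link extraction the question asks about could look roughly like the sketch below. The class name `LinkExtractorSketch` and the method `extractExternalLinks` are made up for illustration, and "external" is assumed here to mean any `href` holding an absolute `http://` or `https://` URL. A regular expression is a fragile way to read HTML, so treat this as a learning aid rather than a robust parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractorSketch {

    // Matches href="http://..." or href='https://...', ignoring case.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"'](https?://[^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns every absolute http(s) URL found in an href attribute, in document order.
    static List<String> extractExternalLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, used only to exercise the method.
        String sample = "<a href=\"http://a.com/x\">A</a> <a href='/local'>L</a>";
        for (String link : extractExternalLinks(sample)) {
            System.out.println(link);
        }
    }
}
```

Relative links such as `/local` are skipped on purpose, since they point back to the same site.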
user253751
  • I know how it's done using the URLConnection class. But I want to be able to do it using sockets as well, because I know for sure that it's possible and I want to learn how. – Ching Ling Feb 11 '15 at 09:05
-2

Look at this answer to create an OutputStream. Use the PrintStream class to print the OutputStream.

Martijn Burger
  • Because your answer doesn't seem to answer the question. – user253751 Feb 11 '15 at 20:06
  • Well, the question wasn't how to read a URL and print it to the screen either. There were two questions in the post, both of which are addressed. I could include a code sample, but hey, he stated clearly that it was for learning purposes. – Martijn Burger Feb 11 '15 at 20:27