34

Without the use of any external library, what is the simplest way to fetch a website's HTML content into a String?

Ian Nelson
pek
  • possible duplicate of http://stackoverflow.com/questions/238547/how-do-you-programmatically-download-a-webpage-in-java – jjnguy Apr 06 '10 at 05:29

6 Answers

46

I'm currently using this:

import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;

String content = null;
URLConnection connection = null;
try {
    connection = new URL("http://www.google.com").openConnection();
    Scanner scanner = new Scanner(connection.getInputStream());
    // "\\Z" matches the end of the input, so next() returns the whole stream as one token
    scanner.useDelimiter("\\Z");
    content = scanner.next();
    scanner.close();
} catch (Exception ex) {
    ex.printStackTrace();
}
System.out.println(content);

But not sure if there's a better way.

pek
  • Why "\\Z"? Isn't it an EOF on Windows only? I am just guessing here. – greenoldman Nov 09 '11 at 20:52
  • Why do you use "\\Z"? What does it do? I tried without it, it didn't work. – Max Husiv Feb 03 '17 at 14:03
  • @MaxHusiv I think it's because if you don't specify a delimiter, scanner.next() will just go through the whole HTML character by character, but if you use a delimiter which won't be found in the HTML, scanner.next() returns the whole thing. – Chris A Nov 15 '20 at 15:27
  • What import statements do you need for that to work? – theerrormagnet Sep 09 '22 at 13:07
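
To address the import question above: java.net.URL, java.net.URLConnection, and java.util.Scanner (now shown in the snippet). And here is a small sketch of the delimiter trick in its more common form (my own variant, not from the original answer): "\\A" matches the beginning of input, which can never match a second time, so the scanner's first token is the entire stream.

import java.net.URL;
import java.util.Scanner;

// Try-with-resources closes the scanner (and the underlying stream) automatically.
// The "UTF-8" charset is an assumption; the real encoding comes from the response headers.
try (Scanner scanner = new Scanner(new URL("http://www.google.com").openStream(), "UTF-8")) {
    scanner.useDelimiter("\\A");
    String content = scanner.hasNext() ? scanner.next() : "";
    System.out.println(content);
}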
22

This has worked well for me:

import java.io.InputStream;
import java.net.URL;

URL url = new URL(theURL);
InputStream is = url.openStream();
int ptr = 0;
StringBuffer buffer = new StringBuffer();
// Note: casting each byte to char is only correct for single-byte encodings
while ((ptr = is.read()) != -1) {
    buffer.append((char) ptr);
}
is.close();

Not sure as to whether the other solution(s) provided are any more efficient or not.
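
For what it's worth, a buffered variant (a sketch of mine, not from the answer; it reuses the theURL placeholder and assumes the page is UTF-8 encoded) avoids the one-byte-at-a-time read and handles multi-byte characters correctly:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

StringBuilder buffer = new StringBuilder();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new URL(theURL).openStream(), StandardCharsets.UTF_8))) {
    // Read in 4 KB chunks instead of one byte at a time
    char[] chunk = new char[4096];
    int read;
    while ((read = reader.read(chunk)) != -1) {
        buffer.append(chunk, 0, read);
    }
}
String content = buffer.toString();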

Scott Bennett-McLeish
2

Whilst not vanilla-Java, I'll offer up a simpler solution. Use Groovy ;-)

String siteContent = new URL("http://www.google.com").text
Scott Bennett-McLeish
2

I just left this post in your other thread, though what you have above might work as well. I don't think either would be any easier than the other. The Apache packages can be accessed by just adding import org.apache.commons.httpclient.HttpClient at the top of your code (see the sketch below).

Edit: Forgot the link ;)
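
For completeness, here is a minimal sketch of that approach with Apache Commons HttpClient 3.x (my addition, not part of the original answer; the URL is only an example):

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

HttpClient client = new HttpClient();
GetMethod get = new GetMethod("http://www.google.com");
try {
    // Executes the GET request; the response body is read into a String
    client.executeMethod(get);
    String html = get.getResponseBodyAsString();
    System.out.println(html);
} finally {
    // Always release the connection back to the pool
    get.releaseConnection();
}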

Justin Bennett
0
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

try {
    URL u = new URL("https://www.Samsung.com/in/");
    URLConnection urlconnect = u.openConnection();
    InputStream stream = urlconnect.getInputStream();
    int i;
    // Print each character of the page as it is read
    while ((i = stream.read()) != -1) {
        System.out.print((char) i);
    }
}
catch (Exception e) {
    System.out.println(e);
}
-4

It's not a library but a tool named curl, which is generally installed on most servers; on Ubuntu you can easily install it with

sudo apt install curl

Then fetch any HTML page and store it in a local file, for example:

curl https://www.facebook.com/ > fb.html

You will get the home page HTML. You can open it in your browser as well.

dinesh kandpal