I'm attempting to make my first program in Java. The goal is to write a program that browses to a website and downloads a file for me. However, I don't know how to use Java to interact with the internet. Can anyone tell me what topics to look up/read about or recommend some good resources?
You could use Apache's [HttpClient](http://hc.apache.org/httpcomponents-client-ga/). Somewhat similar answer [here](http://stackoverflow.com/questions/6052018/how-do-send-query-to-website-and-parse-results/6052186#6052186) – iruediger May 28 '11 at 01:30
5 Answers
The simplest solution (without depending on any third-party library or platform) is to create a URL instance pointing to the web page / link you want to download, and read the content using streams.
For example:
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;

    public class DownloadPage {
        public static void main(String[] args) throws IOException {
            // Make a URL to the web page
            URL url = new URL("http://stackoverflow.com/questions/6159118/using-java-to-pull-data-from-a-webpage");
            // Get the input stream through URL Connection
            URLConnection con = url.openConnection();
            InputStream is = con.getInputStream();
            // Once you have the Input Stream, it's just plain old Java IO stuff.
            // For this case, since you are interested in getting plain-text web page
            // I'll use a reader and output the text content to System.out.
            // For binary content, it's better to directly read the bytes from stream and write
            // to the target file.
            try (BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
                String line = null;
                // read each line and write to System.out
                while ((line = br.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }
Hope this helps.
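Since the original goal was to download a file, here is a minimal sketch of the binary case mentioned in the comments above: copy the raw bytes from the stream straight to disk instead of reading them as text. The class name, the method names, and the example URL and file name are placeholders for illustration, not part of the original answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class DownloadFile {
    // Copies every byte from the stream to the target file and returns the byte count.
    static long save(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Opens the URL and streams its content to disk without decoding it as text.
    static long download(String address, Path target) throws IOException {
        try (InputStream in = new URL(address).openStream()) {
            return save(in, target);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical URL and file name, for illustration only.
        long bytes = download("https://example.com/file.zip", Paths.get("file.zip"));
        System.out.println("Saved " + bytes + " bytes");
    }
}
```

Splitting `save` from `download` keeps the copying logic usable with any `InputStream`, which also makes it easy to try out without a network connection.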

Hi, when I implement this, I get the HTML file in my console. How can I get a specific value from a website? – flowers1234 Nov 22 '19 at 11:41
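One possible way to pull a single value out of the downloaded HTML, sketched here with only the standard library: a regular expression that captures the page title. The class and method names are made up for this example, and a regular expression is only reasonable for very simple, predictable markup; for anything more involved, an HTML parsing library is usually the safer route.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractTitle {
    // Matches the text between <title> and </title>, ignoring case and spanning line breaks.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or an empty Optional if the page has no title element.
    static Optional<String> title(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // In a real program this string would be the page content read from the URL.
        String html = "<html><head><title> Example Page </title></head><body></body></html>";
        System.out.println(title(html).orElse("(no title)"));
    }
}
```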
The Basics
Look at these to build a solution more or less from scratch:
- Start from the basics: The Java Tutorial's chapter on Networking, including Working With URLs
- Make things easier for yourself: Apache HttpComponents (including HttpClient)
The Easily Glued-Up and Stitched-Up Stuff
You always have the option of calling external tools from Java using the exec() and similar methods. For instance, you could use wget, or cURL.
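As a rough sketch of the external-tool approach, the following uses ProcessBuilder (one of the "similar methods" to exec()) to run cURL. It assumes a `curl` executable is available on the PATH; the class name, method names, URL, and output file name are placeholders for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class CurlDownload {
    // Builds the argument list; passing arguments separately avoids shell quoting problems.
    static List<String> curlCommand(String url, String outputFile) {
        return Arrays.asList("curl", "--fail", "--location", "--output", outputFile, url);
    }

    // Runs the command and returns its exit code (0 means success).
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // show curl's progress and errors on this program's console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical URL and file name, for illustration only.
        int exitCode = run(curlCommand("https://example.com/file.zip", "file.zip"));
        System.out.println("curl exited with code " + exitCode);
    }
}
```

The trade-off is portability: the program now depends on a tool that may not be installed on every machine it runs on.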
The Hardcore Stuff
Then, if you want to go into more fully-fledged stuff, thankfully the need for automated web testing has given us very practical tools for this. Look at:
- HtmlUnit (powerful and simple)
- Selenium, Selenium-RC
- WebDriver/Selenium2 (still in the works)
- JBehave with JBehave Web
Some other libs are purposefully written with web-scraping in mind:
Some Workarounds
Java is a language, but also a platform, with many other languages running on it, some of which offer great syntactic sugar or libraries for easily building scrapers.
Check out:
- Groovy (and its XmlSlurper)
- or Scala (with great XML support as presented here and here)
If you know of a great library for Ruby (JRuby, with an article on scraping with JRuby and HtmlUnit) or Python (Jython) or you prefer these languages, then give their JVM ports a chance.
Some Supplements
Some other similar questions:

There's something I didn't write in that answer: I wouldn't really recommend doing this sort of stuff in Java (you may not have a choice, of course, but I'm just pointing it out). It's doable, and there are loads of tools for that, but Java's inherent verbosity makes it not so friendly for experimenting with a web service you want to scrape. Usually, I'd rather do this from a dynamic language with a REPL, or directly from my browser's console, etc... But of course, nothing's stopping you from starting like that and then implementing the solution in Java... or another JVM-based language! – haylem Jul 01 '14 at 09:34
Here's my solution using URL and a try-with-resources statement, with catch blocks to handle the exceptions.
    /**
     * Created by mona on 5/27/16.
     */
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.net.MalformedURLException;
    import java.net.URL;

    public class ReadFromWeb {
        public static void readFromWeb(String webURL) throws IOException {
            URL url;
            try {
                // new URL(...) is what throws MalformedURLException, so it has to sit inside a try
                url = new URL(webURL);
            } catch (MalformedURLException e) {
                throw new MalformedURLException("URL is malformed: " + webURL);
            }
            // try-with-resources closes the reader and the stream even if reading fails
            try (InputStream is = url.openStream();
                 BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
                String line;
                while ((line = br.readLine()) != null) {
                    System.out.println(line);
                }
            } catch (IOException e) {
                // keep the original exception as the cause instead of discarding it
                throw new IOException("Could not read from " + webURL, e);
            }
        }

        public static void main(String[] args) throws IOException {
            String url = "https://madison.craigslist.org/search/sub";
            readFromWeb(url);
        }
    }
You could additionally save it to a file based on your needs, or parse it using XML or HTML libraries.

Since Java 11, the most convenient way is to use java.net.http.HttpClient from the standard library.
Example:
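    import java.net.Authenticator;
    import java.net.InetSocketAddress;
    import java.net.ProxySelector;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpClient.Redirect;
    import java.net.http.HttpClient.Version;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.net.http.HttpResponse.BodyHandlers;
    import java.time.Duration;

    // proxy(...) and authenticator(...) are optional; leave them out unless you need them.
    // authenticator(...) throws a NullPointerException if no default Authenticator has been set.
    HttpClient client = HttpClient.newBuilder()
            .version(Version.HTTP_1_1)
            .followRedirects(Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(20))
            .proxy(ProxySelector.of(new InetSocketAddress("proxy.example.com", 80)))
            .authenticator(Authenticator.getDefault())
            .build();
    HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://foo.com/"))
            .timeout(Duration.ofMinutes(2))
            .GET()
            .build();
    HttpResponse<String> response = client.send(request, BodyHandlers.ofString());
    System.out.println(response.statusCode());
    System.out.println(response.body());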
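For the file-download case, a pared-down sketch of the same API might look like the following. It leaves out the proxy and authenticator settings and writes the response body directly to a file via BodyHandlers.ofFile. The class name, method names, URL, and target file name are placeholders for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;

class HttpClientDownload {
    // Builds a plain GET request for the given address.
    static HttpRequest buildRequest(String address) {
        return HttpRequest.newBuilder()
                .uri(URI.create(address))
                .timeout(Duration.ofMinutes(2))
                .GET()
                .build();
    }

    // Sends the request and streams the response body into the target file.
    static Path download(HttpClient client, String address, Path target)
            throws IOException, InterruptedException {
        HttpResponse<Path> response =
                client.send(buildRequest(address), HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            // The body (for example an error page) has already been written to the file here.
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(20))
                .build();
        // Hypothetical URL and file name, for illustration only.
        Path saved = download(client, "https://example.com/file.zip", Paths.get("file.zip"));
        System.out.println("Saved to " + saved);
    }
}
```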

I had to add a load of imports: import java.net.http.HttpClient; import java.net.http.HttpClient.Version; import java.net.http.HttpClient.Redirect; import java.net.http.HttpRequest; import java.net.http.HttpResponse; import java.net.http.HttpResponse.BodyHandlers; import java.time.Duration; and I'm still getting an error with null pointer exception on the authenticator line, not sure what I'm supposed to put in for proxy either :( – gfmoore Jan 13 '23 at 12:08
I use the following code for my API:
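    // Needs java.io.*, java.net.MalformedURLException, java.net.URL and java.nio.charset.StandardCharsets.
    // try-with-resources closes the stream when done.
    try (InputStream content = new URL("https://stackoverflow.com/questions/6159118/using-java-to-pull-data-from-a-webpage").openStream();
         Reader reader = new InputStreamReader(content, StandardCharsets.UTF_8)) { // assumes the page is UTF-8
        int c;
        // Read decoded characters rather than raw bytes, so multi-byte characters are not garbled.
        while ((c = reader.read()) != -1) System.out.print((char) c);
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException ie) {
        ie.printStackTrace();
    }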
You can also collect the characters and convert them to a String.
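A minimal sketch of that conversion, assuming Java 9 or later (for InputStream.readAllBytes) and UTF-8 content. The class and method names are placeholders, and the in-memory stream in main only stands in for the stream returned by url.openStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReadToString {
    // Reads the whole stream into memory and decodes it as UTF-8 (an assumption about the page).
    static String readAll(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for url.openStream(), so the example runs without a network connection.
        byte[] page = "<html><body>caf\u00e9</body></html>".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(page)) {
            String text = readAll(in);
            System.out.println(text.length() + " characters read");
        }
    }
}
```

Reading everything into memory is fine for ordinary web pages; for large downloads, streaming to a file is the better fit.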
