Is there any way to write the text from a web page to a text document (.txt)? I started using Jsoup today to try to do this, but it doesn't seem to be quite what I'm looking for (or so I think). If there is any way to do this with Jsoup or anything else, please let me know. Thanks.
2 Answers
JSoup lets you access the body of the response as a string, so you could call response.body() and write that out using a normal PrintWriter or whatever you are comfortable with.
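A minimal sketch of the writing half, assuming the page has already been fetched into a String. With Jsoup that string would come from something like Jsoup.connect(url).execute().body(); that call needs the Jsoup jar, so it appears only in a comment here and a literal stands in for it. The class name SaveBody and the method writeText are hypothetical.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SaveBody {

    // Writes the text to the file as UTF-8, creating the file or overwriting it
    // (newBufferedWriter defaults to CREATE, TRUNCATE_EXISTING, WRITE).
    static void writeText(String text, Path file) throws IOException {
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
            out.print(text);
            // PrintWriter swallows I/O errors, so check for them explicitly.
            if (out.checkError()) {
                throw new IOException("Failed to write " + file);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // With Jsoup on the classpath, the body would come from something like:
        //   String body = Jsoup.connect("https://example.com/").execute().body();
        // and Jsoup.parse(body).text() would give only the visible text instead of the HTML.
        String body = "<html><body>Hello</body></html>"; // stand-in for the fetched page
        writeText(body, Paths.get("page.txt"));
    }
}
```

Note that response.body() is the raw HTML, so the resulting .txt file will contain markup unless the page is parsed first.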
JSoup is primarily about extracting data from the HTML of sites, so if saving the page is all you need to do, maybe you can use a simpler library. It is often useful to separate the downloading from the parsing so that the two can be parallelized. Apache HttpClient is a very popular library for performing HTTP requests, and you could simply get the response entity as a string and write it, per this example.
Do you need to negotiate an SSL connection or pass along any cookies? If so, HttpClient offers a lot of nice features.
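Apache HttpClient is a third-party dependency, so this sketch swaps in the JDK's built-in java.net.http.HttpClient (Java 11+) instead; it is a different library, not the Apache API. It shows the same idea of fetching a page and writing the response body to a file: BodyHandlers.ofFile streams the body straight to disk, and a CookieManager keeps cookies between requests made with the same client. HTTPS works with the default SSL context. The class name FetchToFile and the method download are hypothetical.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class FetchToFile {

    // One client is reused for all requests; the CookieManager stores cookies
    // set by the server and sends them back on later requests.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .cookieHandler(new CookieManager())
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // Saves the response body of a GET request into the file and returns the status code.
    static int download(String url, Path file) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        // Write the raw body bytes, creating the file or truncating an existing one.
        HttpResponse<Path> response = CLIENT.send(request,
                HttpResponse.BodyHandlers.ofFile(file,
                        StandardOpenOption.CREATE,
                        StandardOpenOption.TRUNCATE_EXISTING,
                        StandardOpenOption.WRITE));
        return response.statusCode();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        int status = download("https://example.com/", Paths.get("page.txt"));
        System.out.println("HTTP " + status);
    }
}
```

Keeping a single client instance is what lets cookies carry over from one request to the next; a custom SSLContext could be supplied through the builder's sslContext method if the default trust store is not enough.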
Jetty HttpClient is another alternative, or you could even just run curl url > filename from the command line.
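A rough Java counterpart of curl url > filename needs nothing beyond the JDK: open a stream on the URL and copy its bytes into a file. This is a sketch with the hypothetical class name CopyUrl; like the curl redirect, it saves the raw bytes (HTML included) and does no cookie or SSL configuration of its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class CopyUrl {

    // Copies the bytes served at the URL into the file, replacing any existing
    // file, and returns the number of bytes written.
    static long copy(URL url, Path file) throws IOException {
        try (InputStream in = url.openStream()) {
            return Files.copy(in, file, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CopyUrl.java https://example.com/ page.txt
        long bytes = copy(URI.create(args[0]).toURL(), Paths.get(args[1]));
        System.out.println(bytes + " bytes written");
    }
}
```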

Here's an option for you: How to read a text from a web page with Java?
Instead of
System.out.println(str);
you need to write each line to a .txt file.
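A sketch of that change, assuming the page is read line by line through a BufferedReader (the usual approach for reading text from a URL) and that its content is UTF-8; a real page may use another charset, which would normally be taken from the Content-Type header. PageLinesToFile and saveLines are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class PageLinesToFile {

    // Reads the page line by line and writes each line to the file,
    // returning how many lines were written.
    static int saveLines(URL url, Path file) throws IOException {
        int count = 0;
        try (BufferedReader in = new BufferedReader(
                     new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
            String str;
            while ((str = in.readLine()) != null) {
                out.println(str); // instead of System.out.println(str);
                count++;
            }
            // PrintWriter does not throw on write failures, so check explicitly.
            if (out.checkError()) {
                throw new IOException("Failed to write " + file);
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PageLinesToFile.java https://example.com/ page.txt
        int lines = saveLines(URI.create(args[0]).toURL(), Paths.get(args[1]));
        System.out.println(lines + " lines written to " + args[1]);
    }
}
```

The only real difference from printing to the console is the destination: a PrintWriter over a file has the same println method as System.out, so the loop body barely changes.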
