How can I fetch an HTML page and save it to my database in Java? Is there an easy way to do that?
-
possible duplicate of [How we can Download a HTML Page using JAVA??](http://stackoverflow.com/questions/3341516/how-we-can-download-a-html-page-using-java) – McDowell Jul 28 '10 at 07:34
-
@McDowell: is that a problem? I am new to Stack Overflow – Alex Mathew Jul 28 '10 at 07:36
-
Welcome to Stack Overflow. Adding links to possible duplicates lets question posters and answerers navigate to related information, where they might find that the question has already been answered. If the community judges the question to be too similar to another question, it will be closed as a duplicate. You can find more about how the site works on Meta: http://meta.stackexchange.com/questions/7931/the-official-faq-for-stack-overflow-server-fault-and-super-user – McDowell Jul 28 '10 at 08:02
2 Answers
2
Receiving a file over HTTP is pretty easy using the URL class:
String rawHtml = IOUtils.toString(new URL("http://yahoo.com"), StandardCharsets.UTF_8);
IOUtils is taken from org.apache.commons.io; this toString overload reads the whole response into one String, decoded with the given charset (UTF-8 here is an assumption about the page's encoding). Unfortunately, when you use java.net.URL this way you cannot control anything (cookies, header information, ...) besides the website's address :-/ Personally, I use this approach wherever I can, since the HttpClient API is too complex (too many LOC) for simply retrieving the source code of a website.
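For readers who cannot add Commons IO, here is a rough JDK-only sketch of the same idea. URL.openStream() is shorthand for openConnection().getInputStream(), so going through the URLConnection does allow a few request headers to be set. The class name HtmlFetch, the User-Agent value, the timeouts, and the UTF-8 charset are all assumptions made for illustration, not part of the answer above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is illustrative only.
final class HtmlFetch {

    // Reads the whole stream into a String using the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    // Fetches a page, setting one request header and two timeouts.
    static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        conn.setRequestProperty("User-Agent", "example-fetcher/0.1"); // placeholder value
        conn.setConnectTimeout(10_000); // milliseconds
        conn.setReadTimeout(10_000);    // milliseconds
        try (InputStream in = conn.getInputStream()) {
            // Assumption: the page is UTF-8. A real client would inspect Content-Type.
            return readAll(in, StandardCharsets.UTF_8);
        }
    }
}
```

Calling HtmlFetch.fetch("http://yahoo.com") would return the page source as a String. Cookie handling and redirects across protocols are still outside what this sketch covers.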

f1sh
1
Not sure about your exact requirements.
For something simple, you can use HttpClient.
For something more complex, you can use Nutch. It does crawling, indexing and searching as well.

leonm
-
First of all, thanks for the reply. What I need is this: if I type www.yahoo.com in the textbox, it should copy the entire HTML of Yahoo's index page to the database. Is there any way to do that? – Alex Mathew Jul 28 '10 at 07:31
-
You'll have to write some plumbing of your own. Basically you'll fetch the URL from the textbox and pass it to HttpClient (or something similar). Upon a successful return you store the contents to a database, perhaps with JPA or straight JDBC. – leonm Jul 28 '10 at 08:07
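The plumbing described in that comment might look roughly like the sketch below, using straight JDBC and plain java.net.URL in place of HttpClient. Everything named here is an assumption for illustration: the class PageSaver, the table pages with columns url and html, the choice to prepend http:// to bare host names, and the UTF-8 decoding. The Connection is expected to come from the caller, for example via DriverManager.getConnection with whatever JDBC URL and driver your database uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper class; names and schema are illustrative only.
final class PageSaver {

    // Turns textbox input such as "www.yahoo.com" into an absolute URL string.
    static String normalize(String input) {
        String trimmed = input.trim();
        if (trimmed.startsWith("http://") || trimmed.startsWith("https://")) {
            return trimmed;
        }
        return "http://" + trimmed;
    }

    // Downloads the page source (InputStream.readAllBytes needs Java 9+).
    static String fetch(String address) throws IOException {
        try (InputStream in = new URL(address).openStream()) {
            // Assumption: the page is UTF-8 encoded.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Stores one page. Assumes a table like: pages(url VARCHAR, html CLOB).
    static void save(Connection db, String url, String html) throws SQLException {
        String sql = "INSERT INTO pages (url, html) VALUES (?, ?)";
        try (PreparedStatement stmt = db.prepareStatement(sql)) {
            stmt.setString(1, url);
            stmt.setString(2, html);
            stmt.executeUpdate();
        }
    }

    // Glue: textbox value in, row in the database out.
    static void fetchAndStore(Connection db, String textboxInput)
            throws IOException, SQLException {
        String url = normalize(textboxInput);
        save(db, url, fetch(url));
    }
}
```

A PreparedStatement is used so that the HTML, which is full of quotes, is passed as a parameter instead of being concatenated into the SQL string.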