I have a URL and need to retrieve its HTML. I used the following code:

String url = "http://www.sears.com/search="+keywords;
String jsp = retrieveContent(url);

I noticed that the string jsp contains different content from the actual source of the web page (what I see with View Source in my browser). I suspect that opening the page in a browser triggers a server-side script or a redirect, and that the script's output is what I see there. Can you tell me a way to get the actual HTML of the page?

You can use this link as an example. How do I get the actual HTML of this page?

http://www.sears.com/search=baby%20strollers
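
A side note on building the URL: the example link has the space encoded as %20, while the snippet concatenates keywords as-is. Below is a minimal sketch of encoding the keywords first, assuming Java 10 or later. The class name SearchUrl and the helper buildSearchUrl are hypothetical names used only for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SearchUrl {
    // Hypothetical helper: percent-encodes the keywords before appending them.
    public static String buildSearchUrl(String keywords) {
        // URLEncoder emits '+' for a space (and %2B for a literal '+'),
        // so swapping '+' for "%20" yields the form used in the example link.
        String encoded = URLEncoder.encode(keywords, StandardCharsets.UTF_8)
                                   .replace("+", "%20");
        return "http://www.sears.com/search=" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("baby strollers"));
    }
}
```

This only makes the request well-formed. It would not by itself explain why the returned HTML differs from what the browser shows.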

  • This has nothing to do with JSP. It probably sends different responses to different User-Agents. – SLaks Sep 22 '13 at 14:33
  • Try using an HTTP GET on that URL: http://stackoverflow.com/questions/1485708/how-do-i-do-a-http-get-in-java – ddavison Sep 22 '13 at 14:33
  • I'll put it a little more clearly. I need to get an HTML file from a URL in Java, but when I do, I get different source code from the actual source code. Do you have any idea how to get the actual source code? I tried using HtmlUnit, but I don't know how to use it properly. – user2804374 Sep 22 '13 at 15:26
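
The two suggestions in the comments above (a plain HTTP GET, and the server possibly varying its response by User-Agent) can be combined. Here is a minimal sketch using java.net.HttpURLConnection, assuming Java 9 or later and a UTF-8 page. The class name BrowserLikeFetch, the fetch helper, and the User-Agent string are illustrative assumptions; the question does not show how retrieveContent is actually implemented.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class BrowserLikeFetch {
    // Assumption: an example desktop-browser User-Agent; any current browser string would do.
    public static final String BROWSER_UA =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        + "(KHTML, like Gecko) Chrome/120.0 Safari/537.36";

    // Sends a GET with the given User-Agent and returns the response body as text.
    public static String fetch(String address, String userAgent) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setRequestProperty("Accept", "text/html");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            // Assumption: the page is UTF-8; a real client would honor the Content-Type charset.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java BrowserLikeFetch <url>");
            return;
        }
        System.out.println(fetch(args[0], BROWSER_UA));
    }
}
```

Comparing the output of fetch(url, BROWSER_UA) with the browser's View Source would show whether the header was the cause. If the markup is built by JavaScript after the page loads, changing headers alone is unlikely to help.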

3 Answers

The HTML page is being stored in the variable jsp. alert(jsp) would show you the page.

Cyril Joudieh
  • `Undefined local method or variable 'alert'` ;) – ddavison Sep 22 '13 at 14:34
  • So this is not JavaScript. alert shows you what's in a variable in a popup window. – Cyril Joudieh Sep 22 '13 at 14:53
  • I'll put it a little more clearly. I need to get an HTML file from a URL in Java, but when I do, I get different source code from the actual source code. Do you have any idea how to get the actual source code? I tried using HtmlUnit, but I don't know how to use it properly. – user2804374 Sep 22 '13 at 15:28

I would like to see the code of the retrieveContent method, if possible. If you are trying to read a URL's HTML content directly, there is a nice example here: http://docs.oracle.com/javase/tutorial/networking/urls/readingURL.html

noobandy
  • I did almost exactly what the example shows, but the HTML that I get is not the same as the source code that I see in the browser. – user2804374 Sep 22 '13 at 22:25
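
Since the question suspects a redirect, one way to check is to look at the status line and Location header before reading any body. Below is a minimal diagnostic sketch with java.net.HttpURLConnection; the class name RedirectProbe and the probe helper are hypothetical names for illustration only.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class RedirectProbe {
    // Returns "<status>" or "<status> -> <Location>" without following the redirect.
    public static String probe(String address) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setInstanceFollowRedirects(false); // inspect the first response only
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try {
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location"); // null when absent
            return location == null ? String.valueOf(status) : status + " -> " + location;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java RedirectProbe <url>");
            return;
        }
        System.out.println(probe(args[0]));
    }
}
```

A 3xx status with a Location header would point to a redirect. HttpURLConnection does not automatically follow redirects that switch protocol (for example http to https), so in that case the new location would need to be requested explicitly.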

There are many HTML scraping libraries out there that will do the trick. The one I have used is jsoup, which says:

"scrape and parse HTML from a URL, file, or string"

jsoup might suit your purpose.

Tito
  • Jsoup is what I'm using, but I need to get the HTML before I can parse it. Right now I do not get the right source code at all: when I fetch the URL's HTML directly, I get different source code. – user2804374 Sep 22 '13 at 22:26