
Once again I have a problem where I can't find the source code because it's hidden or something. When my Java program indexes the page, it finds everything except the info I need. I assume it's hidden for a reason, but is there any way around this?

It's just a bunch of tr/td tags that show up in Firebug but don't show up when I view the page source, or when I fetch the page with the code below:

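import java.io.*;
import java.net.*;
URL url = new URL("my url");
URLConnection yc = url.openConnection();
try (BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()))) {
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        System.out.println(inputLine); // the HTML exactly as the server sent it
    }
}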

I really have no idea how to go about getting the info I need.

Jon Storm

4 Answers


This probably happens because those tags are injected into the DOM dynamically by JavaScript and are not part of the initial HTML, which is all you can fetch with a URLConnection. They might even be loaded with AJAX. If you want to fetch them, you will need a JavaScript interpreter on your server.
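One way to confirm this diagnosis is to search the raw HTML (the same text a URLConnection returns) for a value you can see in Firebug, and to count the td tags it contains. A minimal sketch follows; the class and method names are hypothetical, it assumes the page is UTF-8, and the tag count is a crude text search rather than an HTML parser (a "<td>" inside an inline script string would also be counted).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class RawHtmlCheck {

    // Reads the whole stream into one string: the HTML exactly as the server sent it.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // True if a value you can see in Firebug also appears in the raw HTML.
    public static boolean inRawHtml(String rawHtml, String visibleValue) {
        return rawHtml.contains(visibleValue);
    }

    // Crude count of opening tags such as <td ...>; a plain text search, not an HTML parser.
    public static int countTag(String rawHtml, String tag) {
        String lower = rawHtml.toLowerCase(Locale.ROOT);
        String needle = "<" + tag.toLowerCase(Locale.ROOT);
        int count = 0;
        int idx = lower.indexOf(needle);
        while (idx >= 0) {
            int end = idx + needle.length();
            // Require a tag-name boundary so that "<tr" does not match "<track".
            if (end == lower.length() || lower.charAt(end) == '>' || lower.charAt(end) == '/'
                    || Character.isWhitespace(lower.charAt(end))) {
                count++;
            }
            idx = lower.indexOf(needle, end);
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java RawHtmlCheck <page-url> <value-seen-in-firebug>");
            return;
        }
        // Assumes UTF-8; a real client should use the charset from the Content-Type header.
        String html = readAll(new InputStreamReader(
                URI.create(args[0]).toURL().openStream(), StandardCharsets.UTF_8));
        System.out.println("<td> tags in raw HTML: " + countTag(html, "td"));
        System.out.println(inRawHtml(html, args[1])
                ? "Value is in the initial HTML"
                : "Value is missing: it is probably added later by JavaScript");
    }
}
```

If Firebug shows a table full of rows but the raw HTML has few or no td tags and lacks the value, the rows are being built on the client.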

Darin Dimitrov

If they don't show up in the page source, they're most likely being added dynamically by JavaScript code. There's no way to get them from your server-side code short of running a JavaScript interpreter, which carries considerable overhead.

The information in the tags is presumably coming from somewhere, though. Why not track that down and grab it straight from there?
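Once you have found the request the page's JavaScript makes (for example, in Firebug's Net panel), you can often fetch that URL directly with plain Java. A minimal sketch, assuming UTF-8 and a URL that you supply; the class name is hypothetical, the response format (HTML fragment, JSON, or something else) depends entirely on the site, and some endpoints require the cookies or headers the browser sends, which this sketch does not send.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class DirectFetch {

    // Downloads the body of a URL and decodes it as UTF-8 (an assumption about the site).
    public static String fetch(String address) throws IOException {
        URLConnection conn = URI.create(address).toURL().openConnection();
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java DirectFetch <data-url>");
            return;
        }
        // args[0]: the data URL observed in the browser's network traffic (site-specific).
        System.out.println(fetch(args[0]));
    }
}
```

The payload behind such a request is usually smaller and more regular than the rendered page, so it tends to be easier to parse than the final tr/td markup.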

Nick Johnson

Try using Jsoup.

// Jsoup.parse(URL, timeoutMillis) downloads and parses the page
Document doc = Jsoup.parse(new URL("http://"), 10000);
System.out.print(doc.toString());
Rasel

Assuming the issue is that the "missing" content is being injected by JavaScript, the following SO question is pertinent:

Stephen C