4

I am trying to retrieve table data from aspx page using excel vba.I know how to get table data from a URL but below is the main problem.

Problem

There is an aspx page (say www.abc.aspx). I am currently on this page.Let this page be page1.

Now I click a page2 link on the current page. What is worth noticing is that after clicking this link, the old URL (www.abc.aspx) doesn't change but the content changes.( Content is of page2 )

If you view page1 source code it has

<form method="post" action="page1 url" id="Form1">

Whatever is the action on page1 (page2 click) , it posts back the same page1 url.

So how can I get page2 table data in excel VBA since I don't know its URL?

Code

This is what I had used to fetch table data.

I used internet explorer object.Then navigated to the link and saved the document in htmldoc.

ie.navigate "url"

Do While ie.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Fetching data..."
DoEvents
Loop

Set htmldoc = ie.document

'Column headers
Set eleColth = htmldoc.getElementsByTagName("th")
j = 0 'start with the first value in the th collection
        For Each eleCol In eleColth 'for each element in the td collection
            ThisWorkbook.Sheets(1).Range("A1").Offset(i, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time
            j = j + 1 'move to next element in td collection
        Next eleCol 'rinse and repeat


'Content
Set eleColtr = htmldoc.getElementsByTagName("tr")

'This section populates Excel
    i = 0 'start with first value in tr collection
    For Each eleRow In eleColtr 'for each element in the tr collection
        Set eleColtd = htmldoc.getElementsByTagName("tr")(i).getElementsByTagName("td") 'get all the td elements in that specific tr
        j = 0 'start with the first value in the td collection
        For Each eleCol In eleColtd 'for each element in the td collection
            ThisWorkbook.Sheets(1).Range("D3").Offset(i, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time
            j = j + 1 'move to next element in td collection
        Next eleCol 'rinse and repeat
        i = i + 1 'move to next element in td collection
    Next eleRow 'rinse and repeat

ie.Quit
Set ie = Nothing

EDIT:

Example

If we click on questions in Stack Overflow (https://stackoverflow.com/questions) and now click on page2 of questions (new link is https://stackoverflow.com/questions?page=2&sort=newest)

In my case, if we click on page2, the new link is not updated.It is the same old link.

EDIT: I have found a similar question here

How do I get url that is hidden by javascript on external website?

Thanks.

Yangshun Tay
  • 49,270
  • 33
  • 114
  • 141
user3126632
  • 467
  • 2
  • 8
  • 15

2 Answers2

2

Ok, I sympathise, there is a school of thought (including Tim Berners-Lee) that says every separate page should have its own URI and that these don't change.

But webmasters can and do mess you around. They can redirect you HTTP request and can obfuscate the navigation as in your case. They can rewrite HTTP requests.

You have two options

Option 1 - Let Internet Explorer resolve the new content for you

So, if the content is visible on the screen then it must be in the Document Object Model (DOM). In IE, or indeed in Chrome, one can right-click and get the context menu and then choose Inspect to see where in the DOM that element resides.

I think your code demonstrates enough expertise to drill in. However, sometimes some websites like to disable the Inspect menu option to avoid programers poking around. (EDIT: As in your case now that I have read the comments)

Option 2 - Use an HTTP sniffing Tool like Fiddler to detect the HTTP redirect/rewrite

As I said above, HTTP requests can be rewritten and redirected by the web-server but the HTTP protocol does give notifications of redirects. There are tools to detect this. A popular tool is Fiddler, today I have discovered that there is a specific IE Fiddler add-on.

To be honest though the developer tools that ship with the browsers themselves, particularly Chrome (Ctrl+Shift+I, then Network tab), show network traffic to a level of detail increasingly on a par with any sniffing tool.

Sorry you got down-voted, this seems like a perfectly reasonable question.

S Meaden
  • 8,050
  • 3
  • 34
  • 65
  • The onclick attribute of page1 is connected to _doPostBack() function which takes eventTarget and eventArgument and it submits the same form along with variables updated.Also there is a viewstate hidden variable.So how can I fetch data from page2 as I don't know page2 url ? – user3126632 Dec 26 '17 at 08:38
  • _doPostBack() and its arguments are part of a protocol for aspx pages that is layered on top on HTTP Post; so ignore them and use the snifffer.to see the underlying HTTP Post (option 2 above). Post sniffer screen shot/ content into body of main question when you have done this, – S Meaden Dec 26 '17 at 11:37
  • I tried this using Chrome developer tools and going to Network and its showing same page1 url when clicking page2 link on page1. – user3126632 Dec 26 '17 at 17:02
  • ok, try Fiddler. Or, still with Chrome see these SO https://stackoverflow.com/questions/4734374/see-full-redirect-path-and-http-status-code-in-chrome and https://stackoverflow.com/questions/10987453/how-to-use-chromes-network-debugger-with-redirects/12282621 They detail the 'Preserve log' checkbox. – S Meaden Dec 26 '17 at 17:04
  • How can I use fiddler ? Can you please explain – user3126632 Dec 26 '17 at 17:36
  • https://www.youtube.com/watch?v=xsSRzd4XDtE&t=7s HOW TO: Use Fiddler to capture network traffic – S Meaden Dec 26 '17 at 20:42
0

A bird's eye view on the problem:

You have a requirement that you seem to not be able to let go: Use Excel VBA. I emphasize this point since often times answers provide solutions satisfying alternative premises from what is posted in the OP.

A possible solution:

So you have to Interface Excel VBA with another tool having the capability for revealing the contents of html redirects or obfuscated URLs.

Google Chrome Developer Tools reveals all contents, and you can interface very nicely Google Chrome with Excel VBA, using the Selenium VBA Wrapper. Download here.

It is quite versatile, e.g., you can see how to scrape web data.

As for getting obfuscated contents, there are a few items that may help

how to get innerHTML of whole page in selenium driver? (not VBA but useful)

Selenium + VBA to Control Chrome

(Note: the author of the wrapper is usually eager to answer in SO, and precise in his answers).

I guess YMMV, there are always people trying to obfuscate their data, with various techniques, and often for good reasons...

If you have a real example for your http://www.abc.aspx, it may help.