0

I am using excel with VBA to open a page and extract some information and putting it in my database. After some research, I figured out that opening IE obviously takes more time and it can be achieved using XmlHTTP. I am using the XmlHTTP to open a web page as proposed in my another question. However, while using IE I was able to navigate through div tags. How can I accomplish the same in XmlHTTP?

If I use IE to open the page, I am doing something like below to navigate through multiple div elements.

Set openedpage1 = iedoc1.getElementById("profile-experience").getElementsbyClassName("title")
For Each div In openedpage1
---------

However, with XmlHttp, I am not able to do like below.

For Each div In html.getElementById("profile-experience").getElementsbyClassName("title")

I am getting an error as object doesn't support this property or method.

Community
  • 1
  • 1
Ramesh
  • 765
  • 7
  • 24
  • 52

3 Answers3

0

Take a look at this answer that I had posted for another question as this is close to what you're looking for. In summary, you will:

  1. Create a Microsoft.xmlHTTP object

  2. Use the xmlHTTP object to open your url

  3. Load the response as XML into a DOMDOcument object

From there you can get a set of XMLNodes, select elements, attributes, etc. from the DOMDocument

Community
  • 1
  • 1
Jaycal
  • 2,087
  • 1
  • 13
  • 21
  • Isn't this way of creating the DOMDocument based on XML? Will it work for HTML? I don't think it will - I may be wrong. I think you would have to first save the HTML, and re-parse it. – Andy G Jul 18 '13 at 21:36
  • Use a validator like the W3C one [here](http://validator.w3.org/#validate_by_uri) to check if the page is well-formed XHTML before trying to import into a `DOMDocument`. If it isn't well-formed then you'll need a different approach altogether – barrowc Jul 18 '13 at 22:40
0

The XMLHttp object returns the contents of the page as a string in responseText. You will need to parse this string to find the information you need. Regex is an option but it will be quite cumbersome.

This page uses string functions (Mid, InStr) to extract information from the html-text.

It may be possible to create a DOMDocument from the retreived HTML (I believe it is) but I haven't pursued this.

Andy G
  • 19,232
  • 5
  • 47
  • 69
0

As mentioned with the answers above put the .responseText into an HTMLDocument and then work with that object e.g.

Option Explicit
Public Sub test()
    Dim html As HTMLDocument
    Set html = New HTMLDocument

    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://www.someurl.com", False
        .send
        html.body.innerHTML = .responseText
    End With

    Dim aNodeList As Object, iItem As Long
    Set aNodeList = html.querySelectorAll("#profile-experience.title")
    With ActiveSheet
        For iItem = 0 To aNodeList.Length - 1
            .Cells(iItem + 1, 1) = aNodeList.item(iItem).innerText
            '.Cells(iItem + 1, 1) = aNodeList(iItem).innerText '<== or potentially this syntax
        Next iItem
    End With
End Sub

Note:

I have literally translated your getElementById("profile-experience").getElementsbyClassName("title") into a CSS selector, querySelectorAll("#profile-experience.title"), so assume that you have done that correctly.

QHarr
  • 83,427
  • 12
  • 54
  • 101