1

My UDF is pulling data from a webpage each time it is called.

The connection is like:

Public Sub myConnection(id)
    Set oHtml = New HTMLDocument
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://www.example.com" & id, False
        .send
        oHtml.body.innerHTML = .responseText
    End With
End Sub

This is pulling stock data and it is working fine, but the workbook becomes very slow when a few hundred UDF calls are made.

My function is like:

Function myFunction(id,element)
Call myConnection(id)
Select Case element
    Case "p" 
    Set dados = oHtml.getElementsByClassName("thisClass")(0)
    myFunction = dados.innerText
   //etc...
End Select
End Function

Is there a more efficent (faster) way to do this?

I tried storing the pages in some kind of cache. I tried to push the pages into an array but I couldn't write the correct code to get and use the pages in that array.

The array should have 2 dimensions:

  1. id (page being queried with the stock ticker)
  2. name, price, beta, etc

I know this isn't very specific but I resarched a lot and couldn't advance any further, so perhaps someone could point some directions or propose a solution.

Thanks in advance.

  • 1
    [related](http://stackoverflow.com/questions/20213769/get-google-search-first-result-via-udf/20214131#20214131) –  Nov 26 '13 at 17:00
  • 1
    Your assignment statement is wrong because `oHTML` is a `New` HTML document, there simply is no such element as `"myClass"` in the document **until** you've read the `.responseText` or used another method to retrieve the page's HTML contents. – David Zemens Nov 26 '13 at 17:10
  • @DavidZemens thanks, that makes sense now, any other suggestions? –  Nov 26 '13 at 17:24
  • 1
    This code looks pretty streamlined actually... You could consider `Set oHTML = Nothing` before `End Sub` but VBA should be garbage collected (should be) so I'm not sure if that would really help you very much. There are a few things to also consider: this code is called from another subroutine, which may be the culprit. Or the "slowness" may be a function of the loop (obviously 300 calls to this function should take longer than 4 calls to the same function). Maybe automating IE (instead of the `CreateObject(WinHTTP...)`) might be faste -- i'm not sure-- but it might be worth testing it out. – David Zemens Nov 26 '13 at 17:51
  • Wait... are you trying to use this as a worksheet *Function*? – David Zemens Nov 26 '13 at 18:05
  • Exactly, that's the idea :-D –  Nov 26 '13 at 18:13
  • 2
    I wouldn't use a UDF for this - fetching data via HTTP has a high overhead, so you're going to get bogged down pretty quickly. Better to have a sub run the refresh "on demand" (eg. via button click). If you must use a UDF, then think about some form of caching for the data: when you get the response for id=x, store that locally (eg. in a Dictionary/Collection together with a timestamp) and then return that if the function is called again with the same input (and optionally within a certain time period if the response is time-sensitive). – Tim Williams Nov 26 '13 at 18:19
  • Typo perhaps in your question since you have it defined as a `Sub` not a function. In any case, if you debug the function (put a breakpoint on the first line, then use f8 to step through it) can you find where it is breaking? – David Zemens Nov 26 '13 at 18:20
  • @TimWilliams that's exactly what I tried to do. Use a UDF and store the HTML Objects in an array and call them locally every time they were requested again. Users had to re-open the worksheet to refresh connections. The problem is I couldn't figure out how to call the local pages. –  Nov 26 '13 at 18:22
  • @DavidZemens the Sub is being called from a Function procedure (it is only the connection). It isn't breaking, it is just taking too long for about 500 cells calling the function –  Nov 26 '13 at 18:23
  • 1
    Maybe you can update your question with a more complete example of what you've tried. What exactly are you returning to the cell from the UDF? Surely you don't need to store the whole page to cache that value? – Tim Williams Nov 26 '13 at 18:23
  • 1
    @NunoNogueira it is *always* going to take a long time to do 500 queries to the web... but echo Tim's comment surely you don't need to store the whole page in the cell (although parsing it will not really save time in executing the `Get` request... – David Zemens Nov 26 '13 at 18:28
  • I guess I will try both approaches: 1) save only certain data points in an array 2) Use a Sub procedure to update a table on request. You can see more details of the code I'm using here: http://stackoverflow.com/questions/20205442/excel-vba-get-inner-text-of-html-table-td?rq=1 –  Nov 26 '13 at 18:33
  • You have no much hope to shorten your time, if id's are all different. Say 1s per HTTP call, 500 cells needs 500s, so 8Min are required. – jacouh Nov 26 '13 at 18:44

0 Answers0