3

I have been trying desperately for months to automate a process whereby a csv file is downloaded, maned and saved in a given location. so far I only managed with excel vba to open the web page and click the bottom to download the csv file, but the code stop and required a manual intervention to to be completed, i would like it to be fully automated if possible. see the code used (I am not the author):

Sub WebDataExtraction()
Dim URL As String
Dim IeApp As Object
Dim IeDoc As Object
Dim ieForm As Object
Dim ieObj As Object
Dim objColl As Collection

URL = "http://www.bmreports.com/bsp/BMRSSystemData.php?pT=DDAD&zT=N&dT=NRT"

Set IeApp = CreateObject("InternetExplorer.Application")
IeApp.Visible = True
IeApp.Navigate URL

Do Until IeApp.ReadyState = READYSTATE_COMPLETE
Loop

Set IeDoc = IeApp.Document
For Each ele In IeApp.Document.getElementsByTagName("span")

If ele.innerHTML = "CSV" Then
Application.Wait (Now + TimeValue("0:00:15"))
DoEvents
ele.Click
'At this point you need to Save the document manually
' or figure out for yourself how to automate this interaction.
Test_Save_As_Set_Filename
File_Download_Click_Save
End If

Next

IeApp.Quit
End Sub"

thanks in advance

Nunzio

Nunzio Puntillo
  • 33
  • 1
  • 1
  • 4
  • Using `XMLHTTP` is easier than automating IE. For example,http://stackoverflow.com/questions/7747877/asynchronous-file-downloads-from-within-vba-excel – brettdj Jun 20 '13 at 23:01
  • Thanks for the suggestion but i am not familiar at all with XMLHTTP – Nunzio Puntillo Jun 20 '13 at 23:05
  • BTW that looks like my code. I remember writing the comment "figure out for yourself how to automate this interaction". – David Zemens Jun 21 '13 at 03:05
  • @DavidZemens it is your code, in fact i specified that I am not the author of it and i left all your comment I do not want to take credit for your codes ;) thanks for for it – Nunzio Puntillo Jun 21 '13 at 16:43
  • Oh, I did not mean anything like that. You are welcome to use my code, I just thought it was interesting to see my code again :) Cheers. – David Zemens Jun 21 '13 at 16:45

2 Answers2

1

I am posting a second answer, since, as I believe my first answer is adequate for many similar applications, it does not work in this instance.

Why the other methods fail:

  • The .Click method: This raises a new window which expects user input at run-time, it doesn't seem to be possible to use the WinAPI to control this window. Or, at least not any way that I can determine. The code execution stops on the .Click line until the user manually intervenes, there is no way to use a GoTo or a Wait or any other method to circumvent this behavior.
  • Using a WinAPI function to download the source file directly does not work, since the button's URL does not contain a file, but rather a js function that serves the file dynamically.

Here is my proposed workaround solution:

You can read the webpage's .body.InnerText, write that out to a plain text/csv file using FileSystemObject and then with a combination of Regular Expressions and string manipulation, parse the data into a properly delimited CSV file.

Sub WebDataExtraction()
    Dim url As String
    Dim fName As String
    Dim lnText As String
    Dim varLine() As Variant
    Dim vLn As Variant
    Dim newText As String
    Dim leftText As String
    Dim breakTime As Date
'## Requires reference to Microsoft VBScript Regular Expressions 5.5
    Dim REMatches As MatchCollection
    Dim m As Match
'## Requires reference to Microsoft Internet Controls
    Dim IeApp As InternetExplorer
'## Requires reference to Microsoft HTML object library
    Dim IeDoc As HTMLDocument
    Dim ele As HTMLFormElement
'## Requires reference to Microsoft Scripting Runtime
    Dim fso As FileSystemObject
    Dim f As TextStream
    Dim ln As Long: ln = 1


    breakTime = DateAdd("s", 60, Now)
    url = "http://www.bmreports.com/bsp/BMRSSystemData.php?pT=DDAD&zT=N&dT=NRT"
    Set IeApp = CreateObject("InternetExplorer.Application")

    With IeApp
        .Visible = True
        .Navigate url

        Do Until .ReadyState = 4
        Loop

        Set IeDoc = .Document
    End With
    'Wait for the data to display on the page
    Do
        If Now >= breakTime Then
            If MsgBox("The website is taking longer than usual, would you like to continue waiting?", vbYesNo) = vbNo Then
                GoTo EarlyExit
            Else:
                breakTime = DateAdd("s", 60, Now)
            End If
        End If
    Loop While Trim(IeDoc.body.innerText) = "XML CSV Please Wait Data Loading Sorting"

    '## Create the text file
    fName = ActiveWorkbook.Path & "\exported-csv.csv"
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set f = fso.OpenTextFile(fName, 2, True, -1)
    f.Write IeDoc.body.innerText
    f.Close
    Set f = Nothing

    '## Read the text file
    Set f = fso.OpenTextFile(fName, 1, False, -1)
    Do
        lnText = f.ReadLine
        '## The data starts on the 4th line in the InnerText.
        If ln >= 4 Then
            '## Return a collection of matching date/timestamps to which we can parse
            Set REMatches = SplitLine(lnText)
            newText = lnText
            For Each m In REMatches
                newText = Replace(newText, m.Value, ("," & m.Value & ","), , -1, vbTextCompare)
            Next
            '## Get rid of consecutive delimiters:
            Do
                newText = Replace(newText, ",,", ",")
            Loop While InStr(1, newText, ",,", vbBinaryCompare) <> 0
            '## Then use some string manipulation to parse out the first 2 columns which are
            '   not a match to the RegExp we used above.
            leftText = Left(newText, InStr(1, newText, ",", vbTextCompare) - 1)
            leftText = Left(leftText, 10) & "," & Right(leftText, Len(leftText) - 10)
            newText = Right(newText, Len(newText) - InStr(1, newText, ",", vbTextCompare))
            newText = leftText & "," & newText

            '## Store these lines in an array
            ReDim Preserve varLine(ln - 4)
            varLine(ln - 4) = newText
        End If
        ln = ln + 1

    Loop While Not f.AtEndOfStream
    f.Close

'## Re-open the file for writing the delimited lines:
    Set f = fso.OpenTextFile(fName, 2, True, -1)
    '## Iterate over the array and write the data in CSV:
    For Each vLn In varLine
        'Omit blank lines, if any.
        If Len(vLn) <> 0 Then f.WriteLine vLn
    Next
    f.Close

EarlyExit:
    Set fso = Nothing
    Set f = Nothing
    IeApp.Quit
    Set IeApp = Nothing

End Sub

Function SplitLine(strLine As String) As MatchCollection
'returns a RegExp MatchCollection of Date/Timestamps found in each line
'## Requires reference to Microsoft VBScript Regular Expressions 5.5
Dim RE As RegExp
Dim matches As MatchCollection
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .MultiLine = False
        .Global = True
        .IgnoreCase = True
        '## Use this RegEx pattern to parse the date & timestamps:
        .Pattern = "(19|20)\d\d[-](0[1-9]|1[012])[-](0[1-9]|[12][0-9]|3[01])[ ]\d\d?:\d\d:\d\d"
    End With
    Set matches = RE.Execute(strLine)
    Set SplitLine = matches
End Function
David Zemens
  • 53,033
  • 11
  • 81
  • 130
  • once again thanks for your collaboration, but the code breaks here:Set REMatches = SplitLine(lnText) "sub or funtion not defined" I tried with split but still receive an error Thanks again – Nunzio Puntillo Jun 24 '13 at 17:31
  • Oh I forgot to include the code for the function `SplitLine` :) I will update in a moment! – David Zemens Jun 24 '13 at 17:35
  • the code breaks to:"leftText = Left(newText, Application.WorksheetFunction.Find(",", newText, 1) - 1)" which seems not to be supported in VBA, I tried with instr which is the equivalent of find in vba but i still get an error: "leftText = Left(newText, InStr(1, newText, ",", vbTextCompare) - 1)" – Nunzio Puntillo Jun 24 '13 at 19:05
  • The `Application.WorksheetFunction.Find` is absolutely supported in VBA. And for future reference, please, if you get "an error" please always take the time to tell me **what error message**. I can't read your mind. It will also be helpful if you will tell me what is the value of `newText` at the time this error raises. – David Zemens Jun 24 '13 at 19:13
  • And, just to point out another obvious: as a matter of courtesy, it would be nice if you took 1 second to at least upvote **all of the help** that I've given you so far on this. You've written exactly 0% of the code that you're using. And I'm doing this for free. So, please consider upvoting the answers I've given you so far. – David Zemens Jun 24 '13 at 19:14
  • i did not know how to do, i will figure out how to give the highest rate anyway regardeless of your answer – Nunzio Puntillo Jun 24 '13 at 20:15
  • What don't you know how to do? Tell me what the error message is and I'm sure I can probably figure it out. You can also add a `Debug.Print newText` before the line which raises the error, that way we can examine the value of the string which will help troubleshoot the error. – David Zemens Jun 24 '13 at 20:19
  • I do not have enough reputation I will try to gain some and upvote all your help, do not worry you do not need to do anything else,you have done a lot already sorry again I did not know how it works. – Nunzio Puntillo Jun 24 '13 at 20:24
  • It's OK. But you need to understand how it is not possibel for me to help, unless I have the information needed to troublshoot. In this case, I need to know both the *error message* and if possible, what value is in `newText` when the error raises. – David Zemens Jun 24 '13 at 20:27
  • error: run-time error 1004, unable to get find property on the worksheet function class – Nunzio Puntillo Jun 24 '13 at 20:35
  • OK. If you've gotten that far, there is a file saved `ActiveWorkbook.Path & "\exported-csv.csv"`. Can you upload that file to google docs, or email it to me, `dzemens@gmail.com`? – David Zemens Jun 24 '13 at 21:15
  • I have received it. I will try a few things. I have some revisions I will update for you in the morning. – David Zemens Jun 25 '13 at 01:43
0

EDIT

I tested my original answer code using the URL:

http://www.bmreports.com/bsp/BMRSSystemData.php?pT=DDAD&zT=N&dT=NRT#saveasCSV

But this method does not seem to work, for this site. The ele.Click doesn't seem to initiate the download, it just opens the data tabular on the webpage. To download, you need to do the right-click/save-as. If you have gotten that far (as I suspect, based on the subroutines you are calling, but for which you did not provide the code), then you can probably use the Win API to get the HWND of the Save dialog and possibly automate that event. Santosh provides some information on that:

VBA - Go to website and download file from save prompt

Here is also a good resource that should help solve your problem:

http://social.msdn.microsoft.com/Forums/en-US/beb6fa0e-fbc8-49df-9f2e-30f85d941fad/download-file-from-ie-with-vba

Original Answer

If you are able to determine the URL of the CSV then you can use this subroutine to open a connection to the CSV data and import it directly to the workbook. You may need to automate a text-to-columns operation on the imported data, but that can easily be replicated with the macro recorder. I put an example of this in the Test() subroutine below.

You could easily modify this to add the QueryTables in a new workbook, and then automate the SaveAs method on that workbook to save the file as a CSV.

This example uses a known URL for Yahoo Finance, Ford Motor Company, and will add a QueryTables with the CSV data in cell A1 of the active worksheet. This can be modified pretty easily to put it in another sheet, another workbook, etc.

Sub Test()
Dim MyURL as String
MyURL = "http://ichart.finance.yahoo.com/table.csv?s=GM&a0&b=1&c2010&d=05&e=20&f=2013&g=d&ignore=.csv"

OpenURL MyURL

'Explode the CSV data:
Range("A:A").TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, _
    TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
    Semicolon:=False, Comma:=True, Space:=False, Other:=False, FieldInfo _
    :=Array(Array(1, 3), Array(2, 1), Array(3, 1), Array(4, 1), Array(5, 1), Array(6, 1), _
    Array(7, 1)), TrailingMinusNumbers:=True

End Sub

Private Sub OpenURL(fullURL As String)

'This opens the CSV in querytables connection.
On Error GoTo ErrOpenURL
    With ActiveSheet.QueryTables.Add(Connection:= _
        "URL;" & fullURL, Destination:=Range("A1"))
        .Name = fullURL
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = True
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlEntirePage
        .WebFormatting = xlWebFormattingAll
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With

ExitOpenURL:
Exit Sub 'if all goes well, you can exit

'Error handling...

ErrOpenURL:
Err.Clear
bCancel = True
Resume ExitOpenURL


End Sub
Community
  • 1
  • 1
David Zemens
  • 53,033
  • 11
  • 81
  • 130
  • Querytable does not work for this website, I tried already but without any good result. Thanks anyway. – Nunzio Puntillo Jun 21 '13 at 19:20
  • @NunzioPuntillo I'm well aware of this, which is why I explain why it doesn't work, and also offer you some additional information. I suspect the answer you're looking for is in one or both of those links. Follow them, try out those methods, and see if they will solve your problem. You might be able to answer your own question this way. – David Zemens Jun 21 '13 at 19:26
  • @DevidZemens I TRIED BUT IT DOES NOT WORK :( – Nunzio Puntillo Jun 21 '13 at 22:48
  • I see the `SaveAs` dialog is interfering with runtime. Let me think about that... – David Zemens Jun 22 '13 at 00:37
  • This is tricky... i thought it maybe easier to just parse the table from the page source, but it's not in the page source (table and the download come from a javascript function which is not exposed in the source). I thought [this (kind of simulate a right-click "save target as")](http://social.msdn.microsoft.com/Forums/en-US/beb6fa0e-fbc8-49df-9f2e-30f85d941fad/download-file-from-ie-with-vba) might help but it is not meant for files served from JS like this one. – David Zemens Jun 22 '13 at 13:53
  • so there are two problems: **1)** file is served from javascript and **2)** the `SaveAs` dialog prevents execution of code/run-time; the application is waiting for a response from the user (manually save the file). I think if we can figure out how to change the behavior of #2 then we can use the `HWND` and WinAPI method to automate the save. I will try to look at it again tonight. – David Zemens Jun 22 '13 at 13:55
  • Thanks a lot for your support, the second step (save the file is not a problem), the main problem is to force the code to go to the next line or exit the sub, probably using a time trick I cannot figure out how to do it, I am getting mad :( – Nunzio Puntillo Jun 22 '13 at 19:23
  • I'm pretty much stumped on this... it is possible to save the web page in plain text format (capture the string as `IeDoc.body.innerText` and write it to a plain text file using `FileSystemObject`. The data table is fixed width so it can't be parsed with a simple text-to-columns but it could be parsed with the `.ReadLine` method of a `FileSystemObject`. – David Zemens Jun 23 '13 at 00:18
  • I figured out how to click save on the save dialog so the last step is to "jump" to the next line when the dialog box appear, in the worse case it would be enough to end the sub when the saveas window appear and then run the code to click save from a different spreadsheet ( very crude), in plus i never dealt with text in web pages it would take me a life time anyway, unless you have some code already. Thank a lot – Nunzio Puntillo Jun 23 '13 at 01:18
  • I can put something together tomorrow to parse the text. Should be pretty easy. The table is in s good parseable format. Regex would be best but I don't realkyt know that. I think ordinary string manipulation will be sufficient. – David Zemens Jun 23 '13 at 03:15
  • @ DavideZemens you are a star :) – Nunzio Puntillo Jun 23 '13 at 09:49