This topic is related to Loop through links and download PDF's

I am trying to convert my current VBA code into VBScript. I already understand that I have to remove the variable types (the As ... part of Dim statements) and use CreateObject to get those objects, but otherwise everything should port as-is. DoEvents will also have to be replaced with something like WScript.Sleep.

I ran into some problems. When I run the VBS file I get the error "Object required: 'MSHTML'", pointing to line 65, where I have Set hDoc = MSHTML.HTMLDocument. I searched on Google but found nothing helpful for this one.

How should I proceed with this one?

DownloadFiles("https://www.nordicwater.com/products/waste-water/")

Sub DownloadFiles(p_sURL)
    Set xHttp = CreateObject("Microsoft.XMLHTTP")
    Dim xHttp 
    Dim hDoc
    Dim Anchors 
    Dim Anchor 
    Dim sPath
    Dim wholeURL

    Dim internet
    Dim internetdata
    Dim internetlink
    Dim internetinnerlink 
    Dim arrLinks 
    Dim sLink 
    Dim iLinkCount 
    Dim iCounter 
    Dim sLinks

    Set internet = CreateObject("InternetExplorer.Application")
    internet.Visible = False
    internet.navigate (p_sURL)

        Do Until internet.ReadyState = 4
        Wscript.Sleep 100
        Loop

        Set internetdata = internet.document
        Set internetlink = internetdata.getElementsByTagName("a")

        i = 1

        For Each internetinnerlink In internetlink
            If Left(internetinnerlink, 36) = "https://www.nordicwater.com/product/" Then

                If sLinks <> "" Then sLinks = sLinks & vbCrLf
                sLinks = sLinks & internetinnerlink.href
                i = i + 1

            Else
            End If

    Next

    wholeURL = "https://www.nordicwater.com/"
    sPath = "C:\temp\"

    arrLinks = Split(sLinks, vbCrLf)
    iLinkCount = UBound(arrLinks) + 1

    For iCounter = 1 To iLinkCount
    sLink = arrLinks(iCounter - 1)
        'Get the directory listing
        xHttp.Open "GET", sLink
        xHttp.send

        'Wait for the page to load
        Do Until xHttp.ReadyState = 4
        Wscript.Sleep 100
        Loop

        'Put the page in an HTML document
        Set hDoc = MSHTML.HTMLDocument
        hDoc.body.innerHTML = xHttp.responseText

        'Loop through the hyperlinks on the directory listing
        Set Anchors = hDoc.getElementsByTagName("a")

        For Each Anchor In Anchors

            'test the pathname to see if it matches your pattern
            If Anchor.pathname Like "*.pdf" Then

                xHttp.Open "GET", wholeURL & Anchor.pathname, False
                xHttp.send

                With CreateObject("Adodb.Stream")
                    .Type = 1
                    .Open
                    .write xHttp.responseBody
                    .SaveToFile sPath & getName(wholeURL & Anchor.pathname), 2 '//overwrite
                End With

            End If

        Next

    Next

End Sub

Function:

Function getName(pf)
    getName = Split(pf, "/")(UBound(Split(pf, "/")))
End Function
user692942
10101
  • Please do not change the original posted code as it changes the context of the question and any associated answers. The original issue is `Set hDoc = MSHTML.HTMLDocument` changing that will invalidate the provided answer. Have rolled the question back. – user692942 Nov 12 '19 at 07:03

1 Answer

Instead of Set hDoc = MSHTML.HTMLDocument, use:

Set hDoc = CreateObject("htmlfile")

In VBA/VB6 you can declare variables with object types, but VBScript does not support this. Instead of declaring, for example, Dim objIE As InternetExplorer.Application, you have to use CreateObject (or GetObject) to instantiate objects such as Microsoft.XMLHTTP, InternetExplorer.Application, or the HTML document (created through the htmlfile ProgID rather than MSHTML.HTMLDocument).
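As a minimal sketch of the late-bound approach, the snippet below creates the HTML document with CreateObject("htmlfile") and fills it using write and close, the pattern mentioned in the comments under this answer. The markup string is an assumption made up for illustration; in the question's code it would be xHttp.responseText.

```vbscript
' VBScript has no "As <type>" clause, so objects are created late-bound.
Dim hDoc, Anchors, Anchor

Set hDoc = CreateObject("htmlfile")

' Illustrative markup (an assumption); the question would pass xHttp.responseText here.
hDoc.write "<html><body><a href=""/files/a.pdf"">A</a><a href=""/about"">B</a></body></html>"
hDoc.close

' The anchors collection is only populated once the document has content.
Set Anchors = hDoc.getElementsByTagName("a")
WScript.Echo "Anchors found: " & Anchors.length

For Each Anchor In Anchors
    WScript.Echo Anchor.innerText
Next
```

Run it with cscript.exe so that WScript.Echo prints to the console instead of opening message boxes.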

Another change:

If Anchor.pathname Like "*.pdf" Then

can be written using StrComp function:

If StrComp(Right(Anchor.pathname, 4), ".pdf", vbTextCompare) = 0 Then

or using InStr function:

If InStr(Anchor.pathname, ".pdf") > 0 Then
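The change is needed because VBScript has no Like operator. The two replacements are not equivalent: InStr matches ".pdf" anywhere in the path and is case-sensitive by default, while the StrComp form only matches the extension and ignores case. A small sketch of the StrComp variant wrapped in a helper (the name EndsWithPdf is hypothetical, not part of the original code):

```vbscript
' Hypothetical helper: True when the path ends in ".pdf", ignoring case.
Function EndsWithPdf(sPathName)
    EndsWithPdf = (StrComp(Right(sPathName, 4), ".pdf", vbTextCompare) = 0)
End Function

' Example paths are made up for illustration.
WScript.Echo CStr(EndsWithPdf("/files/Brochure.PDF"))
WScript.Echo CStr(EndsWithPdf("/files/pdf-overview.html"))
```

Inside the loop this would read If EndsWithPdf(Anchor.pathname) Then.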

Also, at the beginning of your sub, you do the following:

Set xHttp = CreateObject("Microsoft.XMLHTTP")
Dim xHttp 

You should declare your variables before assigning them values or objects. VBScript is very relaxed about this: unless Option Explicit is set, it creates undeclared variables for you, so your code will work, but it is good practice to Dim your variables before using them.
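One way to enforce this, sketched below, is to put Option Explicit at the top of the script, which turns any use of an undeclared variable into an error. With it enabled, the question's code would also need a Dim for i, which is assigned but never declared there.

```vbscript
Option Explicit

' Declare first, then assign.
Dim xHttp
Set xHttp = CreateObject("Microsoft.XMLHTTP")

' Under Option Explicit, every variable must be declared before use.
Dim i
i = 1
```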

Except for the WScript.Sleep calls, your VBScript code will work in VB6/VBA, so you can debug your script in VB6 or in VBA hosts (like Excel).

Étienne Laneville
  • After using `Set hDoc = CreateObject("MSHTML.HTMLDocument")` I am getting another error. See edited post. – 10101 Nov 11 '19 at 23:09
  • Sorry, try `CreateObject("htmlfile")` instead! – Étienne Laneville Nov 11 '19 at 23:19
  • Yes, I have already got this one, but then there are some other problems on the next lines... `hDoc.body.innerHTML = xHttp.responseText` – 10101 Nov 11 '19 at 23:21
  • According to this one https://stackoverflow.com/questions/9995257/mshtml-createdocumentfromstring-instead-of-createdocumentfromurl/20483982 `hDoc.body.innerHTML = xHttp.responseText` should work? – 10101 Nov 11 '19 at 23:25
  • The accepted answer in that question uses `open`, then `write "In his house at R'lyeh, dead Cthulhu waits dreaming"` and finally `Close`. You can replace `odoc` with `hDoc` and try it out. – Étienne Laneville Nov 11 '19 at 23:31
  • That's one of a task... I have used this one https://stackoverflow.com/questions/53938629/cannot-call-the-body-property-for-htmlfile-object-in-vbscript Now it goes until line 78 `If Anchor.pathname Like "*.pdf" Then` Sub or function not defined – 10101 Nov 11 '19 at 23:35
  • Also why do I need this line at all? `hDoc.body.innerHTML = xHttp.responseText` I am not using it anywhere in my code later on? By commenting it out I don't get any errors but also no files in folder – 10101 Nov 11 '19 at 23:40
  • Ok, now it seems to work =O !!! However I dont get it, why I need these lines ` hDoc.open hDoc.close hDoc.body.innerHTML = "" hDoc.body.innerHTML = xHttp.responseText` – 10101 Nov 11 '19 at 23:43
  • Those lines are very important!! They load the content of the webpage into the HTML document object. Without this your Anchors collection will be empty. – Étienne Laneville Nov 11 '19 at 23:44
  • I have posted working code just in case! One more target achieved =) BTW this website is just an example I have found in the internet but I will try to get it working also on the real one. – 10101 Nov 11 '19 at 23:46
  • Automating things is very cool! Thank you a million for your help! – 10101 Nov 11 '19 at 23:49