1

I'm trying to extract data into Excel (2007) from one specific website, but spread over multiple webpages. What I want to see in my sheet is which items are offered on this website, without going through many pages or using the search (it's a bit buggy in my browser).

I have tried importing the data via Excel, but that only works for one page. Because the data is covered over 183 pages, I must do it 183 times to complete.

My guess is that it will work faster with a macro, but I have no experience with that. I did a search on this forum, but al the macro's I did find would either give an error or did the work for one page only.

The link to the website in question is http://www.scalemodelstore.nl/modellen/2/Vliegtuigen.html?&pageID=0

The only change in the link is at the end: pageID=1, 2, and so on.

Thanks in advance!

pnuts
  • 58,317
  • 11
  • 87
  • 139
wagglywheel
  • 23
  • 1
  • 5

1 Answers1

2

Try this one:

Sub GetData()

    Dim lRow, lPage, oXmlHttp, sResp, aResp, sPart, oHtmlFile, oBody, sInText, aInLines, lCol, sLineText, aImgPts

    lRow = 1
    lPage = 0
    Do
        sUrl = "http://www.scalemodelstore.nl/modellen/2/Vliegtuigen.html?&pageID=" & lPage
        Do
            Set oXmlHttp = CreateObject("MSXML2.XMLHttp")
            oXmlHttp.Open "GET", sUrl, True
            oXmlHttp.Send
            Do Until oXmlHttp.ReadyState = 4
                DoEvents
            Loop
            sResp = oXmlHttp.ResponseText
        Loop While sResp = ""
        aResp = Split(sResp, "<a class=""productTile"" ")
        For i = 1 To UBound(aResp)
            sPart = "<a " & aResp(i)
            sPart = Split(sPart, "</a>")(0)
            Set oHtmlFile = CreateObject("htmlfile")
            oHtmlFile.Write sPart
            Set oBody = oHtmlFile.GetElementsByTagName("body")(0)
            sInText = Trim(oBody.InnerText)
            aInLines = Split(sInText, vbCrLf)
            lCol = 1
            For Each sLineText In aInLines
                sLineText = Trim(sLineText)
                If sLineText <> "" Then
                    Cells(lRow, lCol).Value = sLineText
                    lCol = lCol + 1
                End If
            Next
            aImgPts = Split(sPart, "<img src=""")
            If UBound(aImgPts) > 0 Then
                Cells(lRow, lCol).Value = Split(aImgPts(1), """")(0)
            End If
            lRow = lRow + 1
        Next
        lPage = lPage + 1
    Loop Until UBound(aResp) = 0

End Sub

This code just gets all available data for each model on all webpages and put it into worksheet, one row for each model. Note, it is not a one-stop solution, the code works now, but may become faulty as soon as the website content changed.

omegastripes
  • 12,351
  • 4
  • 45
  • 96
  • Seems to work perfectly, thank you! Is there any specific part that needs to be changed when the website changes, or would I need a whole new macro? – wagglywheel Sep 13 '14 at 13:15
  • First of all in that case try to find appropriate 2nd parameter string value for ` Split() ` functions, all parsing logic is based on it. Currently if you open above mentioned webpages html code you can find strings from code above within webpage content, e. g. each block with model description begins from `
    – omegastripes Sep 13 '14 at 15:40