7

I've written a script in vba to scrape some items from a webpage. By default, there are 4/5 items visible when I open that page. However, the webpage displays it's rest of the items when it is made to scroll downward. If it were not for split screen, I could have handled it using .parentWindow.scrollBy. As I do not have any idea how to scroll any partial screen of a webpage, I get stuck. Any help on this will be highly appreciated.

Link to the webpage: Page_Link

This is what I have written so far:

Sub Get_Result()
    Const URL As String = "replace with above link"
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim storage As Object, posts As Object, post As Object

    With IE
        .Visible = True
        .navigate URL
        Do Until .readyState = READYSTATE_COMPLETE: Loop
        Set html = .document
    End With

    For i = 1 To 5
        Set storage = html.getElementsByClassName("sc-jqCOkK gkztbk")
        html.parentWindow.scrollBy 0, 100
        'SendKeys "{end}"
        Application.Wait Now + TimeValue("00:00:003")
    Next i

    For Each posts In storage
            row = row + 1: Cells(row + 1, 1) = posts.querySelector("[id='app.components.HouseCard.rentLine']").innerText
    Next posts
    IE.Quit
End Sub

Btw, this is how that split window look like: enter image description here

Post script: I'm only after any solution regarding scrolling a split screen like it is visible above.

SIM
  • 21,997
  • 5
  • 37
  • 109
  • Find the scroll-able container and sets `scrollTop` to a max int32. In your case: `html.querySelector("#app [style*=scroll]").scrollTop = &h7fffffff` – Florent B. Jan 18 '18 at 14:24
  • This CodeReview [Handling lazy-loading webpages using vba](https://codereview.stackexchange.com/questions/183215/handling-lazy-loading-webpages-using-vba/183242#183242) may help you. You need to find the class name of the posts. – PatricK Jan 22 '18 at 00:36

2 Answers2

6

For the scrolling of the left window use the div with class name sc-iqzUVk gZZIKO and set scrollTop to some value, e.g. use scrollHeight to scroll to the bottom like this leftWindowDiv.scrollTop = leftWindowDiv.scrollHeight.

Complete example:

Dim leftWindowColl As IHTMLElementCollection
Set leftWindowColl = html.getElementsByClassName("sc-iqzUVk gZZIKO")

If leftWindowColl.Length > 0 Then
    Dim leftWindowDiv As HTMLDivElement
    Set leftWindowDiv = leftWindowColl.Item(0)
    leftWindowDiv.scrollTop = leftWindowDiv.scrollHeight
End If

enter image description here

Because the class names seem to be generated (names like sc-iqzUVk gZZIKO look like generated) it could be difficult to find the appropriate div based on class name because the generated name will change. That is probably why you couldn't perform the scroll as you write in your comment. This can be solved by using the querySelector method as already suggested by @Florent B. More about selectors.

This selector selectes all divs inside element with id app which have style attribute ending with scroll;.

Dim selector As String
selector = "#app div[style$='scroll;']" 

Dim leftWindowDiv As HTMLDivElement
Set leftWindowDiv = html.querySelector(selector)
leftWindowDiv.scrollTop = leftWindowDiv.scrollHeight
Daniel Dušek
  • 13,683
  • 5
  • 36
  • 51
-1

Because the data is actually being loaded in the page via JavaScript (AJAX request), it might be easier to go directly to the source instead of scraping the data from the page... then you don't have to try to trigger the data refresh by scrolling.

E.g. This request...

https://www.quintoandar.com.br/api/search?q=(and%20vagas:%271%27(and%20area:[20,75](and%20custo:[500,4400]tipo:%27Apartamento%27)))&fq=local:[%27-23.442839332881825,-46.78368476621091%27,%27-23.658112332905105,-46.67382148496091%27]&return=id,foto_capa,aluguel,area,quartos,custo,photos,endereco,regiao_nome&size=12&q.parser=structured&expr.distance=floor(haversin(-23.550475832893465,-46.72875312558591,local.latitude,local.longitude))&expr.rank=relevance_score-(10000*distance)&sort=rank%20desc&start=12

...is being made behind the scenes and returns a JSON data packet with the matching data:

{"status":{"rid":"saDi7Y8s0eHFAgq6wiU=","time-ms":11},"hits":{"found":306,"start":12,"hit":[{"id":"892805064","fields":{"quartos":"2","area":"70","id":"892805064","aluguel":"2096","foto_capa":"capa892805064283_95250575977DSC0029.jpg","photos":["892805064-283.95250575977DSC0029.jpg","892805064-40.179547677396954DSC0030.jpg","892805064-284.5837713474729DSC0031.jpg","892805064-902.2967820247848DSC0032.jpg","892805064-529.554508221606DSC0033.jpg"],"custo":"2528","endereco":"Avenida Diógenes Ribeiro de Lima","regiao_nome":"Alto da Lapa"}},{"id":"892815116","fields":{"quartos":"2","area":"60","id":"892815116","aluguel":"2850","foto_capa":"capa892815116261_74899351456617DSC4227.JPG","photos":["892815116-261.74899351456617DSC4227.JPG","892815116-322.2261964874089DSC4229.JPG","892815116-797.408104831225DSC4230.JPG","892815116-755.9750463973334DSC4231.JPG","892815116-335.025196990966DSC4232.JPG"],"custo":"3476","endereco":"Rua Carlos Weber","regiao_nome":"Vila Leopoldina"}},{"id":"892814441","fields":{"quartos":"2","area":"56","id":"892814441","aluguel":"2300","foto_capa":"capa892814441438_88057915637637DSC2877.JPG","photos":["892814441-438.88057915637637DSC2877.JPG","892814441-372.04023616269046DSC2879.JPG","892814441-544.1457767902004DSC2880.JPG","892814441-696.1076611567722DSC2881.JPG","892814441-658.147344161576DSC2882.JPG"],"custo":"3053","endereco":"Rua Schilling","regiao_nome":"Vila Leopoldina"}},{"id":"892801135","fields":{"quartos":"2","area":"74","id":"892801135","aluguel":"2100","foto_capa":"capa892801135154_64867005243553DSC0131.jpg","photos":["892801135-154.64867005243553DSC0131.jpg","892801135-821.0691885230938DSC0132.jpg","892801135-550.173130870861DSC0133.jpg","892801135-781.6859738106783DSC0134.jpg","892801135-324.15549348830274DSC0135.jpg"],"custo":"2962","endereco":"Rua Cerro Corá","regiao_nome":"Vila Romana"}},{"id":"892810991","fields":{"quartos":"1","area":"44","id":"892810991","aluguel":"1700","foto_capa":"capa892810991346_72268700275674DSC7435.JPG","photos":["892810991-346.72268700275674DSC7435.JPG","892810991-760.5384685893431DSC7437.JPG","892810991-493.4367707205927DSC7438.JPG","892810991-148.3627701115735DSC7439.JPG","892810991-879.5681722331702DSC7440.JPG"],"custo":"2290","endereco":"Rua Filipinas","regiao_nome":"Alto da Lapa"}},{"id":"892785444","fields":{"quartos":"3","area":"72","id":"892785444","aluguel":"1750","foto_capa":"capa892785444176_87927661079405DSC0078cpia.jpg","photos":["892785444-176.87927661079405DSC0078cpia.jpg","892785444-155.2841769051142DSC0079cpia.jpg","892785444-264.8248449740238DSC0080cpia.jpg","892785444-271.50916660357717DSC0081cpia.jpg","892785444-73.63648677390611DSC0082cpia.jpg"],"custo":"2596","endereco":"Rua Aparande","regiao_nome":"Vila Romana"}},{"id":"892815084","fields":{"quartos":"2","area":"65","id":"892815084","aluguel":"1400","foto_capa":"capa892815084151_12980286342614DSC0648.JPG","photos":["892815084-151.12980286342614DSC0648.JPG","892815084-366.4192232225364DSC0646.JPG","892815084-323.49030789818755DSC0647.JPG","892815084-57.91456171751297DSC0650.JPG","892815084-823.6616866177429DSC0651.JPG"],"custo":"1702","endereco":"Rua Professora Gioconda Mussolini","regiao_nome":"Butantã"}},{"id":"892815480","fields":{"quartos":"2","area":"66","id":"892815480","aluguel":"1517","foto_capa":"capa892815480827_4216371362361MSA7050.jpg","photos":["892815480-827.4216371362361MSA7050.jpg","892815480-168.3833393977906MSA7051.jpg","892815480-628.7954363215526MSA7052.jpg","892815480-422.0828651677334MSA7053.jpg","892815480-145.77827866297056MSA7055.jpg"],"custo":"2537","endereco":"Avenida Diógenes Ribeiro de Lima","regiao_nome":"Alto da Lapa"}},{"id":"892803901","fields":{"quartos":"2","area":"62","id":"892803901","aluguel":"2100","foto_capa":"capa892803901461_71112127633575DSC0020.jpg","photos":["892803901-461.71112127633575DSC0020.jpg","892803901-957.5988374372433DSC0019.jpg","892803901-451.9960408234817DSC0021.jpg","892803901-901.8494079400576DSC0022.jpg","892803901-246.9476745586001DSC0023.jpg"],"custo":"2522","endereco":"Avenida Manoel Pedro Pimentel","regiao_nome":"Centro"}},{"id":"892814536","fields":{"quartos":"1","area":"43","id":"892814536","aluguel":"3300","foto_capa":"capa892814536235_068855277688165andar1.JPG","photos":["892814536-235.068855277688165andar1.JPG","892814536-119.102637766625555andar2.JPG","892814536-597.02840920644885andar3.JPG","892814536-86.43842879346635andar6.JPG","892814536-748.15126030809735andar4.JPG"],"custo":"3981","endereco":"Rua Heitor Penteado","regiao_nome":"Vila Romana"}},{"id":"892817054","fields":{"quartos":"2","area":"57","id":"892817054","aluguel":"2300","foto_capa":"capa892817054867_6528077345225MSA0543.jpg","photos":["892817054-867.6528077345225MSA0543.jpg","892817054-845.6187393316951MSA0542.jpg","892817054-580.6323230425363MSA0544.jpg","892817054-385.0217861332558MSA0545.jpg","892817054-638.0684371185209MSA0546.jpg"],"custo":"3246","endereco":"Rua Jericó","regiao_nome":"Vila Madalena"}},{"id":"892801625","fields":{"quartos":"2","area":"66","id":"892801625","aluguel":"2500","foto_capa":"capa892801625221_0775662268666THI0215.jpg","photos":["892801625-221.0775662268666THI0215.jpg","892801625-506.8116885746092THI0228.jpg","892801625-562.32705863531THI0218.jpg","892801625-194.33328511837655THI0219.jpg","892801625-980.9227287984841THI0220.jpg"],"custo":"3394","endereco":"Rua Guaricanga","regiao_nome":"Lapa"}}]}}

You can then use a JSON parser for VBA to process the raw data rather than having to scrape the HTML and find what you're looking for.

Nerdwood
  • 3,947
  • 1
  • 21
  • 20
  • 1
    OK, but isn't the purpose of your VBA code to extract the contents of the `` which has the property name & price? Those pieces of data would be directly given to you via the JSON response, which means you don't need to pretend to be a web browser that's scrolling... – Nerdwood Jan 16 '18 at 07:07
  • 2
    I don't really think this answer should be down-voted... it's an alternative way to pull out the data, seeing as though the purpose of the script is to scrape webpage data. – Nerdwood Jan 22 '18 at 03:36