0

Can't figure out how can I get all the company links from the page used in my code. Running my script I get only 20 links. The page has got lazyloading method that is why I can't get all of them. Any input on this will be highly appreciated. I've tried so far with:

Sub Company_links()
Const lnk = "http://fortune.com"
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim topic As Object

With http
    .Open "GET", "http://fortune.com/fortune500/list/", False
    .send
    html.body.innerHTML = .responseText
End With

For Each topic In html.getElementsByClassName("small-12 column row")
    x = x + 1
    With topic.getElementsByTagName("a")
        If .Length Then Cells(x, 1) = lnk & Split(.item(0).href, "about:")(1)
    End With
Next topic

Set html = Nothing: Set topics = Nothing
End Sub
SIM
  • 21,997
  • 5
  • 37
  • 109
  • If the site is using ajax to load the remaining links. You would need to make the page load those remmaining links first. – Max08 Jun 16 '17 at 20:01

3 Answers3

1

Run the following code in a new workbook. It will output to results to Sheet1 regardless of they are empty or not, so be careful if you have data there. You can change this part of code later as you like.

First of all you need to activate Microsoft HTML Object Library and Microsoft Internet Controls from Tools -> References in VBA Editor. Then run the following code, sit back and relax until you see "All Done!" message:

Sub Company_links()
    Dim i As Long
    Dim aIE As InternetExplorer
    Dim Rank As IHTMLElement, Company As IHTMLElement, Revenues As IHTMLElement
    Set aIE = New InternetExplorer
    With aIE
        .navigate "http://fortune.com/fortune500/list/"
        .Visible = True
    End With

    Do While (aIE.Busy Or aIE.ReadyState <> READYSTATE_COMPLETE)
        DoEvents
    Loop

    For i = 1 To 50

        On Error Resume Next
        Set Rank = aIE.document.getElementsByClassName("column small-2 company-rank")(999)
        If Rank Is Nothing Then
            GoTo Skip
        End If
        Exit For
Skip:
    SendKeys "{end}"
    Application.Wait (Now() + TimeValue("00:00:005"))
    Next i

    With Sheet1
        .Range("A1") = "RANK"
        .Range("B1") = "COMPANY"
        .Range("C1") = "REVENUE"

        For i = 0 To 999
            Set Rank = aIE.document.getElementsByClassName("column small-2 company-rank")(i)
            Set Company = aIE.document.getElementsByClassName("column small-5 company-title")(i)
            Set Revenues = aIE.document.getElementsByClassName("column small-5 company-revenue")(i)
            .Range("A" & i + 2) = Rank.innerText
            .Range("B" & i + 2) = Company.innerText
            .Range("C" & i + 2) = Revenues.innerText
        Next i

    End With

    SendKeys "%{F4}"
    Set aIE = Nothing
    Set Rank = Nothing
    Set Company = Nothing
    Set Revenues= Nothing
    MsgBox "All Done!"
End Sub
Tehscript
  • 2,556
  • 2
  • 13
  • 23
  • Thanks Tehscript, for your answer. Gonna get back to you when I'm done. Btw, you have always been a great help to me. – SIM Jun 16 '17 at 21:20
  • @SMth80 No problem, but I have just noticed that you want the company links. Can you manage to do that altering this code? Because this script gives the Rank, Company Name and Revenue. – Tehscript Jun 16 '17 at 21:21
  • No problem, I'll manage that. Your code works as usual. Now I'm gonna give a little twitch to make it with xmlhttp as I have got an idea from you. Thanks a ton. – SIM Jun 16 '17 at 21:35
  • @SMth80 You are welcome. Let me know if you get stuck. – Tehscript Jun 16 '17 at 21:43
  • Hi Tehscript, In your spare time, you can check out the link. "https://stackoverflow.com/questions/45200247/web-scraping-across-multipages-without-even-knowing-the-last-page-number" – SIM Jul 19 '17 at 20:23
0

If the site is using ajax to load the remaining links. You would need to make the page load those remaining links first. My suggestion would be to use selenium to load the page and then use your code to get the links.

http://selenium-python.readthedocs.io/

Max08
  • 955
  • 1
  • 7
  • 16
  • Sorry bro ANKIT GAUR. I tried with selenium. That couldn't bring all the links either. That brought also 20 links as I was having with my first code. The problem is elsewhere. There must be a pagination option (which usually does have) in the link but can't figure out how that has been placed. – SIM Jun 16 '17 at 21:12
0

I would do it like this.

Option Explicit

Sub Sample()
    Dim ie As Object
    Dim links As Variant, lnk As Variant
    Dim rowcount As Long

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.navigate "http://fortune.com"

    'Wait for site to fully load
    'ie.Navigate2 URL
    Do While ie.Busy = True
       DoEvents
    Loop

    Set links = ie.document.getElementsByTagName("a")

    rowcount = 1

    With Sheets("Sheet1")
        For Each lnk In links
        'Debug.Print lnk.innerText
            'If lnk.classname Like "*Real Statistics Examples Part 1*" Then
                .Range("A" & rowcount) = lnk.innerText
                rowcount = rowcount + 1
                'Exit For
            'End If
        Next
    End With
End Sub
ASH
  • 20,759
  • 19
  • 87
  • 200