
I'm trying to extract a specific link from a website, and I'm having trouble pulling it into a String.

I have to look up about 5000 companies on a website, and the links all vary. The page source for an example company (Nokia) is at view-source:http://finder.fi/yrityshaku/Nokia+oyj. This is the part I'm looking at:

<div class="itemName">

  <!-- Yritysnimi -->

    <!-- Aukeaa aina yhteystiedot-välilehdelle -->
    <a href="/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/TAMPERE/yhteystiedot/159838" class="resultGray">

I want to extract the Substring between

  <!-- Yritysnimi -->

    <!-- Aukeaa aina yhteystiedot-välilehdelle -->
    <a href="

and

" class="resultGray">

This substring will vary with each company I search for, so I will only know the strings that surround the substring I'm trying to extract.

I've tried to use browserIE.Document.body.innerHTML

Sub Macro1()

Set browserIE = CreateObject("InternetExplorer.Application")
browserIE.Top = 0
browserIE.Left = 800
browserIE.Width = 800
browserIE.Height = 1200
browserIE.Visible = True




Set ws = ThisWorkbook.Worksheets("Sheet1")

browserIE.Navigate ("http://www.finder.fi/yrityshaku")
Do
DoEvents
Loop Until browserIE.ReadyState = 4

    browserIE.Document.getElementById("companysearchform_query_companySearchTypename").Click
    browserIE.Document.getElementById("SearchInput").Value = "nokia oyj"
    browserIE.Document.getElementById("SearchSubmit").Click
    Application.Wait (Now + TimeValue("0:00:4"))
    codeArea = Mid(V, InStr(V, "<div class=""itemName""> <!-- Yritysnimi --> <!-- Aukeaa aina yhteystiedot-vÃ?lilehdelle --> <a href="""), Len(V))
    Debug.Print codeArea
    theLink = Mid(codeArea, 117, InStr(codeArea, """ class=""resultGray"">" - 1))

End Sub

but I get an "Invalid procedure call or argument" error.

I've done some research but haven't found a suitable solution yet. Some answers suggest pulling just one element from the page, and others suggest copying the whole source code into a string variable. Since I'm not very experienced with VBA, I'd prefer pulling the whole source into a string, as I think that would be easier to understand.

Original website (in Finnish): http://finder.fi/yrityshaku/nokia+oyj

Joonas
  • Try using a loop that waits until the browser is both no longer busy and at ReadyState complete. That will let you drop the 4-second wait. And what is V? – Nathan_Sav Jan 04 '16 at 14:04
  • I was editing the code so I somehow left out v = browserIE.Document.body – Joonas Jan 08 '16 at 13:19

1 Answer


You need to locate all of the <div> elements with a classname of itemName. Loop through those to find the <a> element(s) and use the first one to get the href property.

Sub Macro1()
    Dim browserIE As Object, ws As Worksheet
    Set browserIE = CreateObject("InternetExplorer.Application")
    browserIE.Top = 0
    browserIE.Left = 800
    browserIE.Width = 800
    browserIE.Height = 1200
    browserIE.Visible = True




    Set ws = ThisWorkbook.Worksheets("Sheet1")

    browserIE.Navigate ("http://www.finder.fi/yrityshaku")
    Do While browserIE.ReadyState <> 4 Or browserIE.Busy: DoEvents: Loop 'wait until loaded and idle

    browserIE.Document.getElementById("companysearchform_query_companySearchTypename").Click
    browserIE.Document.getElementById("SearchInput").Value = "nokia oyj"
    browserIE.Document.getElementById("SearchSubmit").Click
    Do While browserIE.ReadyState <> 4 Or browserIE.Busy: DoEvents: Loop 'wait until loaded and idle
    'Application.Wait (Now + TimeValue("0:00:4"))

    Dim iDIV As Long
    With browserIE.Document.body
        If CBool(.getelementsbyclassname("itemName").Length) Then
            'there is at least one div with the itemName class
            For iDIV = 0 To .getelementsbyclassname("itemName").Length - 1
                With .getelementsbyclassname("itemName")(iDIV)
                    If CBool(.getelementsbytagname("a").Length) Then
                        'there is at least one anchor element inside this div
                        Debug.Print .getelementsbytagname("a")(0).href
                    End If
                End With
            Next iDIV
        End If
    End With

End Sub
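
The two wait loops in the macro could be factored into one helper that also gives up after a while, so a stalled page cannot hang Excel. This is only a sketch: `WaitForPage` and the 15-second default are names and values I made up for illustration, not part of the original macro. It relies only on the `Busy` and `ReadyState` properties already used above (4 is READYSTATE_COMPLETE).

```vba
' Hypothetical helper (not part of the original macro).
' Returns True once IE is idle and the document has finished loading,
' or False if that has not happened within timeoutSeconds.
Function WaitForPage(ByVal ie As Object, _
                     Optional ByVal timeoutSeconds As Double = 15) As Boolean
    Dim deadline As Date
    ' A Date counts days, so convert seconds to a fraction of a day.
    deadline = Now + timeoutSeconds / 86400#

    Do While ie.Busy Or ie.ReadyState <> 4
        DoEvents                               ' keep Excel responsive
        If Now > deadline Then Exit Function   ' timed out: returns False
    Loop

    WaitForPage = True
End Function
```

It would be called as `If Not WaitForPage(browserIE) Then Exit Sub`. Note that a form submit may not flip `Busy` immediately after `.Click`, so a short pause before the call can still be needed.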

I added the Microsoft HTML Object Library and Microsoft Internet Controls references to the project via the VBE's Tools ► References.

Results from the Immediate window.

http://www.finder.fi/Televiestint%C3%A4laitteita+ja+palveluja/Nokia+Oyj/ESPOO/yhteystiedot/159843
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia/SALO/yhteystiedot/960395
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia/TAMPERE/yhteystiedot/853264
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia/ESPOO/yhteystiedot/2931747
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia/ESPOO/yhteystiedot/2931748
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia/TAMPERE/yhteystiedot/835172
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/TAMPERE/yhteystiedot/159838
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/SALO/yhteystiedot/159839
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/TAMPERE/yhteystiedot/159850
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/TAMPERE/yhteystiedot/159857
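
If you would still rather work on the page source as one string, the sketch below shows one way to do it with `InStr` and `Mid$`. `ExtractBetween` and `DemoExtract` are hypothetical names introduced here. As far as I can tell, the original error has two likely causes. `V` was never assigned, so `InStr` returns 0 and `Mid` with a start position of 0 raises "Invalid procedure call or argument". The `- 1` also sits inside the `InStr` parentheses, which subtracts 1 from a string. The helper checks for a missing marker before calling `Mid$`. In the real macro the input would be `browserIE.Document.body.innerHTML`. Be aware that IE may not return attributes in the same order or spacing as view-source shows, so the markers are an assumption you would need to verify.

```vba
' Hypothetical helper: returns the text between the first startMarker and
' the next endMarker, or "" if either marker is not found.
Function ExtractBetween(ByVal source As String, ByVal startMarker As String, _
                        ByVal endMarker As String) As String
    Dim startPos As Long, endPos As Long

    startPos = InStr(1, source, startMarker, vbTextCompare)
    If startPos = 0 Then Exit Function          ' start marker missing

    startPos = startPos + Len(startMarker)      ' first character after the marker
    endPos = InStr(startPos, source, endMarker, vbTextCompare)
    If endPos = 0 Then Exit Function            ' end marker missing

    ExtractBetween = Mid$(source, startPos, endPos - startPos)
End Function

Sub DemoExtract()
    Dim html As String
    ' Sample markup copied from the question; a live page may differ.
    html = "<div class=""itemName""><a href=""/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/TAMPERE/yhteystiedot/159838"" class=""resultGray"">"
    Debug.Print ExtractBetween(html, "<a href=""", """ class=""resultGray"">")
End Sub
```

This only finds the first match and breaks as soon as the markup changes, which is why I would still lean towards the DOM approach above.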
  • I ran your code with the Microsoft HTML Object Library and Microsoft Internet Controls references added, but it doesn't produce anything in the Immediate window or anywhere else for that matter. Any idea why? – Joonas Jan 08 '16 at 12:45
  • I tried checking what browserIE.Document.body holds, but Debug only shows: [object HTMLBodyElement]. – Joonas Jan 08 '16 at 13:02
  • Got it to work, implemented the Application.Wait (Now + TimeValue("0:00:4")) . Works like a charm! – Joonas Jan 13 '16 at 12:22