0

I need to extract the information from TE website. The data include the fields and the table shown in the picture.

Website

I read some example code to extract the data from website using VBA in Excel, like this one for example Stackoverflow's Example. I understand some but I cannot find how to adapt into my problem.

There are some comments I found that I will need to know some HTML to do it - but I have no clue how.

I tried to look into the code of the page then I found the information hidden but still hard to have some thoughts about extracting these fields.

<span class="te-search-gallery-family">2.5mm Signal Double Lock Connector</span>
<span class="te-search-gallery-desc">2.5 SIGNAL D/LOCK PLUG HSG 11P</span>

and

  <th class="first-cell"> Product Type</th>
  <th> Connector Type</th>                                          
  <th> Connector Style</th>                                         
  <th> Product Line</th>                                            
  <th> Centerline</th>                                          
  <th> Application Use</th>                                         
  <th> Applies To</th>                                          
  <th> Wire/Cable Type</th>                                         
  <th> Contact Type</th>                                            
  <th class="last-cell"> Number of Positions</th>
</tr>
</thead>
<tbody>
    <tr>
      <td class="first-cell"> Connector</td>
      <td> Housing</td>                                         
      <td> Plug</td>                                            
      <td> 2.5mm Signal Double Lock</td>                                            
      <td> 2.50 mm [0.098 in]</td>                                          
      <td> Wire-to-Wire</td>                                            
      <td> Wire/Cable</td>                                          
      <td> Discrete Wire</td>                                           
      <td> Socket</td>                                          
      <td class="last-cell"> 11</td>
    </tr>
theduck
  • 2,589
  • 13
  • 17
  • 23
user2103670
  • 1,601
  • 10
  • 23
  • 24

1 Answers1

1

Take a look at the code below - tested successfully on my end. It prints all the inforations you are looking for to the debug window - just adapt the code to stick them wherever in your spreadsheet.

You also need to tick references to both

  • the Microsoft HTML Object library
  • Microsoft XML, v6.0

for the code to work

Also, this code should work fine anytime the page returns one single product - but there should be some extra work to be done in the instances where more products are returned Sub xhrsub()

    Dim xhr As MSXML2.XMLHTTP60
    Dim doc As MSHTML.HTMLDocument
    Dim results As MSHTML.HTMLDivElement
    Dim Family As String
    Dim desc As String
    Dim elt As MSHTML.HTMLTableCell
    Dim imgs As MSHTML.IHTMLElementCollection
    Dim img As MSHTML.HTMLImg

    Set xhr = New MSXML2.XMLHTTP60

    With xhr

        .Open "GET", "http://www.te.com/catalog/products/en?q=917695-1", False
        .send

        If .ReadyState = 4 And .Status = 200 Then
            Set doc = New MSHTML.HTMLDocument
            doc.body.innerHTML = .responseText
        End If

    End With

    With doc
        Family = .getElementsByClassName("te-search-gallery-family").Item(0).innerText
        desc = .getElementsByClassName("te-search-gallery-desc").Item(0).innerText
        Set results = .getElementById("te-search-gallery")
    End With

    Debug.Print Family
    Debug.Print desc
    Debug.Print vbNewLine

    With results.getElementsByTagName("table").Item(0)

        For Each elt In .getElementsByTagName("th")
            Debug.Print elt.innerText
        Next elt

        Debug.Print vbNewLine

        For Each elt In .getElementsByTagName("td")
            Debug.Print elt.innerText
        Next elt

    End With

    Set imgs = doc.getElementsByTagName("img")

    For Each img In imgs
        If InStr(img.getAttribute("alt"), "Click here for product details") <> 0 Then
            myurl = img.getAttribute("src")
        End If
    Next img

    Debug.Print myurl

End Sub
IAmDranged
  • 2,890
  • 1
  • 12
  • 6
  • thank you very much. It is much more clear now. I can get individual information as well as the whole table. I have a some problem when trying to get the image url, I tried, `imagelink = .getElementsByTagName("img").Item(0)` and `imagelink = .getElementsByTagName("img").Item(0).innerText` but it is not correct. I also tried `imagelink = .getElementsByTagName("img").Item(0).URL` – user2103670 Jul 31 '14 at 16:50
  • The image url is a bit trickier to get, I have amended the code above to provide an example of how you can get you way around to it. Note the url returned is relative - and also an "about:" substring is prepended to it, not sure why. There is just a bit more to be done to get rid of this and add the root url. Assuming your are able to figure this out - but let me know if you are having any trouble – IAmDranged Jul 31 '14 at 18:17
  • I got it - thanks. :P I use `myurl = Replace(img.getAttribute("src"), "about:/", "http://www.te.com/")` and it worked – user2103670 Jul 31 '14 at 18:40