1

First time post, I apologize if the answer is actually obvious.

I did some searching, and, through this post and some Googling, managed to assemble this VBA macro for Excel:

Sub GetFeats()
  Dim SourceCell As Range
  Dim FeatText As String
  Dim TargetCell As Range
  Dim appIE As Object
  Set appIE = CreateObject("internetexplorer.application")
  Visible = True

  For Each SourceCell In Sheets("Sheet2").Range("A2:A3200")
    With appIE
      .Navigate SourceCell
      .Visible = True
    End With
    Do While appIE.Busy
      DoEvents
    Loop
    FeatText = appIE.document.getElementById("content")
    For Each TargetCell In Sheets("Sheet2").Range("B2:B3200")
    b2 = TargetCell
    Next TargetCell
  Next SourceCell
End Sub

Now, it seems to, well, almost work. I can see Internet Explorer running down the A2:A3200 range, opening up each link in succession. It fails when it gets to an empty cell, so, I'm going to need to search for a way to tell it to skip a cell if the cell is empty, but I think I can manage that on my own.

The problem, is that it's not finding the "content" of the webpage. I'm wondering if maybe div id is not the same, functionally, as tr id (as used in the original post I sourced). I think, if I can get the VBA to find the correct content, it will paste it properly, in range B2:B3200, but right now it just pasting "[object HTMLDivElement]"

For context, here is a sample of one of the links that is being pulled from A2:A3200.

Thank-you all in advance.

1 Answers1

0

Hummm, you kind of lost me. Are you saying you have links in a Column, Like Column A, and you want to import the entire contents of a site to an adjacent cell, like the next cell over in Column B? Is that the ask? If so, try this.

Sub Sample()
Dim ie As Object
Dim retStr As String
Dim sht As Worksheet
Dim LastRow As Long
Dim rCell As Range
Dim rRng As Range

Set sht = ThisWorkbook.Worksheets("Sheet1")

'Ctrl + Shift + End
LastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row

    Set ie = CreateObject("internetexplorer.application")


    Set rRng = Sheet1.Range("A1:A" & LastRow)

    For Each rCell In rRng.Cells

            With ie
                .Navigate rCell.Value
                .Visible = True
            End With

            Do While ie.readystate <> 4: Wait 5: Loop
            DoEvents

            rCell.Offset(0, 1).Value = ie.document.body.innerText
    Next rCell

End Sub

Private Sub Wait(ByVal nSec As Long)
    nSec = nSec + Timer
    While nSec > Timer
        DoEvents
    Wend
End Sub
ASH
  • 20,759
  • 19
  • 87
  • 200
  • Kind of. I have an enormous list of links, which are in Column A. Ideally, I'd like to copy only specific parts of the text of the webpage into various columns (B, C, D, etc), however, I figured it would be easier (for me) to simply get a dump of all the text, and then I can copy all 3200 rows into Word and re-format it there to separate the wheat from the chaff. I do want only the readable text of the website though, not the HTML. – rdplanglois Jul 14 '17 at 19:20
  • OK, that VBA works great, except: it's not pulling the text I want it to. Right now I'm just getting **everything** from the page. Which is fine, I can work with that. Anyway of getting only a specific header from within the page...? – rdplanglois Jul 14 '17 at 19:33
  • Post a few URLs, OK. I'm guessing there are commonalities between the URLs, right. If you have 3200 totally different URLs, this may not be possible. Web sites are generally custom designed and each one is usually quite different from others. – ASH Jul 14 '17 at 20:48
  • I'm sorry to resurrect an old thread. Somehow, I was able to figure this out last year. Damned now if I know what I did. As mentioned in my previous comment, the posted code **does** work; however, it grabs the _entire_ page. Here's a sample link: [link](http://dnd.arkalseif.info/spells/dragons-of-faerun--26/abate-dracorage--1094/). The only part of the page I want is the `div id` "content", itself nested under "site_content" under "main". Any suggestions? – rdplanglois Feb 28 '18 at 23:52