Webscrape loop on all URLs in Column A

Question

I'm trying to scrape the Facebook Video Titles from a list of URL's.

I've got my macro working for a single video in which the URL is built into the code. I'd like the script to instead loop through each URL in Column A and output the Video Title into Column B. Any help?

Current code:

Sub ScrapeVideoTitle()    
    Dim appIE As Object
    Set appIE = CreateObject("internetexplorer.application")

    With appIE
        .navigate "https://www.facebook.com/rankertotalnerd/videos/276505496352731/"
        .Visible = True

        Do While appIE.Busy        
            DoEvents
        Loop

        'Add Video Title to Column B
        Range("B2").Value = appIE.document.getElementsByClassName("_4ik6")(0).innerText

        appIE.Quit
        Set appIE = Nothing
    End With
End Sub

Did you get the video title to the cell working correctly? It looks like you just need to loop through a range, and set that to the `.navigate` value...What have you tried? — BruceWayne, Jul 10 '19 at 02:09
Yes, title working correctly, just cant figure out how to loop through the range... — Nick, Jul 10 '19 at 14:36

QHarr · Answer 1 · 2019-07-11T21:56:59.807

1

Provided you can go VBE > Tools > References > Add a reference to Microsoft HTML Object Library you can do the following:

Read all the urls into an array. Loop the array and use xmlhttp to issue GET request to page. Read the response into an HTMLDocument variable and use css selector to extract the title and store in an array. At the end of the loop write all results out to sheet in one go.

Option Explicit
Public Sub GetTitles()
    Dim urls(), ws As Worksheet, lastRow As Long, results(), i As Long, html As HTMLDocument

    Set html = New HTMLDocument
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    With ws
        lastRow = .Cells(.rows.Count, "A").End(xlUp).Row
        urls = Application.Transpose(.Range("A2:A" & lastRow).Value)
    End With
    ReDim results(1 To UBound(urls))
    With CreateObject("MSXML2.XMLHTTP")
        For i = LBound(urls) To UBound(urls)
            If InStr(urls(i), "http") > 0 Then
                .Open "GET", urls(i), False
                .send
                html.body.innerHTML = .responseText
                results(i) = html.querySelector(".uiHeaderTitle span").innerText
            End If
        Next
    End With
    ws.Cells(2, 2).Resize(UBound(results), 1) = Application.Transpose(results)
End Sub

Matching of css selector to page:

edited Jul 11 '19 at 21:56

answered Jul 11 '19 at 07:41

QHarr

83,427
12
54
101

+1 for your great code. I would like to learn more about querySelector. I have inspected the page and searched for `.uiHeaderTitle span` but I didn't find the title within it!!! – YasserKhalil Jul 11 '19 at 20:47
@YasserKhalil See the _Links I commonly refer people to_ section of my profile page. If I go to the facebook line shown and enter .uiHeaderTitle span in the browser it matches the title. Is that what you tried? I don't know if your page would for some reason be structured differently. – QHarr Jul 11 '19 at 20:49
When I inspected for `.uiHeaderTitle span` I found two results and the inner text for both is not for the title ..! – YasserKhalil Jul 11 '19 at 21:08
I have found the problem. When I use IE for inspecting it works as you explained exactly (I didn't log to FB in IE). But while inspecting the video on chrome I found the results are two and none of them contains the title (In chrome I am logging to my account on FB). – YasserKhalil Jul 12 '19 at 04:40
I tried another link `https://www.facebook.com/846539149041819/videos/422009358641920/` and when inspecting I found the title within the css selector `._5wj- p:first-child` but I got error in the code when using that although when finding it I found 1 match. I am totally confused – YasserKhalil Jul 12 '19 at 04:49
1

The class names are likely dynamic which is why I avoided them. – QHarr Jul 12 '19 at 05:19
Doesn't seem like the OP much cares about a solution anyway. – QHarr Jul 12 '19 at 05:20
Thanks a lot. Can you guide me to the suitable css selector for the video link I posted in my comment? – YasserKhalil Jul 12 '19 at 05:49
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/196354/discussion-between-qharr-and-yasserkhalil). – QHarr Jul 12 '19 at 05:49

score 0 · Answer 2 · answered Jul 10 '19 at 02:21

0

If you had the "276505496352731" part of the url, or indeed the whole URL in olumn A you could set a range to the top value, and then loop until the range was empty, moving it down once for each scrape.

Something like:

'Dims as before
Dim r as range

With appIE

  set r = Range("B1")  ' Assumes B1 is the top of the URL list
  do while r.value > ""

    .navigate r.value
    'Do the rest of your IE stuff
    r.offset(0,1).Value = appIE.document.getElementsByClassName("_4ik6")(0).innerText

    set r = r.offset(1)
  Loop
End With

That should help hopefully.

answered Jul 10 '19 at 02:21

SamP

176
1
10

I keep getting an error, "Run-time error '-2147467259 (80004005)': Method 'Document' of object 'IWebBrowser2' failed". [Screenshot of code and error message](https://imgur.com/a/I5Ml7NV). Any ideas what I'm doing wrong? – Nick Jul 10 '19 at 15:28
OK the error here is in your scraping part I assume, not the Excel loop part. I am unsure as to the specifics of how that works, but try the following: Ensure your `.visible = true` is still in your code. Get yourself an untyped variable `Dim v` and set it to the result of your getElements... call. So `Set v = .document.getElementsByClassName("_4ik6")`. The reason to do this is that it is possible it will not find any elements with that class, and if so it will return an empty node list, and so the (0) index reference to get the first such item will fail. So put the result into v first. – SamP Jul 11 '19 at 03:18

Webscrape loop on all URLs in Column A

2 Answers2

Linked