Currently I have extracted 13,000 URLs using the code below. However 3,000 of them came up with URLs from Facebook, Bloomberg, and so on. For these URLs I have been manually searching there names and maybe 1 in 20 have a company URL that the macro missed. So my question is this: Is there a way that I can edit the macro so that if a URL page contains a string value such as "facebook" or "wiki" that it will skip that URL and continue to search for a URL that DOES NOT contain the string value?
Code for how I extract URLs:
Sub XMLHTTP()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
Dim start_time As Date
Dim end_time As Date
lastRow = Range("A" & Rows.Count).End(xlUp).Row
Dim cookie As String
Dim result_cookie As String
start_time = Time
Debug.Print "start_time:" & start_time
For i = 2 To lastRow
url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid("rso")
Set objH3 = objResultDiv.getelementsbytagname("H3")(0)
Set link = objH3.getelementsbytagname("a")(0)
str_text = Replace(link.innerHTML, "<EM>", "")
str_text = Replace(str_text, "</EM>", "")
Cells(i, 2) = str_text
Cells(i, 3) = link.href
DoEvents
Next
end_time = Time
Debug.Print "end_time:" & end_time
Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
This is the code I used to filter out URLs based on string values:
Sub badURLs()
Dim lr As Long ' Declare the variable
lr = Cells(Rows.Count, 3).End(xlUp).Row ' Set the variable
' lr now contains the last used row in column A
Application.ScreenUpdating = False
For a = lr To 1 Step -1
If InStr(1, Cells(a, 3), "bloomberg", vbTextCompare) > 0 _
Or InStr(1, Cells(a, 3), "manta", vbTextCompare) > 0 _
Or InStr(1, Cells(a, 3), "yellowpages", vbTextCompare) > 0 _
Or InStr(1, Cells(a, 3), "yelp", vbTextCompare) > 0 _
Or InStr(1, Cells(a, 3), "snapshot", vbTextCompare) > 0 _
Or InStr(1, Cells(a, 3), "facebook", vbTextCompare) > 0 _
Or InStr(1, Cells(a, 3), "wiki", vbTextCompare) > 0 _
Or InStr(1, Cells(a, 3), "linkedin", vbTextCompare) > 0 _
Or InStr(1, Cells(a, 3), "hoovers", vbTextCompare) > 0 Then
'Compares for bloomberg, wiki, or hoovers. Enters loop if value is greater than 0
With Cells(a, 3)
.NumberFormat = "General"
.Value = "NA"
End With
End If
Next a
Application.ScreenUpdating = True
End Sub
Just to reiterate: I want to know if its possible (and if so how) to filter out URLs in the first macro based on the string values in the second. I'm hoping that this will allow me to have much more accurate URL hits and that I wont have to search 3000 company names manually in hopes that only a few will have a useful URL.