0

I have a problem with this code. I have to make a macro that detects the most repeated words in a domain for example cats is repeated 1180 times in the domain www.love_cats.com (example). I plan to do it with the google API that counts the total results as in the image

enter image description here

The code that I have is.

Sub diccionarios()

    screenUpdateStatus = Application.ScreenUpdating
    statusBarStatus = Application.DisplayStatusBar
    calcStatus = Application.Calculation
    eventsStatus = Application.EnableEvents
    displayPageBreakStatus = ActiveSheet.DisplayPageBreaks
    Application.ScreenUpdating = False
    Application.DisplayStatusBar = False
    Application.Calculation = xlCalculationManual
    Application.EnableEvents = False
    ActiveSheet.DisplayPageBreaks = False
    Application.DisplayAlerts = True

Dim dicc As New Scripting.Dictionary
Dim columnaSalida As Integer, k As Variant, filaSalida As Long,         
grantotalPalabras As Long
Dim palabra As String, rango As Range, ultima As Variant, celda As Variant,     
contador As Long

ultima = Sheets("Sheet1").Range("A1").End(xlDown).Row

Set rango = Range("A1:A" & ultima)

Set dicc = New Scripting.Dictionary

For Each celda In rango
palabra = celda

If Not dicc.Exists(palabra) Then
    dicc.Item(palabra) = 1
Else
    dicc.Item(palabra) = dicc.Item(palabra) + 1
End If

Next celda

filaSalida = 1
columnaSalida = 2
contador = 1
For Each k In dicc.Keys

Dim ie As Object, form As Variant, button As Variant, _
LR As Integer, var As String, var1 As Object

var = k & " site :www.domea.dk"
Set ie = CreateObject("internetexplorer.application")
ie.Visible = False

    With ie

        .Visible = False
        .navigate "https://www.google.co.in/?gws_rd=ssl"

    End With

    Application.Wait (Now + TimeValue("0:00:02"))
    ie.document.getElementById("lst-ib").Value = var

     Set form = ie.document.getElementsByTagName("form")
     Application.Wait (Now + TimeValue("0:00:02"))
     Set button = form(0).onsubmit

     form(0).submit

    Application.Wait (Now + TimeValue("0:00:02"))

    Set var1 = ie.document.getElementById("resultStats")

    Cells(contador, 2).Value = var1.innerText

    ie.Quit
    Set ie = Nothing
    contador = contador + 1
Next k

Set dicc = Nothing

Application.DisplayAlerts = True
Application.EnableEvents = True
Application.ScreenUpdating = screenUpdateState
Application.DisplayStatusBar = True
Application.Calculation = calcState
Application.EnableEvents = eventsState
ActiveSheet.DisplayPageBreaks = displayPageBreaksState

End Sub

Then it occurred to me to use a dictionary, where I put all the possible words (58 thousand words) and go through them, It count the words and the first 5 would be the ones that are shown. The problem is that the macro takes a long time (15 minutos), becomes saturated and fails.

Anthony
  • 3,595
  • 2
  • 29
  • 38
Ivan HH
  • 29
  • 4
  • Dictionaries not sorted by key or item but you should be able to retrieve the top 5 after loading it. –  Jul 21 '18 at 02:54
  • You're creating an Internet Explorer object (`CreateObject("internetexplorer.application")`) with every loop. Move this outside the loop because you only need it once. Also, you're telling the code to take a minimum of six seconds with three different `Application.Wait` each loop. If you have just 150 words in your dictionary, then you're waiting 6*150=900 seconds (15 minutes). Perhaps there is a way to cut down or eliminate these waits? – PeterT Jul 21 '18 at 03:14
  • I can not delete the "Wait" when you delete them. I make many mistakes. It has to do with the server's responses. By the way, your solution was very good. I do not know how I did not think that before. Drastically low CPU consumption. But it's still taking a long time. – Ivan HH Jul 21 '18 at 03:47
  • This line sometimes works and sometimes it marks me wrong IE.document.getElementById("lst-ib").Value = var – Ivan HH Jul 21 '18 at 03:49
  • And is http://www.love_cats.com/ a real url? – QHarr Jul 21 '18 at 09:21

1 Answers1

1

tl;dr;

It looks like you want to use a grid to generate combinations of Google (it looks like Google; that is an assumption that should be clarified) search terms e.g. København + Domea.dk ; and the expected result (for you) was 604 google hits for that search combination. To then compare that with the hits for Kontakt + Domea.dk etc. But then your image, though hard to read, clearly has more search terms. And of course, none of this appears to be about the number of times words appear, but simply the current page hit count for a given combination of search terms, which include a domain name.

The following will get the hit counts for the grid combinations you show:


XHR:

Option Explicit
Public Sub GetHits()
    Dim searchGrid(), i As Long, j As Long, ie As New InternetExplorer, results As Worksheet
    Set results = ThisWorkbook.Worksheets("SearchTermGrid")
    searchGrid = results.Range("A1:C4").Value

    For i = 2 To UBound(searchGrid, 1)
        For j = 2 To UBound(searchGrid, 2)
            searchGrid(i, j) = GetPageHits(searchGrid(i, 1) & Chr$(32) & searchGrid(1, j))
        Next j
    Next i
    results.Range("A1").Resize(UBound(searchGrid, 1), UBound(searchGrid, 2)) = searchGrid
End Sub

Public Function GetPageHits(ByVal searchTerm As String) As Long
    Dim sResponse As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.google.co.uk/search?safe=strict&ei=Ie9SW77uDumbgAaR3IW4DA&q=" & URLEncode(searchTerm) & "&oq=Domea.dk+&gs_l=psy-ab.1.0.35i39k1j0i30k1l9.1154157.1158758.0.1159742.4.2.2.0.0.0.122.209.1j1.2.0....0...1c.1.64.psy-ab..0.4.215...0.0.fep2Oxcc7lY", False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    GetPageHits = Split(Split(sResponse, "id=""resultStats"">")(1), Chr$(32))(1)
End Function

Public Function URLEncode( _
       StringVal As String, _
       Optional SpaceAsPlus As Boolean = False _
       ) As String

    Dim StringLen As Long: StringLen = Len(StringVal)

    If StringLen > 0 Then
        ReDim result(StringLen) As String
        Dim i As Long, CharCode As Integer
        Dim Char As String, Space As String

        If SpaceAsPlus Then Space = "+" Else Space = "%20"

        For i = 1 To StringLen
            Char = Mid$(StringVal, i, 1)
            CharCode = Asc(Char)
            Select Case CharCode
            Case 97 To 122, 65 To 90, 48 To 57, 45, 46, 95, 126
                result(i) = Char
            Case 32
                result(i) = Space
            Case 0 To 15
                result(i) = "%0" & Hex(CharCode)
            Case Else
                result(i) = "%" & Hex(CharCode)
            End Select
        Next i
        URLEncode = Join(result, "")
    End If
End Function

Internet Explorer:

Option Explicit
Public Sub GetPageHits()
    Dim searchGrid(), i As Long, j As Long, ie As New InternetExplorer, results As Worksheet
    Set results = ThisWorkbook.Worksheets("SearchTermGrid")
    searchGrid = results.Range("A1:C4").Value

    With ie
        .Visible = True
        .navigate "https://www.google.co.uk/"
        While .Busy Or .readyState < 4: DoEvents: Wend

        For i = 2 To UBound(searchGrid, 1)
            For j = 2 To UBound(searchGrid, 2)
                searchGrid(i, j) = GetHitResults(searchGrid(i, 1) & Chr$(32) & searchGrid(1, j), ie)
            Next j
        Next i
        .Quit
    End With
    results.Range("A1").Resize(UBound(searchGrid, 1), UBound(searchGrid, 2)) = searchGrid
End Sub

Public Function GetHitResults(ByVal searchTerm As String, ByVal ie As Object) As Long
    With ie
        .document.getElementById("lst-ib").Value = searchTerm     
        .document.forms(0).submit
        While .Busy Or .readyState < 4: DoEvents: Wend
        GetHitResults = Split(Split(.document.body.outerHTML, "id=""resultStats"">")(1), Chr$(32))(1)
    End With
End Function

Results:

Results


References:

  1. URL Encode function by @Tomalak

Project references:

  1. Microsoft Internet Control via Tools > References.
QHarr
  • 83,427
  • 12
  • 54
  • 101
  • Thanks for that. But it marks me the error in .send It tells me "Access Denied" – Ivan HH Jul 21 '18 at 19:37
  • Only in the first one The second one marks me "Type Mismacht" in the line: GetHitResults = Split (Split (.document.body.innerHTML, "id =" "resultStats" ">") (1), Chr $ (32)) (1) – Ivan HH Jul 21 '18 at 20:05
  • Yes, actually. The terms must be random – Ivan HH Jul 21 '18 at 21:19
  • The value is Magasinet – Ivan HH Jul 22 '18 at 00:40
  • The words searched are in Danish – Ivan HH Jul 22 '18 at 01:38
  • No, lo que estoy buscando es cuántas veces se repite una palabra en un dominio, la búsqueda en google sería asi Magasinet site: domea.dk – Ivan HH Jul 22 '18 at 14:25
  • Algo así https://www.google.com.mx/search?rlz=1C1GCEA_enMX791MX791&ei=wpdUW4KsNs_KswWDwr3ICw&q=google+site%3Agoogle.com&oq=google+site%3Agoogle.com&gs_l=psy-ab.3...557.8011.0.8272.24.20.2.0.0.0.1108.2834.3j5j3j7-1.12.0....0...1c.1.64.psy-ab..10.7.1125...0j35i39k1j0i20i263k1j0i67k1j0i131k1j0i203k1j0i10k1.0.4lR-Qlqpv0s&safe=active&ssui=on – Ivan HH Jul 22 '18 at 14:42
  • Sub test() : Debug.Print GetPageHits("Magasinet site: domea.dk") : End Sub '> Gives me 16600 which is the correct results number. I am guessing you are somehow blocked from making calls via XHR (perhaps due to too many calls with a short period?) – QHarr Jul 22 '18 at 16:29
  • Surely that's it. But what could I do? – Ivan HH Jul 22 '18 at 16:49
  • Me marca "Type mismatch" en la linea GetHitResults = Split(Split(.document.body.outerHTML, "id=""resultStats"">")(1), Chr$(32))(1) – Ivan HH Jul 22 '18 at 21:26
  • ¿no has cambiado nada? – QHarr Jul 22 '18 at 21:29