Extract content of div from Google Translate with VBA

Question

I have a function to translate the language:

Public Function Translate(rng As Range, Optional translateFrom As String = "nl", Optional translateTo As String = "en")
    Dim getParam As String, Trans As String, objHTTP As Object, URL As String
    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    getParam = ConvertToGet(rng.Value)
    URL = "https://translate.google.com/m?hl=" & translateFrom & "&sl=" & translateFrom & "&tl=" & translateTo & "&ie=UTF-8&prev=_m&q=" & getParam
    objHTTP.Open "GET", URL, False
    objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    objHTTP.send ("")
    If InStr(objHTTP.responseText, "div dir=""ltr""") > 0 Then
        Trans = RegexExecute(objHTTP.responseText, "div[^""]*?""ltr"".*?>(.+?)</div>")
        Translate = CleanA(Trans)
    Else
        Translate = CVErr(xlErrValue)
    End If
End Function

with Chinese translation:

A1 = Hello

B1 = Translate(A1,"en","zh-cn")

The result is "Nǐ hǎo", the correct result is "你好"

Link google: https://translate.google.com/m?hl=en&sl=en&tl=zh-CN&ie=UTF-8&prev=_m&q=hello

I want results:

B1 = 你好

C1 = Nǐ hǎo

I think I need to fix this code:

Trans = RegexExecute(objHTTP.responseText, "div[^""]*?""ltr"".*?>(.+?)</div>")

please help me, thank you!

Robin Mackenzie · Answer 1 · 2020-12-23T10:53:17.303

The page returns two <div>s like:

<div dir="ltr" class="o1">Nǐ hǎo</div>
<div dir="ltr" class="t0">你好</div>

You are well advised to not try and parse HTML with regular expressions as it is very easy to run into difficulties with parsing the HTML - you can use the Microsoft HTML Object Library from within VBA to obtain similar results.

To get the content of those two <div> tags you can use this code per the example below:

' o1 has Anglicised translation, t0 as tranlsation in target language
Set objDivs = objHTML.getElementsByTagName("div")
For Each objDiv In objDivs
    If objDiv.className = "o1" Then
        strTranslatedO1 = objDiv.innerText
    End If
    If objDiv.className = "t0" Then
        strTranslatedT0 = objDiv.innerText
    End If
Next objDiv

Which is basically looping through all the <div> tags in the returned HTML and checking for the class names o1 and t0 and then getting the innerText property. Using this technique you can obtain the translated values and write them back to the worksheet e.g.:

Full code:

Option Explicit

Public Sub Test()

    Dim ws As Worksheet
    
    ' testing worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    ws.Cells.Delete
    
    ' test inputs
    ws.Range("A1:E1") = Array("Input", "From", "To", "T0", "O1")
    ws.Range("A2:A4") = "hello"
    ws.Range("B2:B4") = "English"
    ws.Range("C2:C4") = Application.Transpose(Array("Chinese", "Spanish", "Russian"))
    
    ' test
    ws.Range("D2") = Translate("hello", "en", "zh-cn", True)
    ws.Range("E2") = Translate("hello", "en", "zh-cn", False)
    ws.Range("D3") = Translate("hello", "en", "es", True)
    ws.Range("E3") = Translate("hello", "en", "es", False) 'Spanish uses latin alphabet
    ws.Range("D4") = Translate("hello", "en", "ru", True)
    ws.Range("E4") = Translate("hello", "en", "ru", False)

End Sub

Public Function Translate(strInput As String, strFromLanguageCode As String, strToLanguageCode As String, blnTargetAlphabet As Boolean) As String
    
    Dim strURL As String
    Dim objHTTP As Object
    Dim objHTML As Object
    Dim objDivs As Object, objDiv
    Dim strTranslatedT0 As String
    Dim strTranslatedO1 As String
   
    ' send query to web page
    strURL = "https://translate.google.com/m?hl=" & strFromLanguageCode & _
        "&sl=" & strFromLanguageCode & _
        "&tl=" & strToLanguageCode & _
        "&ie=UTF-8&prev=_m&q=" & strInput

    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    objHTTP.Open "GET", strURL, False
    objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    objHTTP.send ""
    
    ' create a html document
    Set objHTML = CreateObject("htmlfile")
    With objHTML
        .Open
        .Write objHTTP.responseText
        .Close
    End With
    
    ' o1 has Anglicised translation, t0 as tranlsation in target language
    Set objDivs = objHTML.getElementsByTagName("div")
    For Each objDiv In objDivs
        If objDiv.className = "o1" Then
            strTranslatedO1 = objDiv.innerText
        End If
        If objDiv.className = "t0" Then
            strTranslatedT0 = objDiv.innerText
        End If
    Next objDiv
    
    ' choose which to return
    If blnTargetAlphabet Then
        Translate = strTranslatedT0
    Else
        Translate = strTranslatedO1
    End If
    
CleanUp:
    Set objHTML = Nothing
    Set objHTTP = Nothing
    
End Function

December 2020 update

Looks like this method will no longer work going back maybe to mid November.

Looking at the response

the div class names have changed to something more obscure
there's some esoteric c-wiz elements doing something wonderful...
also, I suspect that some client side script is calling for the actual translation after the document is retrieved

Options: Selenium, Microsoft Translate, free and paid tiers for Google translation APIs ;)

This method is still working in 2020 and is faster than the alternate method of using `ie.Document.getElementByID(strResultElement).innerText` that has stopped working with Google because they no longer include an element ID in the translation frame. — Ben, Feb 19 '20 at 14:50
@Ben - no longer working unfortunately. I'll update the answer. — Robin Mackenzie, Dec 23 '20 at 10:45

Noam Brand · Answer 2 · 2021-01-26T22:23:32.980

@Robin Mackenzie solution works with these fixes:

Similar with Youtube Automate Language Translations Using Excel VBA by Dinesh Kumar Takyar

"t0" changed.

Replace If objDiv.className = "t0" Then with

 If objDiv.className = "result-container" Then

For encoding problems (for example in Hebrew language) add in the beginning (Office 2013 and above):

strInput = WorksheetFunction.EncodeURL(strInput)

Note: Do not look at the Google translate regular web page HTML, the "/m?" stands for "mobile" using the Google translate mobile page that has a different and simpler HTML code.

Extract content of div from Google Translate with VBA

2 Answers2

December 2020 update

Linked