I am building a small dictionary application with an additional option to use Google Translate. The problem: when I receive the response from Google and show it in a text box, I see strange symbols instead of the translation. Here is the method that queries Google:

public string TranslateText(string inputText, string languagePair)
{
    string url = String.Format("http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}", inputText, languagePair);

    WebClient webClient = new WebClient();
    webClient.Encoding = System.Text.Encoding.UTF8;

    // Get translated text
    string result = webClient.DownloadString(url);

    // Cut the response down to the contents of the first <span title="..."> element
    result = result.Substring(result.IndexOf("<span title=\"") + "<span title=\"".Length);
    result = result.Substring(result.IndexOf(">") + 1);
    result = result.Substring(0, result.IndexOf("</span>"));

    return result.Trim();
}

I call this method like this (after the translate button is clicked):

string resultText;
string inputText = tbInputWord.Text.ToString();

if (inputText != null && inputText.Trim() != "")
{
    ExtendedGoogleTranslate urlTranslate = new ExtendedGoogleTranslate();

    resultText = urlTranslate.TranslateText(inputText, "en|bg");

    tbOutputWord.Text = resultText;
}

I am translating from English (en) to Bulgarian (bg) and setting the WebClient encoding to UTF-8, so I suspect I am missing a step in the calling code to decode resultText before assigning it to the tbOutputWord text box. I know the code itself works, because translating from English to French, for example, shows the correct result.

M.Veli
  • Can you provide a sample request for `http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}` which yields those "strange symbols"? – Jan Köhler Dec 28 '14 at 21:32
  • If **inputText** is **"hello"**, the result should be **"Здравейте"**, but I see **"?????????"** instead. If I change **"en|bg"** to **"en|fr"** with the same input, the result is **"bonjour"**. – M.Veli Dec 28 '14 at 21:40

1 Answer

Google doesn't seem to respect the ie=UTF8 query parameter. Adding the following headers to the request makes it return UTF-8:

WebClient webClient = new WebClient();
webClient.Encoding = System.Text.Encoding.UTF8;
webClient.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0");
webClient.Headers.Add(HttpRequestHeader.AcceptCharset, "UTF-8");
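
One plausible reading of the symptom, which is an assumption and not something confirmed in this thread, is that without these headers the server answers in a single-byte charset such as ISO-8859-1, where Cyrillic letters cannot be represented and are replaced with "?", while a word like "bonjour" passes through unchanged. The sketch below only illustrates that replacement effect locally; `EncodingDemo` and `RoundTripLatin1` are hypothetical names, and nothing here contacts Google.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Encodes text as ISO-8859-1 and decodes it again.
    // Characters that Latin-1 cannot represent are replaced with "?".
    public static string RoundTripLatin1(string text)
    {
        // Assumption: ISO-8859-1 stands in for whatever charset the server picked.
        Encoding latin1 = Encoding.GetEncoding(
            "ISO-8859-1",
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));

        byte[] bytes = latin1.GetBytes(text);
        return latin1.GetString(bytes);
    }
}
```

With this helper, `EncodingDemo.RoundTripLatin1("Здравейте")` yields nine question marks, while `EncodingDemo.RoundTripLatin1("bonjour")` comes back unchanged, which matches what the question describes for "en|bg" versus "en|fr".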
Frank
  • Strange, but it works! Now another bug has popped up. Translating "Hey, are you okay?" works fine, but "Hey! Are you okay?" only translates "Hey!". As you can see in my code, I read directly from the input text box (for now) with no formatting, but it looks like only one sentence is translated at a time. – M.Veli Dec 28 '14 at 22:24
  • You should 1.) construct your query string in a different way; consider doing it like this: http://stackoverflow.com/a/1877016/4317569. Your current problem, though, is a different one: 2.) you need a regular expression for parsing the result. Set a breakpoint after `DownloadString` and you will see that the translation works. Your parsing is the problem. – Frank Dec 28 '14 at 23:00
  • Fixed it with a loop, because I saw that the response contains a separate tag for each sentence. – M.Veli Dec 29 '14 at 09:14
  • Wish I had more + votes to give this. Success where so many other 'solutions' fail. – nathanchere Jun 13 '16 at 12:21
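
The two suggestions from the comments above (escape the query string, and collect every translated sentence instead of only the first) can be sketched as follows. This is a minimal illustration, not the code the asker ended up with: `TranslateHelpers`, `BuildUrl` and `ExtractTranslation` are hypothetical names, `Uri.EscapeDataString` is one way to escape the parameters and not necessarily what the linked answer uses, and the regular expression assumes each sentence sits in its own `<span title="...">` element, as the parsing code in the question implies.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class TranslateHelpers
{
    // Builds the request URL from the question, escaping both parameters so that
    // characters such as spaces, '?', '&' and '|' cannot break the query string.
    public static string BuildUrl(string inputText, string languagePair)
    {
        return string.Format(
            "http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}",
            Uri.EscapeDataString(inputText),
            Uri.EscapeDataString(languagePair));
    }

    // Collects the contents of every <span title="..."> element and joins them
    // with spaces. Assumption: one such span per translated sentence.
    public static string ExtractTranslation(string html)
    {
        MatchCollection matches = Regex.Matches(
            html,
            "<span title=\"[^\"]*\"[^>]*>(.*?)</span>",
            RegexOptions.Singleline);

        var sentences = matches
            .Cast<Match>()
            .Select(m => WebUtility.HtmlDecode(m.Groups[1].Value).Trim());

        return string.Join(" ", sentences);
    }
}
```

In `TranslateText`, the URL would then come from `TranslateHelpers.BuildUrl(inputText, languagePair)` and the three `Substring` calls would be replaced by `TranslateHelpers.ExtractTranslation(result)`. A regular expression is brittle against markup changes, so an HTML parser may be the sturdier choice if the page structure is not under your control.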