I am getting HTML character entities (&#39; and &quot;) that are breaking my responses (showing 39; and uto;) when returning a string from an HttpWebRequest:

internal static void TranslateThis(Player player, string fromLang, string toLang, string text){
    try
    {
        string translated = null;
        HttpWebRequest hwr = (HttpWebRequest)HttpWebRequest.Create("http://translate.google.com/?langpair=" + fromLang + "|" + toLang + "&text=" + text.Replace(" ", "+") + "#");
        HttpWebResponse res = (HttpWebResponse)hwr.GetResponse();
        StreamReader sr = new StreamReader(res.GetResponseStream());
        string html = sr.ReadToEnd();
        int a = html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47;
        int b = html.IndexOf("</span>",html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47);
        translated = html.Substring(a, b - a);
        if (translated.Length < (10 * text.Length)){
            if (player == Player.Console)
            {
                player.ParseMessage(translated, true);
            }
            else
            {
                player.ParseMessage(translated, false);
            }
        } else {
            player.Message("Usage: /translate [lang] [message]");
        }
    }
    catch
    {
        player.Message("Usage: /translate [lang] [message]");
    }
}
SystemX17
  • Your sample is not showing the actual problem - you need to figure out first whether your "html" variable contains the value you expect, and then test your ParseMessage method with that value. – Alexei Levenkov Feb 21 '11 at 19:05
  • If the html variable contains a ' or a " it seems to cause problems for me. – SystemX17 Feb 21 '11 at 19:12
  • I have tested with StreamReader readStream = new StreamReader(receiveStream, Encoding.UTF7); and some of the French characters come out ok now. Sorry for not supplying all the variables that would be required to assist me. I am using Google Translate and returning the translated string, so it may include ' in some French words (e.g. J'ai). When I go to display the message to the player it shows up as J35;ai - does that help demonstrate my problem? – SystemX17 Feb 21 '11 at 19:15

3 Answers

First of all, make sure you decode the downloaded content with the correct encoding. See this SO answer for code on how to do this.

Basically, check both the HTTP headers and the meta tags for the encoding, and decode the content again with that encoding if necessary. Then call HttpUtility.HtmlDecode to get rid of any HTML-encoded characters. Now you are ready to start searching for whatever content you are trying to find.

I would also recommend using something like Html Agility Pack to parse the HTML instead of IndexOf.
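The decode-then-unescape steps described above might look roughly like this. It is a minimal sketch, not the code from the linked answer: the names ResponseDecoding, EncodingFromCharset and DecodeBody are made up for illustration, only a charset name taken from the HTTP header is considered (meta tags are ignored), and falling back to UTF-8 is an assumption. HttpUtility lives in System.Web, so a .NET Framework project needs a reference to that assembly.

```csharp
using System;
using System.Text;
using System.Web; // HttpUtility (System.Web assembly on .NET Framework)

// Hypothetical helper, for illustration only.
static class ResponseDecoding
{
    // Maps a charset name such as "ISO-8859-1" to an Encoding.
    // Assumption: fall back to UTF-8 when the name is missing or unknown.
    public static Encoding EncodingFromCharset(string charset)
    {
        if (string.IsNullOrEmpty(charset))
            return Encoding.UTF8;
        try
        {
            // Header values are sometimes quoted, e.g. charset="utf-8".
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            return Encoding.UTF8;
        }
    }

    // Turns the raw response bytes into text, then replaces HTML
    // entities such as &#39; and &quot; with the characters they stand for.
    public static string DecodeBody(byte[] body, string charset)
    {
        string html = EncodingFromCharset(charset).GetString(body);
        return HttpUtility.HtmlDecode(html);
    }
}
```

With an HttpWebResponse, the charset name would typically come from response.CharacterSet and the bytes from reading response.GetResponseStream() to the end.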

Mikael Svenson
  • As above: Using 3.5 Framework so unfortunately looks like that method isn't possible for me to use. My apologies to you as well for not stating that earlier and thank you also for your time. – SystemX17 Feb 21 '11 at 21:14

It is hard to say exactly what your ParseMessage method expects, so this is just a guess:

The result you are getting from Google Translate is HTML, which means that if you want plain-text output, you have to convert the HTML to text. You have successfully extracted the translation from the HTML page (for now, at least: the extraction will break as soon as Google Translate changes its output page even slightly, so your solution is neither fool-proof nor future-proof). But the extracted translation is still HTML-encoded, and you need to decode it. For that, you can use the WebUtility.HtmlDecode method (assuming you are using .NET Framework 4). After the

translated = html.Substring(a, b - a);

line, add

translated = WebUtility.HtmlDecode(translated);
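On runtimes older than .NET Framework 4, where WebUtility is not available, a small hand-written decoder can stand in for that call. The sketch below is only an illustration under stated assumptions: MiniHtmlDecoder is a hypothetical helper, it handles numeric character references plus the five named entities listed in its table, and it leaves anything else untouched. It is not a full replacement for WebUtility.HtmlDecode.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical fallback decoder, for illustration only.
static class MiniHtmlDecoder
{
    // Assumption: only these named entities matter for the translated text.
    private static readonly Dictionary<string, string> Named = new Dictionary<string, string>
    {
        { "quot", "\"" }, { "apos", "'" }, { "amp", "&" }, { "lt", "<" }, { "gt", ">" }
    };

    // Matches &#39; (decimal), &#x27; (hex) and &name; forms.
    private static readonly Regex Entity =
        new Regex(@"&(#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z]+);");

    public static string Decode(string input)
    {
        if (string.IsNullOrEmpty(input))
            return input;

        // Single pass: the output of one replacement is never decoded again.
        return Entity.Replace(input, delegate(Match m)
        {
            string body = m.Groups[1].Value;
            if (body[0] != '#')
            {
                string named;
                return Named.TryGetValue(body, out named) ? named : m.Value;
            }

            bool isHex = body.Length > 1 && (body[1] == 'x' || body[1] == 'X');
            int code;
            bool parsed = isHex
                ? int.TryParse(body.Substring(2), NumberStyles.HexNumber, CultureInfo.InvariantCulture, out code)
                : int.TryParse(body.Substring(1), NumberStyles.None, CultureInfo.InvariantCulture, out code);

            // Leave the reference as-is if it is not a valid Unicode scalar value.
            bool valid = parsed && code > 0 && code <= 0x10FFFF && !(code >= 0xD800 && code <= 0xDFFF);
            return valid ? char.ConvertFromUtf32(code) : m.Value;
        });
    }
}
```

In the method from the question, the call would go right after the Substring line, for example translated = MiniHtmlDecoder.Decode(translated);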
Mormegil
  • Using 3.5 Framework so unfortunately looks like that method isn't possible for me to use. My apologies for not stating that earlier and thank you for your time. – SystemX17 Feb 21 '11 at 21:13

Discussions with another developer got me to try this before the last lot of comments. Here is what ended up working:

    internal static void TranslateThis(Player player, string fromLang, string toLang, string text){
        try
        {
            string translated = null;
            text = Regex.Replace(text, @"[^\w\.\'\s@-]", "");
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://translate.google.com/?langpair=" + fromLang + "|" + toLang + "&text=" + text.Replace(" ", "+") + "#");

            request.MaximumAutomaticRedirections = 4;
            request.MaximumResponseHeadersLength = 4;

            request.Credentials = CredentialCache.DefaultCredentials;
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();

            Stream receiveStream = response.GetResponseStream();

            StreamReader readStream = new StreamReader(receiveStream, Encoding.UTF7);
            String html = readStream.ReadToEnd() + "";
            int a = html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47;
            int b = html.IndexOf("</span>",html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47);
            translated = html.Substring(a, b - a);
            response.Close();
            readStream.Close();
            if (translated.Length < (10 * text.Length))
            {
                // Replace the whole entity, including its trailing semicolon.
                translated = translated.Replace("&#39;", "'");
                translated = Regex.Replace(translated, @"[^\w\.\'\s@-]", "");
                if (player == Player.Console)
                {
                    player.ParseMessage(translated, true);
                }
                else
                {
                    player.ParseMessage(translated, false);
                }
            }
            else
            {
                player.Message("Usage: /translate [lang] [message]");
            }
        }
        catch(Exception ex)
        {
            player.Message("Error:" + ex.ToString());

        }
   }
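The request URL in the code above is built by concatenating the raw text and replacing only spaces, which is why other characters have to be stripped first. An alternative is to percent-escape each query value. This is a minimal sketch under assumptions: TranslateUrl.Build is a hypothetical helper, the URL layout is simply copied from the code above (Google may no longer accept it), and the trailing "#" is dropped.

```csharp
using System;

// Hypothetical helper, for illustration only.
static class TranslateUrl
{
    // Builds the request URL with every query value percent-escaped,
    // so characters such as '&', '|' or '#' in the input cannot break the query string.
    public static string Build(string fromLang, string toLang, string text)
    {
        string langPair = Uri.EscapeDataString(fromLang + "|" + toLang);
        string query = Uri.EscapeDataString(text);
        return "http://translate.google.com/?langpair=" + langPair + "&text=" + query;
    }
}
```

The result can be passed straight to WebRequest.Create in place of the concatenated string.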
SystemX17