Say I open a website in Chrome and it's in Russian. Chrome tells me it's in Russian and offers to translate it for me. How can I find out the language of a web page using C#? I'd love to find out the actual language, such as English, Spanish, Russian, etc.
- Perhaps this can point you in an appropriate direction: http://stackoverflow.com/questions/1464362/detect-language-of-text – Bart Jul 17 '11 at 08:43
2 Answers
4
You could try parsing the `<meta http-equiv="language" content="ru" />` and `<meta http-equiv="content-language" content="ru" />` tags in the head of the page.
These tags are not present on every page, though.
If they are missing, I think Google does a kind of "word lookup" in an internal database to determine the most probable language of the page.
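As a rough illustration of reading those hints, here is a minimal sketch that uses only regular expressions from the base class library. The class and method names (`PageLanguageHints`, `TryFindDeclaredLanguage`) are made up for this example, and it also checks the `lang` attribute on the `<html>` element, which is an addition to the two meta tags above. A real HTML parser would cope better with unusual markup.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: looks for a language declared in the markup itself.
public static class PageLanguageHints
{
    // lang attribute on the opening <html> tag, e.g. <html lang="ru">.
    private static readonly Regex HtmlLang = new Regex(
        @"<html\b[^>]*?\slang\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Any <meta ...> tag; its attributes are inspected separately
    // so that their order does not matter.
    private static readonly Regex MetaTag = new Regex(
        @"<meta\b[^>]*>", RegexOptions.IgnoreCase);

    // http-equiv="language" or http-equiv="content-language".
    private static readonly Regex HttpEquiv = new Regex(
        @"\bhttp-equiv\s*=\s*[""'](?:content-)?language[""']",
        RegexOptions.IgnoreCase);

    // content="..." attribute of a meta tag.
    private static readonly Regex Content = new Regex(
        @"\bcontent\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns true and the declared language code if the page states one.
    public static bool TryFindDeclaredLanguage(string html, out string language)
    {
        language = "";

        Match htmlLang = HtmlLang.Match(html);
        if (htmlLang.Success)
        {
            language = htmlLang.Groups[1].Value.Trim();
            return true;
        }

        foreach (Match tag in MetaTag.Matches(html))
        {
            if (!HttpEquiv.IsMatch(tag.Value)) continue;

            Match content = Content.Match(tag.Value);
            if (content.Success)
            {
                language = content.Groups[1].Value.Trim();
                return true;
            }
        }

        return false;
    }
}
```

Calling `PageLanguageHints.TryFindDeclaredLanguage(html, out var lang)` on a page whose head contains one of the meta tags above would set `lang` to `ru`. When the method returns false, the page declares nothing and the text itself has to be analysed.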
Edit
You could also use the SOAP API of Bing to detect the language.
An example from their site:
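    // TranslatorService is assumed to be the proxy namespace generated for the SOAP service reference.
    var client = new TranslatorService.LanguageServiceClient();
    // Detect takes the application ID and the text whose language should be identified.
    var result = client.Detect(
        "myAppId",
        "I have no idea what this language may be");
    Console.WriteLine("The detected language friendly code is: " + result);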
Just extract some text (e.g. with HTML Agility Pack) from the HTML page you want to detect from and pass it to the SOAP function.
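To make the extraction step concrete, here is a minimal sketch that pulls a short plain-text sample out of an HTML string. It deliberately uses regular expressions from the base class library as a crude stand-in for HTML Agility Pack, so it will be less robust on malformed markup. The names `PageTextSample` and `ExtractText` and the 1000-character default are assumptions made for this example.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: reduces an HTML page to a short plain-text sample.
public static class PageTextSample
{
    public static string ExtractText(string html, int maxLength = 1000)
    {
        // Remove script and style blocks, including their contents.
        string text = Regex.Replace(
            html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Remove HTML comments.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);

        // Replace the remaining tags with spaces so words do not run together.
        text = Regex.Replace(text, @"<[^>]+>", " ");

        // Turn entities such as &amp; back into characters.
        text = WebUtility.HtmlDecode(text);

        // Collapse whitespace and cut the sample to the requested length.
        text = Regex.Replace(text, @"\s+", " ").Trim();
        return text.Length <= maxLength ? text : text.Substring(0, maxLength);
    }
}
```

The string returned by `PageTextSample.ExtractText(html)` is what would be passed as the second argument to `Detect`. A few hundred characters of body text are normally enough for a detection service, which is why the sample is truncated.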

Uwe Keim
- `language` was never official and in any case, both should be done with the `lang` attribute on HTML or other elements. – Joey Jul 17 '11 at 09:06
- @Joey, Language is not official, but Content-Language is, and it is defined in RFC 2616. However, since the lang and xml:lang attributes can identify language changes within a document, I would agree they should be favoured. Of course, they depend on the author having put them there. Google and Bing have the advantage of a massive corpus against which they can compare text when such information is missing or even incorrect. – Jon Hanna Jul 17 '11 at 16:31
- Only through http-equiv, which was always a kludge at the best of times; the actual HTTP header is not deprecated, since that's out of scope for HTML 5. People parsing pages would still have to look for it, though, since deprecated means it CAN still be used. – Jon Hanna Jul 17 '11 at 16:57
0
Use Google's API: send some (or all?) of the text from the page to the API to detect the language.
For a .NET library, see the answer to this question.


Sarwar Erfan
- I read that Google will discontinue the API at the end of 2011. – Uwe Keim Jul 17 '11 at 09:13
- @Uwe Keim: Yes, that's because people like you and me used the API extensively. As Google said, they are shutting it down "due to the substantial economic burden caused by extensive abuse". Anyway, for people still looking for "free" translations, they suggest using this: http://www.google.com/webelements/#!/translate which of course does not offer any direct way to detect the language. It is not an API. – Sarwar Erfan Jul 17 '11 at 11:05