0

Possible Duplicate:
Return the language of a given string

In my program I have to find the language of an input string. For example, if a user enters

 "hello world" 

the language detected should be English, and if a user enters

"RUE"

it should be French.

Currently, in my project a user can input French or English.

I tried using the CultureInfo class, but I didn't get anything fruitful.

Niraj Choubey
  • Your application needs to have some kind of dictionary so it can recognize the string. Otherwise there is no way to recognize it. – Mitja Bonca Sep 21 '11 at 08:11
  • "rue" is an English word too. Even if it doesn't form a complete English sentence, why should your program not detect it as English but as French? – BoltClock Sep 21 '11 at 08:13
  • You could use the Google Language API (deprecated, alas) or something similar. Take a look at the following SO answer: http://stackoverflow.com/questions/1192768/return-the-language-of-a-given-string/1192802#1192802. There is no easy way to do this out of the box. – Christophe Geers Sep 21 '11 at 08:15
  • http://stackoverflow.com/questions/1192768/return-the-language-of-a-given-string/1192802#1192802 –  Sep 21 '11 at 08:15
  • Good question, but it's already been answered here at Stack Overflow :) Please do check this very helpful post: [Here is the Answer](http://stackoverflow.com/questions/1192768/return-the-language-of-a-given-string/1192802#1192802) – woodykiddy Sep 21 '11 at 08:12

4 Answers

2

I think you need to include a dictionary for each language and then match the words entered against it to predict the language of the input.
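As a rough illustration of the dictionary idea, here is a minimal sketch in Python (the same logic ports directly to .NET). The word lists are tiny, made-up samples, and the names `WORDS`, `score_languages` and `guess_language` are hypothetical; a real program would load full dictionaries.

```python
import re

# Tiny illustrative word lists (an assumption for this sketch);
# a real dictionary would contain many thousands of entries per language.
WORDS = {
    "English": {"hello", "world", "the", "street", "and", "is", "rue"},
    "French": {"bonjour", "monde", "le", "la", "rue", "et", "est"},
}

def score_languages(text):
    """Count how many tokens of `text` appear in each language's word list."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    return {lang: sum(t in vocab for t in tokens) for lang, vocab in WORDS.items()}

def guess_language(text):
    """Return the single best-scoring language, or None on a tie or no match."""
    scores = score_languages(text)
    best = max(scores.values())
    winners = [lang for lang, s in scores.items() if s == best]
    return winners[0] if best > 0 and len(winners) == 1 else None
```

With these sample lists, "hello world" matches only the English list, while "RUE" scores one hit in both lists, so `guess_language` returns `None` rather than guessing. Very short inputs are often ambiguous in exactly this way.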

TextCat is very good for language identification. And it has a lot of implementations in different languages.

Ivan Akcheurov has produced a version with no ports, which can be found HERE.

It is a pure .NET Framework DLL plus a command-line interface to it. It is fully compatible with the 74 language models from TextCat, so it can detect languages out of the box.
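For context, TextCat-style identification ranks the most frequent character n-grams of a text and compares that ranking with a stored profile per language (the "out-of-place" measure). The following is only a sketch of that idea in Python, not the API of TextCat or of the .NET library mentioned above. The training sentences, the profile size, and the names `ngram_profile`, `out_of_place` and `detect` are assumptions made for illustration.

```python
from collections import Counter

def ngram_profile(text, max_n=3, top=300):
    """Return the `top` most frequent character n-grams (lengths 1..max_n), most frequent first."""
    # Mark word boundaries with underscores so word starts and ends count as n-grams.
    padded = "_" + "_".join(text.lower().split()) + "_"
    counts = Counter(
        padded[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(padded) - n + 1)
    )
    # Sort by descending count, then alphabetically, so ranks are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams absent from the language profile get the maximum penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[g]) if g in rank else max_penalty
        for i, g in enumerate(doc_profile)
    )

# Toy training text (an assumption); real language models are built from much larger corpora.
SAMPLES = {
    "English": "the quick brown fox jumps over the lazy dog and this is the street where they live",
    "French": "le renard brun saute par dessus le chien paresseux et voici la rue où ils habitent",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}

def detect(text):
    """Return the language whose profile is closest to the profile of `text`."""
    doc = ngram_profile(text)
    return min(PROFILES, key=lambda lang: out_of_place(doc, PROFILES[lang]))
```

Calling `detect("some input text")` returns the name of the nearest profile. With profiles this small the result for short inputs is unreliable; the approach depends on profiles trained on a large amount of text per language.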

Pankaj Upadhyay
0

There's no built-in functionality, and it's not a trivial task, but take a look at this question and answer. If you have a large enough learning base, it can be used to determine the language a text is written in. It's always going to be a best guess, since some text, such as medical English, uses a lot of words that you'd find in French text (or at least words that are closer to French than to English, even though the text is written in English).

A very good example of how difficult it can be to determine the language, especially when the text is short, is actually "rue". It's French for street, but it is also the name of a city in at least 4 different countries, so there are five possible languages, one being French and one being English. (There's a town in Virginia called Rue.)

Rune FS
0

There is no such built-in functionality in .NET, so you need either to implement it yourself (which is very expensive resource-wise) or to try a public service such as Google Translate, which might (or might not) be useful for this task.

Dmitry
0

The Google Translate API supports detecting the language of a string. This is a paid service, but probably worth the money.

Malte Clasen