Questions tagged [language-detection]

Language detection or language identification is the task of identifying the language(s) in a fragment of text.

From Wikipedia:

In natural language processing, language identification or language guessing is the problem of determining which natural language given content is in. Computational approaches to this problem view it as a special case of text categorization, solved with various statistical methods.

...

One of the great bottlenecks of language identification systems is to distinguish between closely related languages. Similar languages like Serbian and Croatian or Indonesian and Malay present significant lexical and structural overlap, making it challenging for systems to discriminate between them.

http://corporavm.uni-koeln.de/vardial/sharedtask.html has input data and results from a recent competition (COLING 2014 VarDial workshop DSL task).
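
The text-categorization view described above can be sketched with character trigram profiles compared by cosine similarity. This is only a minimal illustration, not the method of any particular library mentioned in the questions below. The `SAMPLES` texts and the names `trigram_profile`, `cosine`, and `guess_language` are assumptions made up for this example, and the training samples are far too small for real use.

```python
from collections import Counter
import math

# Hypothetical, tiny training samples; a real system would use large corpora.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and this is a sentence in the english language",
    "spanish": "el rápido zorro marrón salta sobre el perro perezoso y esta es una oración en el idioma español",
    "german": "der schnelle braune fuchs springt über den faulen hund und das ist ein satz in der deutschen sprache",
}


def trigram_profile(text):
    """Count overlapping character trigrams of the lowercased, padded text."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# One reference profile per language, built once from the samples.
PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}


def guess_language(text):
    """Return (best_language, scores) for the given text."""
    profile = trigram_profile(text)
    scores = {lang: cosine(profile, ref) for lang, ref in PROFILES.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    label, scores = guess_language("esta es una oración")
    print(label, scores)
```

Returning the per-language scores alongside the label makes the ambiguity visible: closely related languages would tend to produce similar scores under a scheme like this, which is the difficulty the description above points out.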

142 questions
133
votes
17 answers

Detecting programming language from a snippet

What would be the best way to detect what programming language is used in a snippet of code?
João Matos
116
votes
8 answers

What differences, if any, between C++03 and C++11 can be detected at run-time?

It is possible to write a function which, when compiled with a C compiler, will return 0, and when compiled with a C++ compiler, will return 1 (the trivial solution with #ifdef __cplusplus is not interesting). For example: int isCPP() { return…
Armen Tsirunyan
46
votes
7 answers

How to detect language of user entered text?

I am dealing with an application that accepts user input in different languages (currently 3 fixed languages). The requirement is that users can enter text without having to select the language via a provided checkbox in the UI. Is there an…
ManBugra
35
votes
19 answers

Detect language from string in PHP

In PHP, is there a way to detect the language of a string? Suppose the string is in UTF-8 format.
Beier
27
votes
1 answer

Browser language detection

I need to detect the browser language in my Angular2 app. Based on this language I need to send a request (to a REST API of the backend) with the localization and the IDs of my variables, which I need to translate. After that I receive a response with translated…
Loutocký
22
votes
7 answers

Detect language of text

Is there any C# library which can detect the language of a particular piece of text? i.e. for an input text "This is a sentence", it should detect the language as "English". Or for "Esto es una sentencia" it should detect the language as…
Nikhil
22
votes
9 answers

How to detect the language of a string?

What's the best way to detect the language of a string?
Alon Gubkin
19
votes
10 answers

PHP: How do I detect if an input string is Arabic

Is there a way to detect the language of the data being entered via the input field?
HyderA
15
votes
5 answers

Error using langdetect in python: "No features in text"

Hey, I have a CSV with multilingual text. All I want is a column appended with the detected language. So I coded as below: from langdetect import detect import csv with open('C:\\Users\\dell\\Downloads\\stdlang.csv') as csvinput: with…
user7140275
14
votes
5 answers

How to detect language of text?

I have a form which lets users input text snippets. So how can I figure out the language of the entered text? Specifically these languages for now: Arabic: هذه هي بعض النصوص العربية Chinese: 这是一些阿拉伯文字 Japanese: これは、いくつかのアラビア語のテキストです [Edit] The…
Yeti
12
votes
8 answers

How to detect the language of a given text

In my Rails 3 application, users may write messages in a forum. I would like to identify the language of a given message. I'm interested in English, Russian, and Hebrew. Is there any built-in library in Ruby/Rails for such a task?…
Misha Moroshko
12
votes
2 answers

Textblob - HTTPError: HTTP Error 429: Too Many Requests

I have a dataframe in which one column holds a list of strings in each row. On average, each list has 150 words of about 6 characters each. Each of the 700 rows of the dataframe represents a document and each string is a word of this document; so…
Outcast
12
votes
7 answers

How to detect language

Are there any good open-source engines out there for detecting what language a text is in, perhaps with a probability metric? One that I can run locally and that doesn't query Google or Bing? I'd like to detect language for each page in about 15 million…
niklassaers
9
votes
6 answers

Language detection with data in PostgreSQL

I have a table in PostgreSQL where one column is text. I need a library or tool that can identify the language of each text for testing purposes. There is no need for PostgreSQL code because I'm having problems installing languages, but any language…
Renato Dinhani
9
votes
2 answers

Python langdetect: choose between one language or the other only

I'm using langdetect to determine the language of a set of strings which I know are either in English or French. Sometimes, langdetect tells me the language is Romanian for a string I know is in French. How can I make langdetect choose between…
vandernath