How to detect the language of a sentence in python

Question

I am trying to detect the language of a sentence in Python. I tried 'langdetect' and the NLTK words corpus, but neither gives the expected results. My example DataFrame is:

import pandas as pd
df = pd.DataFrame({'text': ['Auxiliar Director/a de Hotel', 'Jefe de Tienda', 'Data Analyst']})

and the expected result is:

    text                            detected_language
0   Auxiliar Director/a de Hotel    spanish
1   Jefe de Tienda                  spanish
2   Data Analyst                    english

TIA!

[Determining what language a string contains in a pandas DataFrame](https://stackoverflow.com/q/59610076/15497888), [Python: How to determine the language?](https://stackoverflow.com/q/39142778/15497888) — Henry Ecker, Sep 12 '21 at 20:09

score 0 · Answer 1 · answered Sep 12 '21 at 20:22

I think the problem is that 'langdetect' needs a longer stretch of text to work reliably. When I extended your phrases into full sentences, it detected the language correctly.

Extended phrases

'La subdirectora del hotel es muy buena, me gusta.'
"El gerente de la tienda es un papel clave en este complejo comercial."
'Data Analyst is a great job'

Code used to predict

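# pip install langdetect iso-639
from langdetect import detect, DetectorFactory
from iso639 import languages

# Fix the seed so repeated runs give the same result
DetectorFactory.seed = 0
# detect() returns a language code such as 'es'
out = detect('El gerente de la tienda es un papel clave en este complejo comercial.')
# Look up the full language name for that code
out_full = languages.get(alpha2=out).name
print(out_full)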

Output

Spanish
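
To get a column like the one in the question, the same `detect` call could be applied to every row. The following is only a minimal sketch: `label_languages` and `CODE_TO_NAME` are hypothetical names made up for illustration, the lookup covers just the two languages in the example, and the detector is passed in as an argument (defaulting to `langdetect.detect`) so a different one can be swapped in. Short job titles may still be mislabeled, for the reason given above.

```python
# Hypothetical lookup for the two languages in the question; extend as needed.
CODE_TO_NAME = {"es": "spanish", "en": "english"}

def label_languages(texts, detect_fn=None):
    """Return one language label per string in `texts`."""
    if detect_fn is None:
        # Imported lazily so another detector can be supplied instead.
        from langdetect import detect  # pip install langdetect
        detect_fn = detect
    labels = []
    for text in texts:
        try:
            code = detect_fn(text)
        except Exception:
            # langdetect raises LangDetectException when the text
            # has no usable features (for example an empty string).
            code = None
        # Codes outside the lookup (or failed detections) become "unknown".
        labels.append(CODE_TO_NAME.get(code, "unknown"))
    return labels

# Usage with the DataFrame from the question (needs pandas and langdetect):
#   df["detected_language"] = label_languages(df["text"])
```

Anything the detector cannot handle, or any code outside the small lookup, ends up as "unknown" rather than raising an error, which keeps the column the same length as the input.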

Hi, thanks for the answer, but my sentences are only that long. What if I only want to detect whether a sentence is in English or not, and return True or False? Any pointers on that? — PRIN, Sep 12 '21 at 20:43
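
For the True/False variant asked about in this comment, one possible sketch is to compare the detected code with 'en'. The helper name `is_english` is an assumption for illustration, not part of 'langdetect', and the detector is again an argument that defaults to `langdetect.detect`. The caveat about very short strings still applies, so treat the result as a rough guess.

```python
def is_english(text, detect_fn=None):
    """Return True if the detector labels `text` as English ('en'), else False."""
    if detect_fn is None:
        # Imported lazily so another detector can be supplied instead.
        from langdetect import detect  # pip install langdetect
        detect_fn = detect
    try:
        return detect_fn(text) == "en"
    except Exception:
        # Text with no detectable features is treated as "not English".
        return False

# Usage with the DataFrame from the question (needs pandas and langdetect):
#   df["is_english"] = df["text"].apply(is_english)
```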
