I am trying to detect the language of a sentence in Python. I tried 'langdetect' and the NLTK words corpus, but neither gives the expected results. My example df is:

import pandas as pd
df = pd.DataFrame({'text': ['Auxiliar Director/a de Hotel', 'Jefe de Tienda', 'Data Analyst']})

and expected result is:

    text                            detected_language
0   Auxiliar Director/a de Hotel    spanish
1   Jefe de Tienda                  spanish
2   Data Analyst                    english 

TIA!

PRIN
  • [Determining what language a string contains in a pandas DataFrame](https://stackoverflow.com/q/59610076/15497888), [Python: How to determine the language?](https://stackoverflow.com/q/39142778/15497888) – Henry Ecker Sep 12 '21 at 20:09

1 Answer

I think the problem is that 'langdetect' needs a larger portion of text to work reliably. When I extended your phrases, it detected the language correctly.

Extended phrases

  • 'La subdirectora del hotel es muy buena, me gusta.'
  • 'El gerente de la tienda es un papel clave en este complejo comercial.'
  • 'Data Analyst is a great job'

Code used to predict

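from langdetect import detect, DetectorFactory
# pip install iso-639
from iso639 import languages

# langdetect is non-deterministic by default; fixing the seed makes results repeatable
DetectorFactory.seed = 0
# detect() returns an ISO 639-1 code such as 'es'
out = detect('El gerente de la tienda es un papel clave en este complejo comercial.')
# map the two-letter code to the full language name
out_full = languages.get(alpha2=out).name
print(out_full)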

Output

Spanish
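
To run the same call over every row of the df in the question, something like the following might work. This is only a sketch: `detect_or_none` and its `detector` parameter are hypothetical names introduced here, not part of langdetect, and it assumes langdetect is installed. It returns None for empty or non-string values and for inputs on which the detector raises, rather than stopping the whole run.

```python
def detect_or_none(text, detector=None):
    """Return a language code for `text`, or None if detection is not possible.

    `detect_or_none` and `detector` are hypothetical names for this sketch.
    When no detector is passed, langdetect.detect is used.
    """
    # Skip missing values and blank strings, which carry no language signal.
    if not isinstance(text, str) or not text.strip():
        return None
    if detector is None:
        # Imported lazily so another detector can be swapped in for testing.
        from langdetect import detect, DetectorFactory
        DetectorFactory.seed = 0  # keep langdetect results repeatable
        detector = detect
    try:
        return detector(text)
    except Exception:
        # langdetect raises when it finds no usable features (e.g. digits only).
        return None
```

With the df from the question, this could be applied as df['detected_language'] = df['text'].apply(detect_or_none). The result is still a two-letter code, and very short phrases such as 'Data Analyst' may still be misclassified for the reason given above.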
  • Hi, thanks for the answer, but my sentences are only that long. What if I only want to detect whether a sentence is in English or not, and return True or False? Any pointers on that? – PRIN Sep 12 '21 at 20:43