Using Python, I want to identify French text in a list of short strings (from 1 to about 50 words) which are otherwise in English.
An example of the input data (input strings here are separated by commas):
year of the snake, legendary 'dragon horse', thunder, damsel-fly, larvae of mosquito,
treillage, libellule, mythical water creature, petites chevrettes, de papillon hideux,
the horse-fly, 5th earthly branch, dragon, mythical creature,
a shore plant whose leaves dry a bright orange, dragon horse, god of rain, year of the dragon,
orthopteran, crocodile, dont le duvet des ailes s'en va en poussière, insecte, dragonfly,
dracontomelon vitiense, dragon king, petit filet pour une espèce de papillon, sorte d'insecte
Ideally I'd like to use an existing library, since I know this is a difficult problem. However, the Python natural language library I'm most familiar with, nltk, doesn't seem to support this, or if it does, I haven't found it.
I'm aware that identifying a single word or two is likely to be very difficult, and I'd rather have false negatives (French misidentified as English) than false positives.
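To make that preference concrete, a conservative baseline might look like the sketch below. It is only an illustration using the standard library, not a library-backed solution: the name `looks_french`, the marker word list, the accent set, and the `min_score` threshold are all assumptions made up for this example. It flags a string as French only when at least two independent clues are present, so one- and two-word French entries such as "libellule" deliberately fall through as English.

```python
import re

# Hypothetical, hand-picked French function words (an assumption, far from exhaustive).
FRENCH_MARKERS = {
    "le", "la", "les", "un", "une", "de", "des", "du",
    "et", "en", "dont", "dans", "qui", "que", "est",
}

# Accented letters that are common in French and rare in English text.
FRENCH_ACCENTS = set("àâçéèêëîïôùûüÿœ")


def looks_french(text, min_score=2):
    """Return True only if the string shows at least `min_score` French clues.

    A clue is either a token from FRENCH_MARKERS or a token containing an
    accented letter. A higher threshold trades recall for fewer false positives.
    """
    # Split into runs of letters; apostrophes and hyphens act as separators.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    score = 0
    for token in tokens:
        if token in FRENCH_MARKERS:
            score += 1
        elif any(ch in FRENCH_ACCENTS for ch in token):
            score += 1
    return score >= min_score


samples = [
    "year of the snake",
    "libellule",
    "dont le duvet des ailes s'en va en poussière",
    "petit filet pour une espèce de papillon",
]
for s in samples:
    print(s, "->", looks_french(s))
```

With this threshold the two longer French phrases are flagged, while "libellule" and the English strings are not, which is the false-negative-over-false-positive trade-off described above. Raising `min_score` makes the check stricter still.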