3

I have two strings:

eng = "Clash of Clans – Android Apps on Google Play"
rus = "Castle Clash: Новая Эра - Android Apps on Google Play"

Now I want to check whether a string is in English or not, using Python 3.

I have read this Stack Overflow answer, but it does not help me because it is a Python 2.x solution. In the comments, someone suggests using

string.encode('ascii')

to make it work in Python 3.x, but my problem is that for both strings it raises the same UnicodeEncodeError exception!


So now I am stuck and can't figure out how to make it work. Kindly guide me, or do I have to use another method to determine whether a string is in English? Thanks.

maq

3 Answers

5

As with Salvador Dali's answer you linked to, you must use a try/except block to catch the encoding error.

# -*- coding: utf-8 -*-
def isEnglish(s):
    try:
        s.encode('ascii')
    except UnicodeEncodeError:
        return False
    else:
        return True

Just to note, though: when I copied and pasted your eng and rus strings to try them, they both came up as False. Retyping the English one returned True, so I'm not sure what's up with that.
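For illustration, here is a small sketch that runs the same check against the two strings from the question plus a retyped copy. The name is_english is just a PEP 8 spelling of isEnglish above, and the en dash is written as the escape \u2013 so that it survives copy and paste; note that the check only tests for ASCII, not for the English language as such.

```python
def is_english(s):
    """Return True if s contains only ASCII characters."""
    try:
        s.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True

# The pasted title contains U+2013 EN DASH, written here as an escape.
eng = "Clash of Clans \u2013 Android Apps on Google Play"
rus = "Castle Clash: Новая Эра - Android Apps on Google Play"
# The retyped title uses the plain ASCII hyphen-minus from the keyboard.
retyped = "Clash of Clans - Android Apps on Google Play"

print(is_english(eng))      # False
print(is_english(rus))      # False
print(is_english(retyped))  # True
```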

Hayley Guillou
  • What do you mean by retyping? – maq Oct 08 '15 at 08:13
  • 1
    @maq it means typing the string in rather than using copy/paste. An English keyboard only has ASCII symbols on it so you won't accidentally get the EN DASH that your string contains. – Mark Ransom Oct 08 '15 at 11:41
3

Your English string isn't pure ASCII: it contains the character U+2013 EN DASH. This looks very similar to the ASCII hyphen-minus U+002D, but it is a different character.

If this is the only character you need to worry about, you can do a simple replacement to make it work:

>>> eng.replace('\u2013', '-').encode('ascii')
b'Clash of Clans - Android Apps on Google Play'
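If you are not sure which characters are stopping a string from encoding, one option is to list every code point above 127 together with its Unicode name. This is a minimal sketch using the standard unicodedata module; the helper name non_ascii_chars is made up for this example.

```python
import unicodedata

def non_ascii_chars(s):
    """Return (index, character, Unicode name) for each non-ASCII character in s."""
    return [(i, ch, unicodedata.name(ch, 'UNKNOWN'))
            for i, ch in enumerate(s) if ord(ch) > 127]

# The title from the question, with the en dash written as an escape.
eng = "Clash of Clans \u2013 Android Apps on Google Play"
print(non_ascii_chars(eng))  # [(15, '–', 'EN DASH')]
```

An empty list means the string is plain ASCII; anything else tells you exactly what would need replacing.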
Mark Ransom
3

You can use the str.isascii() method (available from Python 3.7):

>>> rus.isascii()
False
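As a sketch of how this behaves on both strings from the question (with the en dash written as the escape \u2013), the pasted English title still fails until the dash is replaced:

```python
eng = "Clash of Clans \u2013 Android Apps on Google Play"
rus = "Castle Clash: Новая Эра - Android Apps on Google Play"

print(eng.isascii())                         # False, because of U+2013
print(eng.replace('\u2013', '-').isascii())  # True once the dash is replaced
print(rus.isascii())                         # False
```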
Nishant