
Yesterday I was doing some testing to identify the type of an element from a list.

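types = {"float": float, "int": int, "str": str}
try:
    # uniqLst is the list of values read from the text file
    sql_type = next(k for k, v in types.items() if isinstance(uniqLst[0], v))
except StopIteration:
    # next() raises StopIteration (not TypeError) when no type matches
    print("Type not right: no matching type found")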

Well, of course the element turns out to always be a string as the data the list holds derives from a text file. I was wondering what might be a good way to check the true nature of the element. Should you really go for a try/except solution, like:

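def check_type(element):
    # Try the strictest conversion first, then fall back to looser ones.
    try:
        int(element)
        return 'int'
    except (TypeError, ValueError):
        pass
    try:
        float(element)
        return 'float'
    except (TypeError, ValueError):
        pass
    try:
        str(element)
        return 'str'
    except Exception:
        return 'error type'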

What about re.compile? (And then something like ('[0-9]+')) Doesn't seem very practical to me. Any advice is very much appreciated!

Cheers, LarsVegas

LarsVegas

3 Answers


Based on your comment to ikanobori's answer, you might be looking for the following string methods:

str.isalnum()

Return true if all characters in the string are alphanumeric and there is at least one character, false otherwise. A character c is alphanumeric if one of the following returns True: c.isalpha(), c.isdecimal(), c.isdigit(), or c.isnumeric().

str.isalpha()

Return true if all characters in the string are alphabetic and there is at least one character, false otherwise. Alphabetic characters are those characters defined in the Unicode character database as “Letter”, i.e., those with general category property being one of “Lm”, “Lt”, “Lu”, “Ll”, or “Lo”. Note that this is different from the “Alphabetic” property defined in the Unicode Standard.

str.isdecimal()

Return true if all characters in the string are decimal characters and there is at least one character, false otherwise. Decimal characters are those from general category “Nd”. This category includes digit characters, and all characters that can be used to form decimal-radix numbers, e.g. U+0660, ARABIC-INDIC DIGIT ZERO.

str.isdigit()

Return true if all characters in the string are digits and there is at least one character, false otherwise. Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.

str.isidentifier()

Return true if the string is a valid identifier according to the language definition, section Identifiers and keywords.

str.islower()

Return true if all cased characters in the string are lowercase and there is at least one cased character, false otherwise. Cased characters are those with general category property being one of “Lu”, “Ll”, or “Lt” and lowercase characters are those with general category property “Ll”.

str.isnumeric()

Return true if all characters in the string are numeric characters, and there is at least one character, false otherwise. Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH. Formally, numeric characters are those with the property value Numeric_Type=Digit, Numeric_Type=Decimal or Numeric_Type=Numeric.

str.isprintable()

Return true if all characters in the string are printable or the string is empty, false otherwise. Nonprintable characters are those characters defined in the Unicode character database as “Other” or “Separator”, excepting the ASCII space (0x20) which is considered printable. (Note that printable characters in this context are those which should not be escaped when repr() is invoked on a string. It has no bearing on the handling of strings written to sys.stdout or sys.stderr.)

str.isspace()

Return true if there are only whitespace characters in the string and there is at least one character, false otherwise. Whitespace characters are those characters defined in the Unicode character database as “Other” or “Separator” and those with bidirectional property being one of “WS”, “B”, or “S”.

str.istitle()

Return true if the string is a titlecased string and there is at least one character, for example uppercase characters may only follow uncased characters and lowercase characters only cased ones. Return false otherwise.

str.isupper()

Return true if all cased characters in the string are uppercase and there is at least one cased character, false otherwise. Cased characters are those with general category property being one of “Lu”, “Ll”, or “Lt” and uppercase characters are those with general category property “Lu”.
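
To make the difference between the three numeric predicates concrete, here is a small sketch (Python 3). The helper name numeric_flags and the sample strings are my own picks for illustration, not something from the question. Note that none of these predicates accept a sign or a decimal point, so on their own they cannot recognize values like '-3' or '1.5'.

```python
def numeric_flags(s):
    """Return (isdecimal, isdigit, isnumeric) for the string s."""
    return (s.isdecimal(), s.isdigit(), s.isnumeric())

# Hypothetical sample values: plain digits, a superscript digit,
# a vulgar fraction, a signed number, a decimal number, an empty string.
for sample in ["42", "²", "½", "-3", "1.5", ""]:
    print(repr(sample), numeric_flags(sample))

# '42'  -> (True, True, True)
# '²'   -> (False, True, True)
# '½'   -> (False, False, True)
# '-3'  -> (False, False, False)
# '1.5' -> (False, False, False)
# ''    -> (False, False, False)
```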

Weetu

Explicit type checking is generally discouraged in Python. Why do you need to know the types?

Python employs duck typing, meaning I can write my own class that behaves exactly like an int without being an instance of int, which would defeat your type-checking adventure.

The idea is to use an object the way you want to use it and, if that fails, handle or raise an exception. That way you give both yourself and any other coders who work with your code that much more freedom :-)
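
As a rough sketch of that "just try it" approach applied to strings read from a text file (Python 3; the function name convert and the sample row are made up for illustration): attempt the strictest conversion first and fall back when it raises.

```python
def convert(text):
    """Return text as an int if possible, else as a float, else unchanged."""
    for caster in (int, float):
        try:
            return caster(text)
        except ValueError:
            continue  # not parseable by this caster, try the next one
    return text

# Hypothetical row of raw strings as they might come out of a text file.
row = ["42", "3.14", "hello", "1e3", " 7 "]
print([convert(item) for item in row])
# [42, 3.14, 'hello', 1000.0, 7]
```

One thing to keep in mind with this design: float() also accepts strings such as 'nan' and 'inf', so you may want an extra check if those can appear in your data.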

supakeen
  • Well, I read the data from a txt file and want to execute a sql statement based on it. The thing is: in the txt file could be anything - integers, floats, strings. That's why I need to check their type before putting together the statement. – LarsVegas Feb 28 '12 at 10:53
  • No, in the text file everything is a string, so you would already be converting from string to your types, meaning you already *have* the information on what types they are. Why the need to check again? Also, if you are using a Python DB-API compliant database module (most of them are), bound parameters would remove the need for manual type checking and handle the coercion for you. – supakeen Feb 28 '12 at 10:57
  • I know that everything in the text file is a string; that is what I said in the introduction to my question. But the values could 'actually' be something else, so yes, a conversion would be needed. Technically I can solve the problem, I was just curious what would be a good way to go about it. I get the point that type checking is not recommended. But the string methods mentioned by @Weetu show that there apparently is a need to find out more about the nature of your string(s). Anyway, thanks for your reply and your insight on this topic. Cheers, LarsVegas – LarsVegas Feb 28 '12 at 12:01
  • No problem I probably just focused too much on the type checking part of your question :-) – supakeen Feb 28 '12 at 12:09
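
To illustrate the bound-parameters point from the comments above, here is a minimal sketch using the standard library's sqlite3 module as a stand-in for whatever DB-API module is actually in use. The table name readings and the values are hypothetical, and the values are assumed to be already converted from their string form.

```python
import sqlite3

# Hypothetical, already-converted values of mixed types.
values = [42, 3.14, "hello"]

conn = sqlite3.connect(":memory:")
# The column has no declared type, so SQLite stores each value as given.
conn.execute("CREATE TABLE readings (value)")

# The '?' placeholder lets the driver handle quoting and type mapping,
# so the SQL statement never has to be assembled by hand per type.
conn.executemany("INSERT INTO readings (value) VALUES (?)",
                 [(v,) for v in values])

rows = conn.execute(
    "SELECT value, typeof(value) FROM readings ORDER BY rowid").fetchall()
print(rows)
# [(42, 'integer'), (3.14, 'real'), ('hello', 'text')]
conn.close()
```

Placeholder syntax differs between DB-API modules ('?' here, '%s' in some others), so check the paramstyle of the module you use.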

@Weetu's answer is a great overview of how Python string predicates correspond to Unicode general category properties.

As an exercise, I tried to write a program to figure out for myself which string predicates (e.g. isdigit) correspond to which character properties. I opened Unicode Character Categories and filled a dictionary whose keys are two-letter categories ('Lt') and whose values are example Unicode characters (u'Dž'), then wrote a program to map string predicates to character properties. I used Python 3 because Python 2 had some mysterious Unicode bug (I'd be glad if you could point out why the code below works incorrectly in Python 2).

d = {
    'Cc': u'',   'LC': None,  'Pc': u'_',  'Sc': u'$', 
    'Cf': u'', 'Ll': u'a',  'Pd': u'-',  'Sk': u'^',
    'Cn': None,  'Lm': u'ʰ',  'Pe': u')',  'Sm': u'+',
    'Co': u'',  'Lo': u'ª',  'Pf': u'»',  'So': u'¦',
    'Cs': u'⠀',  'Lt': u'Dž', 'Pi': u'«',
                 'Lu': u'A',  'Po': u'!',
                              'Ps': u'(',
    'Mc': u'ः', 'Nd': u'0',  'Zl': u'\u2028',
    'Me': u'҈',  'Nl': u'ᛮ',  'Zp': u'\u2029',
    'Mn': u'̀',   'No': u'²',  'Zs': u' '
} # Zl and Zp are written as escapes: the raw characters are invisible and break Markdown's code blocks
methods = ['isalnum', 'isalpha', 'isdigit', 'islower', 'isspace',
           'istitle', 'isupper', 'isnumeric', 'isdecimal']

dl = {method: [code
               for code, character
               in d.items()
               if character and getattr(character, method)()]
      for method in methods}

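The result is below. For example, ch.isdigit() returned True for the sample characters from the No and Nd categories. Since only one character per category was tested, the mapping is suggestive rather than exhaustive: '½' is also in No, yet '½'.isdigit() is False.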

>>> from pprint import pprint # pretty printing
>>> pprint(dl)
{'isalnum': ['No', 'Nd', 'Nl', 'Lu', 'Lt', 'Lo', 'Lm', 'Ll'],
 'isalpha': ['Lu', 'Lt', 'Lo', 'Lm', 'Ll'],
 'isdecimal': ['Nd'],
 'isdigit': ['No', 'Nd'],
 'islower': ['Lo', 'Lm', 'Ll'],
 'isnumeric': ['No', 'Nd', 'Nl'],
 'isspace': ['Zp', 'Zs', 'Zl'],
 'istitle': ['Lu', 'Lt'],
 'isupper': ['Lu']}

For more sophisticated work with the Unicode Character Database, see Python's unicodedata library.
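
As a small taste of what unicodedata offers, the sketch below looks up the general category and numeric value of a few characters directly, instead of inferring them from string predicates. The helper name describe and the sample characters are my own choices for illustration.

```python
import unicodedata

def describe(ch):
    """Return (general category, numeric value or None) for one character."""
    return (unicodedata.category(ch), unicodedata.numeric(ch, None))

for ch in ["A", "a", "0", "²", "½", " ", "\u2028"]:
    print(repr(ch), describe(ch))

# 'A'      -> ('Lu', None)
# 'a'      -> ('Ll', None)
# '0'      -> ('Nd', 0.0)
# '²'      -> ('No', 2.0)
# '½'      -> ('No', 0.5)
# ' '      -> ('Zs', None)
# '\u2028' -> ('Zl', None)
```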

Mirzhan Irkegulov