2

In case anyone is interested, this is a followup to Regular expression to match a Python integer literal.

The tokenize module is useful for breaking apart a Python expression, but tokenize.NUMBER is not very expressive, as it covers every kind of number literal: 1, 1l (in Python 2), 0xf2, 1e-10, 1.1, 0b101, 0o17, and 1j are all reported as NUMBER (as are the uppercase spellings of each). Is there a function in the standard library that tells me which of these kinds I have? I particularly care about whether I have an integer or a float (complex can be treated as float for my purposes), but further expressiveness would be OK too :). Basically, I don't want to try to catch all possible number literals myself, as I already managed to get it wrong once.

asmeurer
  • The last question you asked (the linked one) already describes how to get the type of number. Call `type(num)` to see if it's a float or int. – Josh Smeaton Aug 08 '12 at 22:34

3 Answers

3

You can use ast.literal_eval to parse any Python number literal down to an int, long, float, or complex:

>>> import ast
>>> ast.literal_eval('1')
1
>>> ast.literal_eval('1l')
1L
>>> ast.literal_eval('0x2')
2
>>> ast.literal_eval('0b1101')
13

Bear in mind that there is no 'hex', 'oct', or 'bin' type in Python. Those literals are just alternative spellings of integers: each one becomes an ordinary int (or long) as soon as it is parsed, and the base it was written in is not retained.

This works pretty well:

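import ast

def numtype(s):
    numtypes = (int, long, float, complex)

    try:
        n = ast.literal_eval(s)
    except (SyntaxError, ValueError):
        # literal_eval raises ValueError for input that is not a literal,
        # such as a bare name
        return None

    # type() rather than isinstance(), so that bool (a subclass of int)
    # is rejected
    if type(n) not in numtypes:
        return None
    return type(n)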

for t in ['1','0x1','0xf2','1e-10','0o7','1j', '0b1101']:
    print t, numtype(t)              

Prints:

1 <type 'int'>
0x1 <type 'int'>
0xf2 <type 'int'>
1e-10 <type 'float'>
0o7 <type 'int'>
1j <type 'complex'>
0b1101 <type 'int'>

If you really need to differentiate between the different integer notations, you could do something like:

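def numtype(s):
    numtypes = (int, long, float, complex)

    try:
        n = ast.literal_eval(s)
    except (SyntaxError, ValueError):
        return None

    if type(n) not in numtypes:
        return None

    if type(n) != int:
        # long, float and complex are returned as-is, so a hex literal
        # too large for int comes back as long rather than 'HEX'
        return type(n)

    # The order matters: 'b' is also a hex digit, so 'x' is tested first.
    lowered = s.lower()
    if 'x' in lowered:
        return 'HEX'
    if 'o' in lowered:
        return 'OCT'
    if 'b' in lowered:
        return 'BIN'

    # Plain decimal (a Python 2 legacy octal such as 017 also lands here)
    return int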
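
For Python 3, where `long` and the `1l` suffix no longer exist, the same idea might be sketched roughly as follows. The name `classify_number` and the `(type name, base)` return shape are assumptions made for illustration, and the sketch assumes an unsigned literal, which is what tokenize produces for a NUMBER token.

```python
import ast


def classify_number(s):
    """Return (type_name, base) for a numeric literal string, or None.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    try:
        value = ast.literal_eval(s)
    except (SyntaxError, ValueError):
        # Not a literal at all, or not valid in this Python version (e.g. '1l')
        return None

    # type() rather than isinstance(), so that True/False are rejected
    if type(value) not in (int, float, complex):
        return None

    if type(value) is not int:
        return (type(value).__name__, 10)

    # Only the two-character prefix decides the base; assumes no leading sign
    prefix = s.strip().lower()[:2]
    base = {'0x': 16, '0o': 8, '0b': 2}.get(prefix, 10)
    return ('int', base)
```

Checking only the prefix avoids depending on the order of the substring tests, at the cost of not handling a signed string such as '-0x1f'.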
dawg
2

Possibly ast.literal_eval?

type(ast.literal_eval(s))
ecatmur
  • same answer as mine and `type` will return `<type 'int'>` for hex, oct, and bin however – dawg Aug 08 '12 at 22:35
  • Should work for what I need, though. I guess I need to check `type(ast.literal_eval(s)) not in (int, long)`. – asmeurer Aug 08 '12 at 22:51
  • Or isinstance I guess would be better. – asmeurer Aug 08 '12 at 22:51
  • Unfortunately for me, I need it to work in Python 2.5, which doesn't include ast. But I think it should be safe to use regular `eval()` if I know the input is a `tokenize` NUMBER. – asmeurer Aug 08 '12 at 22:52
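
Where `ast` is available, the combination discussed in these comments (take only the tokens that tokenize reports as NUMBER, then evaluate each one) could look something like the sketch below. It is written for Python 3, and `number_types` is a hypothetical name, not a standard-library function.

```python
import ast
import io
import tokenize


def number_types(source):
    """Yield (token_text, type) for each NUMBER token in source.

    Hypothetical helper for illustration; assumes Python 3.
    """
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NUMBER:
            # The token is already known to be a number literal, so
            # literal_eval only has to turn it into a value.
            yield tok.string, type(ast.literal_eval(tok.string))
```

For example, `list(number_types('x = 0x1f + 2.5 * 3j'))` gives one pair per literal, with types `int`, `float` and `complex` respectively.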
0
def is_int(number_string):
    try:
        i = int(number_string)
    except ValueError:
        return False
    return True
Mark Ransom
  • But has to be duplicate/repeated for `int`, `long`, `float` .. any more unified method? Also, does this work for octal/hex notations (or is there an assumed base)? –  Aug 08 '12 at 22:42
  • @asmeurer, sorry - I wasn't aware that `int()` didn't do the full set of integer literals. I wonder if there's an equivalent which does? Looking at the other answers I guess that's what `ast.literal_eval` does. – Mark Ransom Aug 08 '12 at 23:06
  • It does, but you have to pass it the base in the second argument. – asmeurer Aug 08 '12 at 23:16
  • @asmeurer that kind of defeats the purpose of having the base encoded in the literal. – Mark Ransom Aug 09 '12 at 01:46
  • Yeah. I guess it's so you can do it for other bases, and also where it isn't in the literal (like `int('10', 2)` or `int('21', 3)`). I totally agree, but I didn't write it. I guess the logic is that it's `int(n, base=10)`, so that the base if not given defaults to 10. – asmeurer Aug 09 '12 at 06:44
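
On the point about the base argument: `int()` also accepts a base of 0, in which case the base is taken from the literal's own prefix. A minimal sketch, assuming Python 3 (`parse_int_literal` is a made-up name for illustration):

```python
def parse_int_literal(s):
    """Parse an integer literal string, inferring the base from its prefix.

    Hypothetical helper; returns None when s is not an integer literal.
    """
    try:
        # Base 0 tells int() to honour a 0x / 0o / 0b prefix
        return int(s, 0)
    except ValueError:
        return None
```

This only covers integers: strings such as '1.5' or '1e3' are rejected, so it answers "is this an int literal?" but does not distinguish float from complex.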