0

I looked around and did not find a clear answer on how to convert an alphanumeric string to a number in Python. Here are some examples of the conversions I would like:

"1234alpha" --> 1234
"a1234asdf" --> 0
"1234.56yt" --> 1234.56

Any advice would be appreciated.

DK

Mike Müller
user2460897
    Why is `"a1234asdf" --> 0` ? – Andy Hayden Jun 06 '13 at 20:03
  • Are you looking to convert the initial numeric prefix, and ignore everything that comes after, like C's `atoi` function? – abarnert Jun 06 '13 at 20:04
  • Because numbers embedded in strings are clearly meant to be a part of the string, whereas leading numbers may be something like 1234*t and I just want the leading part. – user2460897 Jun 06 '13 at 20:05
  • Yes, @abarnert, something like VFP... – user2460897 Jun 06 '13 at 20:05
  • 4
    That's kind of an odd assumption… but if that's your intention, this is a dup of [Python equivalent to atoi / atof](http://stackoverflow.com/questions/1665511/python-equivalent-to-atoi-atof). – abarnert Jun 06 '13 at 20:05
  • What's VFP? You don't mean ARM-style Virtual Floating Point or Vector Floating Point, and I can't think of anything else programming-related off the top of my head… – abarnert Jun 06 '13 at 20:06
  • No, Visual FoxPro, the ultimate language for humans, @abarnert! – user2460897 Jun 06 '13 at 20:09
  • 1
    @user2460897: I wasn't aware that the ultimate language for humans had words like `ENDIF` and `ENDFOR`, required you to shout all syntactic words, or pronounced "true" as `dot T dot`… – abarnert Jun 06 '13 at 20:13
  • You may feel resentment for these little things, but if you want to be productive, VFP represents 5 high level languages in one! And they are all tightly integrated. C# is MS's attempt to send VFP into the Dark Ages and it failed miserably. VFP rocks! – user2460897 Jun 06 '13 at 20:21
  • Is it OK if `1234` is returned as `1234.0`? Or if `1234.0` is returned as `1234`? – Thijs van Dien Jun 06 '13 at 20:54

7 Answers

4

For a change itertools and no regex:

>>> import itertools as it
>>> number = ''.join(it.takewhile(str.isdigit, '123dfd'))
>>> int(number) if number else 0
123
>>> number = ''.join(it.takewhile(str.isdigit, 'a123dfd'))
>>> int(number) if number else 0
0

Somewhat uglier, but it works for floats:

>>> number = ''.join(it.takewhile(lambda x: x.isdigit() or 
                                   x == '.', '123.45dfd'))
>>> float(number) if number else 0
123.45

Floats, negatives:

def make_number(alphanum):
    sign = 1
    if alphanum and alphanum[0] in '+-':
        sign = int(alphanum[0] + '1')
        alphanum = alphanum[1:]
    try:    
        return float(''.join(it.takewhile(lambda x: x.isdigit() 
                                           or x == '.', alphanum))) * sign
    except ValueError:
        return 0

Conclusion: Changing the requirements along the way can turn a simple solution into a complicated one.
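One gap in the float variants above: `takewhile` keeps every `.` it sees, so an input such as `'1.2.3'` yields a prefix that `float()` rejects. The following is a minimal sketch, not part of the original answer, that accepts at most one decimal point. The name `leading_number` is hypothetical, and signs are deliberately left out to keep it short.

```python
import itertools as it

def leading_number(text):
    """Return the unsigned number at the start of text, or 0 if there is none.

    Hypothetical helper: accepts ASCII digits and at most one decimal point.
    """
    seen_dot = False

    def accept(ch):
        # takewhile calls this once per character, in order,
        # so a small piece of state is enough to allow a single '.'.
        nonlocal seen_dot
        if ch == '.' and not seen_dot:
            seen_dot = True
            return True
        return ch in '0123456789'

    prefix = ''.join(it.takewhile(accept, text))
    if prefix in ('', '.'):
        return 0  # no digits at the start
    return float(prefix) if '.' in prefix else int(prefix)

print(leading_number('1234alpha'))
print(leading_number('a1234asdf'))
print(leading_number('1234.56yt'))
print(leading_number('1.2.3'))
```

Checking `ch in '0123456789'` rather than `str.isdigit` is a deliberate choice here: `isdigit` is also true for characters such as `'²'`, which `int()` cannot parse.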

Mike Müller
1

To support positive/negative integer/float numbers, you could use a slightly modified regexp from Extract float/double value:

import re

re_float = re.compile(r"""(?x)
   ^
      [+-]?\ *      # first, match an optional sign *and space*
      (             # then match integers or f.p. mantissas:
          \d+       # start out with a ...
          (
              \.\d* # mantissa of the form a.b or a.
          )?        # ? takes care of integers of the form a
         |\.\d+     # mantissa of the form .b
      )
      ([eE][+-]?\d+)?  # finally, optionally match an exponent
   """)

def extract_number(s, default=None):
    m = re_float.match(s)
    if not m:
        return default # no number found
    # float() rejects a space between the sign and the digits, so drop spaces
    f = float(m.group(0).replace(' ', '')) #XXX to support huge numbers, try/except int() first
    return int(f) if f.is_integer() else f

Example

import sys

for s in sys.stdin:
    print(extract_number(s, default=0))

Input

1234alpha
a1234asdf
1234.56yt
-1e20.

Output

1234
0
1234.56
-100000000000000000000
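
The `XXX` comment in `extract_number` hints at trying `int()` before `float()` so that very large integers keep all their digits. A minimal sketch of that idea follows, under a few assumptions: the name `extract_number_exact` and the compact pattern `_NUMBER` are hypothetical, and the pattern omits the optional space after the sign that the verbose regex above allows.

```python
import re

# Compact form of the pattern above, without the optional space after the sign.
_NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")

def extract_number_exact(s, default=None):
    """Parse the leading number in s, preferring an exact int when possible."""
    m = _NUMBER.match(s)
    if not m:
        return default  # no number found
    text = m.group(0)
    try:
        return int(text)  # exact for plain integers of any size
    except ValueError:
        f = float(text)   # text has a decimal point or an exponent
        return int(f) if f.is_integer() else f

big = "123456789012345678901234567890abc"
print(extract_number_exact(big))        # every digit is kept
print(extract_number_exact("1234.56yt"))
print(extract_number_exact("a1234asdf", default=0))
```

Going through `float()` first would round an integer this large to the nearest representable double, which is why the order of the two conversions matters.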
Community
jfs
0

You can use the re module:

import re

def alp(s):
    m = re.match(r'\d+', s)
    return int(m.group(0)) if m is not None and m.start() == 0 else 0

In [3]: alp('a1234asdf')
Out[3]: 0

In [4]: alp('1234alpha')
Out[4]: 1234

If you want to include negative integers:

def alp_neg(s):
    m = re.match(r'[+-]?\d+', s)
    return int(m.group(0)) if m is not None and m.start() == 0 else 0

If you want floats too:

def alp_floats(s):
    m = re.match(r'[+-]?\d+(\.\d+)?', s)
    return float(m.group(0)) if m is not None and m.start() == 0 else 0

In [7]: alp_floats('-12.2ss31.232sadas')
Out[7]: -12.2
Andy Hayden
0
import re

def str_to_int(string):
    # re.match only matches at the start of the string
    match = re.match(r"\d+", string)
    if match:
        try:
            return int(match.group())
        except ValueError:
            return float(match.group())
    else:
        return 0

str_to_int("1234alpha") 
1234
str_to_int("a1234asdf") 
0
Ayaz Ahmad
0
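import ast
from itertools import takewhile

string = "1234.56yt"  # example input
# Keep leading characters that sort at or before '9' in ASCII
# (digits, but also '.', '+', '-' and other punctuation), then evaluate them.
ast.literal_eval(''.join(takewhile(lambda x: x<='9', string)) or '0')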
dansalmo
0

When the rules for what is OK become hard to define, you might consider this binary search approach that tries to find the bound.

def binsearch_prefix(seq, predicate):
    best_upper = 0
    lower, upper = 0, len(seq)
    while lower < upper:
        mid = (lower + upper) // 2  # integer division, so mid is usable as a slice index
        if predicate(seq[:mid]):
            best_upper = mid
            lower = mid + 1
        else:
            upper = mid
    return seq[:best_upper]

It will return the part of the string that you consider acceptable. For example, this could be your accept function:

def can_float(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

Example:

print(binsearch_prefix("1234alpha", can_float)) # "1234"
print(binsearch_prefix("a1234asdf", can_float)) # ""
print(binsearch_prefix("1234.56yt", can_float)) # "1234.56"

You may then format the prefix any way you like.
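
One caveat: bisection assumes that once a prefix is rejected, every longer prefix is rejected too. That does not hold for `float()`: `"1e"` is rejected while `"1e5"` is accepted, so the search can stop short on such input. A plain linear scan from the longest prefix down avoids the assumption at the cost of more calls. This is a minimal sketch; the name `longest_prefix` is hypothetical, and `can_float` is repeated so the block runs on its own.

```python
def can_float(s):
    """Same accept function as above."""
    try:
        float(s)
        return True
    except ValueError:
        return False

def longest_prefix(seq, predicate):
    """Return the longest prefix of seq that predicate accepts (possibly empty)."""
    # Try the whole sequence first, then ever shorter prefixes.
    for end in range(len(seq), 0, -1):
        if predicate(seq[:end]):
            return seq[:end]
    return seq[:0]

print(longest_prefix("1234.56yt", can_float))
print(longest_prefix("1e5", can_float))
print(repr(longest_prefix("a1234asdf", can_float)))
```

Note that `float()` also accepts strings such as `"inf"` and `"nan"`, so with this accept function a word like `"information"` has the non-empty prefix `"inf"`. A stricter predicate may be needed if that matters.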

Thijs van Dien
-1

Maybe use regular expressions?

import re

def str2num(s):
    try:
        num = re.match(r'^([0-9]+)', s).group(1)
    except AttributeError:
        num = 0
    return int(num)

print(str2num('1234alpha'))
print(str2num('a1234asdf'))

Output:

1234
0
bwind
  • This raises an exception for the OP's test case `'a1234asdf'` instead of returning 0. It also does the wrong thing in all kinds of other cases. – abarnert Jun 06 '13 at 20:08
  • @dblslash, you are closer because yours returns a numeric value, but it fails with the second example! AttributeError: 'NoneType' object has no attribute 'group' – user2460897 Jun 06 '13 at 20:18
  • @dblslash, you are correct, that works fine now, though a little lengthy, but worthy. Any chance we can include numbers beyond the decimal point? – user2460897 Jun 06 '13 at 20:32
  • Yeah, if you use `r'^([0-9\.]+)'` as your regular expression, and then cast to a `float` instead of an `int`. – bwind Jun 06 '13 at 21:14