
I am trying to write a function that splits a string containing a floating-point number and some units. The string may or may not have spaces between the number and the units.

In C, the function strtod has a very handy parameter, endptr, that lets you parse out the initial part of a string and get a pointer to the remainder. Since this is exactly what I need here, I was wondering whether similar functionality is buried somewhere in Python.
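For comparison, strtod itself can be called from Python through ctypes. This is a minimal sketch that assumes a POSIX-like system where ctypes can load the C library. The helper name strtod_split is hypothetical. Because Python leaves LC_NUMERIC at "C" unless you call locale.setlocale, strtod expects "." as the decimal point until you change the locale.

```python
import ctypes
import ctypes.util

# Assumption: POSIX-like platform. If find_library("c") returns None,
# CDLL(None) falls back to the symbols already loaded into the process.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))
_libc.strtod.restype = ctypes.c_double
_libc.strtod.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_void_p)]


def strtod_split(text):
    """Hypothetical helper: parse a leading number with strtod, return (value, rest)."""
    data = text.encode("utf-8")
    buf = ctypes.create_string_buffer(data)  # NUL-terminated copy with a known address
    end = ctypes.c_void_p()                  # receives strtod's endptr
    value = _libc.strtod(buf, ctypes.byref(end))
    consumed = end.value - ctypes.addressof(buf)  # number of bytes strtod accepted
    if consumed == 0:
        raise ValueError(f"not a float: {text!r}")
    return value, data[consumed:].decode("utf-8").strip()
```

For example, strtod_split("1e3kg") gives (1000.0, 'kg'). strtod also accepts "nan" and "inf" spellings, so a string such as "nanometers" splits into nan and "ometers".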

Since float itself does not currently offer this functionality, I am using a regex solution based on https://stackoverflow.com/a/4703508/2988730:

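import re

# "\d*\.\d+" is tried before "\d+\.?" so "2.5" isn't cut at "2."; the exponent is optional.
float_pattern = re.compile(r'[+-]?(?:\d*\.\d+|\d+\.?)(?:[Ee][+-]?\d+)?')

def split_units(string):
    match = float_pattern.match(string)
    if match is None:
        raise ValueError('not a float')
    num = float(match.group())
    # Everything after the matched number, minus surrounding whitespace, is the unit.
    units = string[match.end():].strip()
    return num, units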

This is not completely adequate for two reasons. The first is that it reinvents the wheel. The second is that it is not properly locale-aware without adding additional complexity (which is why I don't want to reinvent the wheel in the first place).

For the record, the tail of the string cannot contain any characters that a number would contain. The only real issue is that I don't require the units to be separated from the number by a space, so a simple string.split(maxsplit=1) won't work.
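Given that constraint, the tail contains no digits. One workaround, which is not a strtod equivalent, is to split right after the last digit and let float() validate the prefix. This is a minimal sketch, and split_at_last_digit is a hypothetical name. It would break for units that contain digits, such as "m2".

```python
import re

# Greedy ".*" backtracks to the LAST digit, so group 1 ends at the final digit.
_LAST_DIGIT = re.compile(r"(.*\d)(.*)", re.DOTALL)


def split_at_last_digit(s):
    """Hypothetical helper: assumes the unit part contains no digits."""
    match = _LAST_DIGIT.match(s)
    if match is None:
        raise ValueError(f"no digits in {s!r}")
    number, units = match.groups()
    # float() raises ValueError if the prefix is not a valid number.
    return float(number), units.strip()
```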

Is there a better way to get a floating point number out of the beginning of the string, so I can process the rest as something else?

Mad Physicist
  • `locale.atof`? I've never used it and don't know how comprehensive it is. – tdelaney May 21 '18 at 19:26
    `float` itself isn't locale-aware. If you want locale-awareness, you want something like [`locale.atof`](https://docs.python.org/3/library/locale.html), which is going to reject some things that `float` accepts. – user2357112 May 21 '18 at 19:26
  • Yup. It's a dupe. Too bad there aren't any decent answers there... – Mad Physicist May 21 '18 at 19:34
  • I closed the question, it's an exact duplicate. I'm afraid you're stuck with regexes. The dupe link proposes to reimplement the function from the C source of float parsing... yeah, why not? – Jean-François Fabre May 21 '18 at 19:34
  • @Jean-FrançoisFabre. Probably. I'll post an answer to the other question if I find anything better... – Mad Physicist May 21 '18 at 19:35
  • Is it true that the last character of the number part has to be one of [0-9]? If so, could you just regex-search the reversed string for the first digit? – SpghttCd May 21 '18 at 19:37
  • check the duplicate link, I posted something that works, and learned ctypes wrapping in the process :) – Jean-François Fabre May 21 '18 at 19:58
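Combining the regex with the locale.atof suggestion, the pattern can take its decimal separator from the current locale and pass the match to locale.atof. This is a minimal sketch, and split_units_locale is a hypothetical name. It reads the decimal point from locale.localeconv() and does not try to match thousands separators.

```python
import locale
import re


def split_units_locale(s):
    """Hypothetical helper: locale-aware variant of split_units."""
    # Escape the separator so "." (or anything else) is matched literally.
    point = re.escape(locale.localeconv()["decimal_point"])
    pattern = re.compile(rf"[+-]?(?:\d*{point}\d+|\d+{point}?)(?:[Ee][+-]?\d+)?")
    match = pattern.match(s)
    if match is None:
        raise ValueError(f"not a float: {s!r}")
    # locale.atof converts using the same locale the pattern was built from.
    return locale.atof(match.group()), s[match.end():].strip()
```

Unless the program calls locale.setlocale(locale.LC_NUMERIC, ...), this behaves like the plain regex version with "." as the separator.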

1 Answer


I know this is a stupid solution, but how about this:

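def float_and_more(something):
    orig = something
    rest = ''
    # Shrink the candidate from the right until float() accepts it;
    # the characters removed along the way form the unit string.
    while something:
        try:
            return float(something), rest
        except ValueError:
            rest = something[-1] + rest
            something = something[:-1]
    raise ValueError('Invalid value: {}'.format(orig))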

And you could use it like this:

>>> float_and_more('2.5 meters')
(2.5, 'meters')

If you wanted to use this for real, you'd probably avoid constantly re-creating the strings, for example by moving a split index instead of building rest one character at a time.
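This is a variant that moves a split index instead of rebuilding rest on every iteration. It is a sketch, and float_and_more_indexed is a hypothetical name. Each attempt still slices the prefix and calls float(), so the worst case remains quadratic.

```python
def float_and_more_indexed(something):
    """Hypothetical variant: try the longest prefix first, shrinking by one character."""
    for end in range(len(something), 0, -1):
        try:
            # float() ignores surrounding whitespace, so "2.5 " is accepted.
            return float(something[:end]), something[end:]
        except ValueError:
            continue
    raise ValueError('Invalid value: {!r}'.format(something))
```

Like the original, it accepts anything float() accepts. A string that starts with "nan" or "inf", such as "nanometers", is therefore split into nan and "ometers".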

L3viathan