7

I am converting parts of a C++ program to Python, but I am having trouble replacing the C function strtod. The strings I'm working with consist of simple mathematical-ish equations, such as "KM/1000.0". The problem is that constants and numbers are mixed, so I'm unable to use float().

How can a Python function be written to simulate strtod which returns both the converted number and the position of the next character?

Jean-François Fabre
Waws

4 Answers

4

I'm not aware of any existing functions that would do that.

However, it's pretty easy to write one using regular expressions:

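import re

# Optional sign, then digits with an optional decimal point (or a bare
# fraction such as ".5"), then an optional exponent. At least one digit
# is required, so a lone "." or "+" is rejected instead of reaching float().
_FLOAT_RE = re.compile(r'[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?')

# returns (float, endpos)
def strtod(s, pos):
  m = _FLOAT_RE.match(s, pos)  # match at pos without copying the string
  if m is None: raise ValueError('bad float: %s' % s[pos:])
  return float(m.group(0)), m.end()

print(strtod('(a+2.0)/1e-1', 3))  # (2.0, 6)
print(strtod('(a+2.0)/1e-1', 8))  # (0.1, 12)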

A better overall approach might be to build a lexical scanner that would tokenize the expression first, and then work with a sequence of tokens rather than directly with the string (or indeed go the whole hog and build a yacc-style parser).
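As a rough illustration of the tokenizing idea, here is a minimal sketch. The token names (NUMBER, NAME, OP), the set of operators, and the `tokenize` function itself are assumptions made up for this example, not something taken from the original program:

```python
import re

# Hypothetical token set: numbers, identifiers such as "KM", and a few operators.
_TOKEN_RE = re.compile(r'''
    (?P<NUMBER>(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<OP>[-+*/()])
  | (?P<SKIP>\s+)
''', re.VERBOSE)

def tokenize(s):
    """Yield (kind, value, start) tuples; whitespace is skipped."""
    pos = 0
    while pos < len(s):
        m = _TOKEN_RE.match(s, pos)
        if m is None:
            raise ValueError('unexpected character %r at position %d' % (s[pos], pos))
        kind = m.lastgroup
        if kind != 'SKIP':
            # Numbers are converted right away; everything else stays a string.
            value = float(m.group()) if kind == 'NUMBER' else m.group()
            yield kind, value, m.start()
        pos = m.end()

print(list(tokenize('KM/1000.0')))
# [('NAME', 'KM', 0), ('OP', '/', 2), ('NUMBER', 1000.0, 3)]
```

Signs are left to the OP tokens here, so a later parsing stage would have to decide whether a "-" is a binary or a unary minus.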

NPE
2

You can create a simple C strtod wrapper:

#include <stdlib.h>

double strtod_wrap(const char *nptr, char **endptr)
{
   return strtod(nptr, endptr);
}

compile with:

gcc -fPIC -shared -o libstrtod.dll strtod.c

(if you're using Python 64 bit, the compiler must be 64-bit as well)

and call it using ctypes from Python (on Linux, change .dll to .so in the library target and in the code below; this was tested on Windows):

import ctypes

_strtod = ctypes.CDLL('libstrtod.dll')
_strtod.strtod_wrap.argtypes = (ctypes.c_char_p, ctypes.POINTER(ctypes.c_char_p))
_strtod.strtod_wrap.restype = ctypes.c_double

def strtod(s):
    p = ctypes.c_char_p(0)
    s = ctypes.create_string_buffer(s.encode('utf-8'))
    result = _strtod.strtod_wrap(s, ctypes.byref(p))
    return result,ctypes.string_at(p)

print(strtod("12.5hello"))

prints:

(12.5, b'hello')

(It's not as hard as it seems, since I learned how to do that just 10 minutes ago)

Useful Q&As about ctypes

Jean-François Fabre
  • Creating the wrapper seems unnecessary; you should be able to do this with `strtod` directly. – user2357112 May 21 '18 at 20:10
  • that would be even better. I have to test that first :) – Jean-François Fabre May 21 '18 at 20:19
  • You should be able to load `strtod` from a platform-specific existing shared library file. `ctypes.cdll.msvcrt` should work on Windows. I believe it's commonly `ctypes.CDLL('libc.so.6')` on Linux, but I don't know how universal that is. It's probably also possible to compile your own file to access `strtod` from, though I'm not sure what the details of that would look like. (`#include <stdlib.h>` on its own seems like it might work.) – user2357112 May 21 '18 at 20:34
  • I have tried that single stdlib.h include alone in the C file and it seems that the `strtod` symbol isn't linked so it doesn't work (python cannot find it). Sticking to the empty wrapper for now. It's working, and it's portable at source level apart from the .dll/.so part. As stated in the answer, I'm not a ctypes specialist. Just made it work (and was impressed by the simplicity of the python code). – Jean-François Fabre May 21 '18 at 20:49
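For what it's worth, the wrapper-free variant suggested in the comments could look roughly like the sketch below. It assumes the C runtime can be located with `ctypes.util.find_library('c')` (falling back to `msvcrt` on Windows), which may not hold on every platform. The `pos` parameter and the returned end position are additions for this example, and positions are counted in bytes of the UTF-8 encoded string, so they only equal character positions for ASCII input:

```python
import ctypes
import ctypes.util

# Assumption: the platform's C runtime exports strtod and can be found this way.
_libc = ctypes.CDLL(ctypes.util.find_library('c') or 'msvcrt')
_libc.strtod.argtypes = (ctypes.c_void_p, ctypes.POINTER(ctypes.c_void_p))
_libc.strtod.restype = ctypes.c_double

def strtod(s, pos=0):
    """Return (value, endpos), with endpos as a byte offset into s."""
    buf = ctypes.create_string_buffer(s.encode('utf-8'))
    start = ctypes.addressof(buf) + pos
    end = ctypes.c_void_p()
    value = _libc.strtod(start, ctypes.byref(end))
    consumed = end.value - start
    if consumed == 0:
        # C strtod signals "no conversion" by leaving endptr equal to nptr.
        raise ValueError('bad float: %s' % s[pos:])
    return value, pos + consumed

print(strtod('KM/1000.0', 3))
```

Note that the C function also skips leading whitespace and accepts forms such as "inf", "nan" and hexadecimal floats, which may or may not be wanted.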
0

I'd use a regular expression for this:

import re
mystring = "1.3 times 456.789 equals 593.8257 (or 5.93E2)"
def findfloats(s):
    regex = re.compile(r"[+-]?\b\d+(?:\.\d+)?(?:e[+-]?\d+)?\b", re.I)
    # Search the argument s, not the global mystring.
    for match in regex.finditer(s):
        yield (match.group(), match.start(), match.end())

This finds all floating point numbers in the string and returns them together with their positions.

>>> for item in findfloats(mystring):
...     print(item)
...
('1.3', 0, 3)
('456.789', 10, 17)
('593.8257', 25, 33)
('5.93E2', 38, 44)
Tim Pietzcker
  • I can think of a bunch of valid floats that wouldn't get picked up. – NPE Sep 27 '11 at 06:27
  • The regex assumes an integer part. Everything else is optional. If there is a decimal point, a fractional part is required. So `.1` and `1.` won't be picked up. Of course it's trivial to modify the regex if necessary. – Tim Pietzcker Sep 27 '11 at 06:33
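One possible modification along those lines, as a sketch only: the pattern below also accepts `.1` and `1.`, and replaces the `\b` anchors with lookarounds, since `\b` does not behave usefully next to a leading or trailing decimal point. The lookarounds and the name `findfloats_loose` are choices made for this example, not part of the answer above:

```python
import re

# Either digits with an optional point and fraction, or a bare fraction.
# The lookarounds keep the match from starting or ending inside a word
# or a longer dotted sequence.
_LOOSE_FLOAT_RE = re.compile(
    r"(?<![\w.])[+-]?(?:\d+\.?\d*|\.\d+)(?:e[+-]?\d+)?(?![\w.])", re.I)

def findfloats_loose(s):
    for match in _LOOSE_FLOAT_RE.finditer(s):
        yield (match.group(), match.start(), match.end())

for item in findfloats_loose(".5 and 1. or 2.5e3"):
    print(item)
# ('.5', 0, 2)
# ('1.', 7, 9)
# ('2.5e3', 13, 18)
```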
0

Parse the number yourself.

A recursive-descent parser is very easy to write for this kind of input. First, write a grammar:

float ::= ipart ('.' fpart)? ('e' exp)?
ipart ::= digit+
fpart ::= digit+
exp   ::= ('+'|'-')? digit+
digit ::= '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'

Now converting this grammar to a function should be straightforward...
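A minimal sketch of such a function follows. It reads the '.' fpart and 'e' exp groups as optional (each at most once), treats the exponent sign as optional, and also accepts an uppercase 'E'. Those readings and the names `parse_float` and `_digits` are assumptions made for this example. Like the grammar, it does not handle a leading sign:

```python
DIGITS = '0123456789'

def _digits(s, pos):
    """digit+ : return the index just past a run of digits, or None if there is none."""
    end = pos
    while end < len(s) and s[end] in DIGITS:
        end += 1
    return end if end > pos else None

def parse_float(s, pos=0):
    """float ::= ipart ('.' fpart)? ('e' exp)? ; returns (value, endpos)."""
    end = _digits(s, pos)                       # ipart
    if end is None:
        raise ValueError('bad float: %s' % s[pos:])
    if s[end:end + 1] == '.':                   # optional '.' fpart
        after = _digits(s, end + 1)
        if after is not None:
            end = after
    if s[end:end + 1] in ('e', 'E'):            # optional 'e' exp
        exp_start = end + 1
        if s[exp_start:exp_start + 1] in ('+', '-'):
            exp_start += 1
        after = _digits(s, exp_start)
        if after is not None:                   # otherwise the 'e' is not part of the number
            end = after
    return float(s[pos:end]), end

print(parse_float('KM/1000.0', 3))  # (1000.0, 9)
```

Each optional group is only consumed when it matches completely, so in "2.x" the number ends after the "2".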

Adrien Plisson
  • There should be a `('+'|'-')` before `ipart` in the definition of `float` – Mad Physicist May 21 '18 at 20:20
  • @madphysicist it depends on the context. When parsing single standalone numbers, you do indeed need to parse the leading sign. When parsing a numerical expression, you avoid including the sign because it would allow strange expressions like "42-+37.2" (I seem to remember that I copied this grammar from the grammar of a well-known language) – Adrien Plisson May 23 '18 at 06:08
  • `42-+37.2` seems like a reasonable expression to me. – Mad Physicist May 23 '18 at 06:32
  • Although it is reasonable to any math-inclined human being, such a grammar implies that you can also write `42--37.2`, which confuses a C or C++ parser (but strangely C++ accepts `42-+37.2`). As such, many (most) programming languages treat a leading sign as a unary operator, that is, an entity clearly separated from the following number, and some languages do not allow a unary operator anywhere other than at the start of an expression. Anyway, for simple parsing of standalone numbers, the grammar above is indeed missing those unary operators. – Adrien Plisson May 23 '18 at 13:48