Find all floats or ints in a given string

Question

Given the string "Hello4.2this.is random 24 text42", I want to return all ints and floats: [4.2, 24, 42]. The solutions to the other questions I found return only 24. I want to extract a number even when non-digit characters are directly next to it. Since I am new to Python, I am trying to avoid regex or other complicated imports. I have no idea how to start. Please help. Here are some research attempts: "Python: Extract numbers from a string" didn't work, because its answers don't recognize 4.2 and 42. There are other questions like that one, and none of them recognize 4.2 and 42 either.

You're not going to do this well without re. Use regular expressions: they exist for this task. — Alex Huszagh, Jul 09 '17 at 22:51
@AlexanderHuszagh: *"You're not going to do this well without re."* Well, that sounds like a challenge... — Warren Weckesser, Jul 09 '17 at 22:54
@WarrenWeckesser, the key word is *well*. It's definitely doable, but it won't be efficient, readable, or likely performant without re. — Alex Huszagh, Jul 09 '17 at 22:59
After looking through the re module, I realized that re was created for exactly this sort of thing. It shows me how far I still have to go to master basic Python. — Nairit, Jul 10 '17 at 03:38

pythad · Accepted Answer · score 10 · 2017-07-10T09:54:37.650

A regex from perldoc perlretut:

import re
re_float = re.compile(r"""(?x)
   ^
      [+-]?\ *      # first, match an optional sign *and space*
      (             # then match integers or f.p. mantissas:
          \d+       # start out with a ...
          (
              \.\d* # mantissa of the form a.b or a.
          )?        # ? takes care of integers of the form a
         |\.\d+     # mantissa of the form .b
      )
      ([eE][+-]?\d+)?  # finally, optionally match an exponent
   $""")
m = re_float.match("4.5")
print(m.group(0))
# -> 4.5
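In the pattern above, the (?x) prefix is the inline spelling of the re.VERBOSE flag, which is what permits the whitespace and # comments inside the pattern; it is also why the pattern is written as a multi-line triple-quoted string. As a sketch, assuming you want to search a larger string rather than validate a whole one, the same commented pattern can be reused by dropping the ^ and $ anchors and passing the flag explicitly (NUMBER_RE and find_number_strings are names made up for this example):

```python
import re

# The commented pattern from above without the ^ and $ anchors, so it can
# match numbers embedded anywhere in a larger string.
NUMBER_RE = re.compile(r"""
    [+-]?\ *            # optional sign, then optional spaces
    (?:
        \d+(?:\.\d*)?   # a, a. or a.b
      | \.\d+           # .b
    )
    (?:[eE][+-]?\d+)?   # optional exponent
""", re.VERBOSE)

def find_number_strings(text):
    # finditer yields match objects; group(0) is the whole matched text.
    return [m.group(0) for m in NUMBER_RE.finditer(text)]

print(find_number_strings("Hello4.2this.is random 24 text42"))
# -> ['4.2', ' 24', '42']
```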

To get all numbers from a string:

s = "4.5 foo 123 abc .123"  # named s to avoid shadowing the built-in str
print(re.findall(r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?", s))
# -> ['4.5', ' 123', ' .123']
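re.findall returns strings, and because the pattern allows spaces between an optional sign and the digits, some of them carry spaces. A minimal sketch of turning the matches into the [4.2, 24, 42] the question asks for, assuming anything containing '.', 'e' or 'E' should become a float and everything else an int (extract_numbers_re is a hypothetical helper name):

```python
import re

PATTERN = r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

def extract_numbers_re(text):
    numbers = []
    for token in re.findall(PATTERN, text):
        # Remove spaces the pattern allows after a sign, e.g. "- 5" -> "-5";
        # int() and float() would reject "- 5".
        token = token.replace(" ", "")
        if any(ch in token for ch in ".eE"):
            numbers.append(float(token))
        else:
            numbers.append(int(token))
    return numbers

print(extract_numbers_re("Hello4.2this.is random 24 text42"))
# -> [4.2, 24, 42]
```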

edited Jul 10 '17 at 09:54

answered Jul 09 '17 at 22:54

pythad

This is leagues better than mine. +1 – cs95 Jul 09 '17 at 22:57

+1 I don't know regex, but I can intuitively understand why this makes sense. Also, is there a specific reason you used triple-quoted strings on the second snippet? – Nairit Jul 10 '17 at 02:43

Warren Weckesser · Answer 2 · 2017-07-10T02:34:04.080

Using regular expressions is likely to give you the most concise code for this problem. It is hard to beat the conciseness of

re.findall(r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?", s)

from pythad's answer.

However, you say "I am trying to avoid regex", so here's a solution that does not use regular expressions. It is obviously a bit longer than a solution using a regular expression (and probably much slower), but it is not complicated.

The code loops through the input character by character. As it pulls each character from the string, it appends it to current (a string that holds the number currently being parsed) if appending it still maintains a valid number. When it encounters a character that cannot be appended to current, current is saved to a list of numbers, but only if current itself isn't one of '', '.', '-' or '-.'; these are strings that could potentially begin a number but are not themselves valid numbers.

When current is saved, a trailing 'e', 'e-' or 'e+' is removed. That will happen with a string such as '1.23eA'. While parsing that string, current will eventually become '1.23e', but then 'A' is encountered, which means the string does not contain a valid exponential part, so the 'e' is discarded.

After saving current, it is reset. Usually current is reset to '', but when the character that triggered current to be saved was '.' or '-', current is set to that character, because those characters could be the beginning of a new number.
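The loop described above can be seen in miniature in the following reduced sketch, which only understands digits and at most one decimal point (no signs, no exponents). It is a simplification for illustration, not the extract_numbers function below, and scan_simple is a made-up name:

```python
def scan_simple(s):
    numbers = []
    current = ''
    # The trailing '!' is a sentinel: it can never extend a number, so
    # whatever is left in current at the end of the input gets saved.
    for c in s + '!':
        if c.isdigit() or (c == '.' and '.' not in current):
            current += c  # c still leaves a valid (partial) number
        else:
            if current not in ('', '.'):
                numbers.append(current)  # save the finished number
            # A '.' may begin a new number such as '.5'; anything else resets.
            current = '.' if c == '.' else ''
    return numbers

print(scan_simple("Hello4.2this.is random 24 text42"))
# -> ['4.2', '24', '42']
```

Signs and exponents are what make the full version longer: each one adds a case to the "can this character extend current?" test and to the cleanup done when current is saved.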

Here's the function extract_numbers(s). The line before return numbers converts the list of strings to a list of integers and floating point values. If you want just the strings, remove that line.

def extract_numbers(s):
    """
    Extract numbers from a string.

    Examples
    --------
    >>> extract_numbers("Hello4.2this.is random 24 text42")
    [4.2, 24, 42]

    >>> extract_numbers("2.3+45-99")
    [2.3, 45, -99]

    >>> extract_numbers("Avogadro's number, 6.022e23, is greater than 1 million.")
    [6.022e+23, 1]
    """
    numbers = []
    current = ''
    for c in s.lower() + '!':
        if (c.isdigit() or
            (c == 'e' and ('e' not in current) and (current not in ['', '.', '-', '-.'])) or
            (c == '.' and ('e' not in current) and ('.' not in current)) or
            (c == '+' and current.endswith('e')) or
            (c == '-' and ((current == '') or current.endswith('e')))):
            current += c
        else:
            if current not in ['', '.', '-', '-.']:
                if current.endswith('e'):
                    current = current[:-1]
                elif current.endswith('e-') or current.endswith('e+'):
                    current = current[:-2]
                numbers.append(current)
            if c == '.' or c == '-':
                current = c
            else:
                current = ''

    # Convert from strings to actual python numbers.
    numbers = [float(t) if ('.' in t or 'e' in t) else int(t) for t in numbers]

    return numbers
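The conversion line near the end of extract_numbers checks for 'e' as well as '.' because int() raises ValueError on strings such as '1e5' or '4.2'; those have to go through float(). The same rule pulled out into a standalone helper, as an illustration (to_number is a hypothetical name, not part of the answer):

```python
def to_number(t):
    # Same rule as the list comprehension in extract_numbers: a decimal point
    # or an exponent means float, anything else is parsed as int.
    # Assumes t is already lowercase, as it is inside extract_numbers.
    return float(t) if ('.' in t or 'e' in t) else int(t)

for t in ['4.2', '-99', '1e5', '.5']:
    print(repr(t), '->', repr(to_number(t)))
# '4.2' -> 4.2
# '-99' -> -99
# '1e5' -> 100000.0
# '.5' -> 0.5
```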

+1 Thanks! Your code is wonderful. I mainly asked for a solution without regex so I could understand the logic behind real coding -- and because as far as I know JS and C don't have regex. — Nairit, Jul 10 '17 at 03:39

score 0 · Answer 3 · answered Oct 29 '21 at 16:11

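If you just want the integers or floats in a string, follow pythad's approach.

If you want to pick out both the floats and the integers from a single string, one way is to split it on whitespace and search each piece with a pattern chosen by whether it contains a decimal point: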

import re

string = "These are floats: 10.5, 2.8, 0.5; and these are integers: 2, 1000, 1975, 308 !! :D"

numbers = []
for token in string.split():
    # Tokens containing a '.' are searched for decimals, the rest for integers.
    if "." in token:
        value = re.findall(r'\d+\.\d+', token)
    else:
        value = re.findall(r'\d+', token)
    numbers += value

print(numbers)
# -> ['10.5', '2.8', '0.5', '2', '1000', '1975', '308']
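The loop above collects strings. A sketch of the same split-then-search idea that returns numbers instead, assuming whitespace-separated tokens are a good enough unit for your input (split_numbers is a hypothetical name):

```python
import re

def split_numbers(text):
    numbers = []
    for token in text.split():
        # The pattern is chosen per token, so each match's type is known:
        # decimals from tokens with a '.', integers from the others.
        if "." in token:
            numbers += [float(v) for v in re.findall(r"\d+\.\d+", token)]
        else:
            numbers += [int(v) for v in re.findall(r"\d+", token)]
    return numbers

print(split_numbers("Hello4.2this.is random 24 text42"))
# -> [4.2, 24, 42]
```

One consequence of choosing the pattern per token: a token that contains a '.' is only searched for decimals, so split_numbers("4.2and42") returns [4.2] and misses the 42.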
