0

I am trying to find all numbers in text and return them in a list of floats.

In the text:

  • Commas are used to separate thousands
  • Several consecutive numbers are separated by a comma and a space
  • Numbers can be attached to words

My code correctly extracts numbers separated by a comma and a space, and numbers attached to words. However, it splits a number that uses commas as thousands separators into several separate numbers.

text = "30feet is about 10metre but that's 1 rough estimate several numbers are like 2, 137, and 40 or something big numbers are like 2,137,040 or something"

list(map(int, re.findall(r'\d+', text)))
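To make the problem concrete, here is a small reproduction of the snippet above, run on only the tail of the sample text. The name `sample` is introduced here for illustration only.

```python
import re

# Shortened version of the sample text (hypothetical variable name).
sample = "numbers are like 2, 137, and 40 or big numbers are like 2,137,040"

# \d+ only matches runs of digits, so every comma ends a match.
found = list(map(int, re.findall(r'\d+', sample)))
print(found)  # [2, 137, 40, 2, 137, 40]
```

The last three values come from 2,137,040, which should have been a single number.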

The suggestions below work beautifully.

Unfortunately, the code below returns a list of strings:

nums = re.findall(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)', text)
print(nums)

I need the output to be a list of floats, separated by commas, without quotation marks.

E.g.,
extract_numbers("1, 2, 3, un pasito pa'lante Maria")
    should return [1.0, 2.0, 3.0]

Unfortunately, my attempts have not yet been successful. Currently, my code reads:

def extract_numbers(text):
    nums = re.findall(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)', text)

    return "[{0}]".format(', '.join(map(str, nums)))

extract_numbers(TEXT_SAMPLE)
  • 1
    Also, a dupe of [How to extract numbers from a string in Python?](https://stackoverflow.com/questions/4289331/how-to-extract-numbers-from-a-string-in-python). – Wiktor Stribiżew Sep 27 '21 at 21:31

4 Answers

4

You may try doing a regex re.findall search on the following pattern:

\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)

Sample script:

import re

text = "30feet is about 10metre but that's 1 rough estimate several numbers are like 2, 137, and 40 or something big numbers are like 2,137,040 or something"

nums = re.findall(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)', text)
print(nums)

This prints:

['30', '10', '1', '2', '137', '40', '2,137,040']

Here is an explanation of the regex pattern:

\b            word boundary
\d{1,3}       match 1 to 3 leading digits
(?:,\d{3})*   followed by zero or more thousands terms
(?:\.\d+)?    match an optional decimal component
(?!\d)        assert the "end" of the number: the next character must not be a digit
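A minimal sketch of how the matches could be turned into the list of floats the question asks for, assuming the thousands commas should simply be dropped before conversion. The function name `extract_numbers` is taken from the question; the constant name `NUMBER_PATTERN` is an assumption made for this example.

```python
import re

# Same pattern as above, compiled once (NUMBER_PATTERN is a hypothetical name).
NUMBER_PATTERN = re.compile(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)')

def extract_numbers(text):
    # Remove the thousands separators, then convert each match to a float.
    return [float(match.replace(',', '')) for match in NUMBER_PATTERN.findall(text)]

print(extract_numbers("1, 2, 3, un pasito pa'lante Maria"))  # [1.0, 2.0, 3.0]
print(extract_numbers("big numbers are like 2,137,040 or 1,234.56"))
```

Returning the list itself, rather than a formatted string, is what makes the printed result appear without quotation marks. One caveat: as written, the pattern skips unseparated numbers with more than three digits, such as 12345, so it may need adjusting if the text contains those.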
Tim Biegeleisen
1

Create a pattern with a character class, [\d,], that matches digits and commas, then strip the commas before converting

Code:

import re

text = "30feet is about 10metre but that's 1 rough estimate several numbers are like 2, 137, and 40 or something big numbers are like 2,137,040 or something"

out = [
    int(match.replace(',', ''))
    for match in re.findall(r'[\d,]+', text)
]
print(out)

Output

[30, 10, 1, 2, 137, 40, 2137040]
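One thing to watch for: the character class also matches a comma that is not next to any digit (for example the one in "well, then"), and int('') raises a ValueError. A hedged variation, assuming every number starts with a digit, is to require a leading digit in the pattern. The helper name `to_ints` is made up for this sketch.

```python
import re

def to_ints(text):
    # \d[\d,]* forces each match to begin with a digit, so a lone comma is skipped.
    return [int(match.replace(',', '')) for match in re.findall(r'\d[\d,]*', text)]

print(to_ints("well, 1,000 apples, and 2"))  # [1000, 2]
```

Like the original pattern, this still merges digits joined by a comma without a space, so "1,2" would be read as 12.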
Nam G VU
RichieV
  • 1
    definitely the best approach. gonna save this code. thank u RichieV –  Sep 02 '20 at 22:38
  • A full-perfect regex match available here by Tim Biegeleisen https://stackoverflow.com/a/63714259/248616 – Nam G VU Sep 02 '20 at 23:19
  • 1
    @NamGVU thanks! His pattern does prevent unexpected use of commas and works also for floats... It was just not required in this case, and I answered because there were no answers at the time... nice link! I didn't know about repl.it – RichieV Sep 02 '20 at 23:28
  • Glad you like repl - it's a proof for working code in my view. – Nam G VU Sep 02 '20 at 23:36
  • This answer has a problem. The last match is supposed to be `2,137,040`, but your answer is finding `2137040`. Can you fix this? – Tim Biegeleisen Sep 03 '20 at 00:32
  • @TimBiegeleisen this answer returns integers, not strings, as it was obviously intended by the OP. The goal of my comments was not to pick a fight, your answer is clearly better. I just noticed that with a different case where the string ended in a digit your answer needed a tiny adjustment. – RichieV Sep 03 '20 at 02:09
  • If the OP really just wants integers, then this might be the best approach +1. – Tim Biegeleisen Sep 03 '20 at 02:18
  • @TimBiegeleisen well I do think that it is good to generalize the solutions, as you did, removing commas and mapping int is trivial, you obviously thought about what would be more useful. Hope there's no hard feelings! – RichieV Sep 03 '20 at 02:22
0

You need to match the commas as well, then strip them before converting each match to an integer:

list(map(lambda n: int(n.replace(',', '')), re.findall(r'[\d,]+', text)))

Also, a list comprehension is usually more readable than map with a lambda:

[int(n.replace(',', '')) for n in re.findall(r'[\d,]+', text)]
Drew Shafer
-1

Why not use the following? array = re.findall(r'[0-9]+', text)