
I'm trying to extract all numbers from a string using Python regex. By numbers I mean integers and floats (with , or . as the decimal separator). I got that working with this regex: ([0-9]+[\,|\.][0-9]+|[0-9]+)

But I have a problem: I also need it to match big numbers with spaces in them, such as 20 000 or 5 000 000. These numbers can be very big, with many spaces; I don't know how many. But there will only ever be 1 space between the groups of digits, no more. For example: 20 30 = this will be 2 different numbers.

I guess I will need some sort of recursive pattern (?R), but I don't know how to use it.

Can someone help? :)

cuzureau

2 Answers


You can use a pattern like

(?<!\d)(?<!\d[.,])\d{1,3}(?:\s\d{3})*(?:[,.]\d+)?

See the regex demo.

Details

  • (?<!\d)(?<!\d[.,]) - two negative lookbehinds: the match must not be immediately preceded by a digit, nor by a digit followed by a comma or period
  • \d{1,3} - one, two or three digits
  • (?:\s\d{3})* - zero or more sequences of a whitespace and three digits
  • (?:[,.]\d+)? - an optional occurrence of a , or . and then one or more digits.
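
As a minimal sketch, the same pattern can be written with re.VERBOSE so that each piece listed above carries its own comment. The name NUMBER_RE and the sample string are illustrative only and are not part of the original answer.

```python
import re

# Same pattern as above, spelled out piece by piece.
# NUMBER_RE is a hypothetical name chosen for this sketch.
NUMBER_RE = re.compile(
    r"""
    (?<!\d)          # not immediately preceded by a digit
    (?<!\d[.,])      # not immediately preceded by a digit plus a period or comma
    \d{1,3}          # leading group of one to three digits
    (?:\s\d{3})*     # zero or more groups of one whitespace and three digits
    (?:[,.]\d+)?     # optional decimal part after a comma or period
    """,
    re.VERBOSE,
)

# Made-up sample input for illustration.
sample = "20 000 or 5 000 000, then 20 30 and 3,14"
matches = NUMBER_RE.findall(sample)
print(matches)
```

Note that 20 30 comes back as two separate matches because 30 is not a group of exactly three digits.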

In Python, you can use re.findall:

import re
text = "5 000, 6 123 456,345 and 6 123 456.345... I mean 20 000 or 5 000 000. For example: 20    30"
print( re.findall(r'(?<!\d)(?<!\d[.,])\d{1,3}(?:\s\d{3})*(?:[,.]\d+)?', text) )
## => ['5 000', '6 123 456,345', '6 123 456.345', '20 000', '5 000 000', '20', '30']
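
If numeric values are needed rather than strings, the matches can be normalized afterwards. This is only a sketch under two assumptions: a comma or period in a match is always a decimal separator (as the question describes), and to_decimal is a hypothetical helper name, not something from the original answer.

```python
import re
from decimal import Decimal

PATTERN = r'(?<!\d)(?<!\d[.,])\d{1,3}(?:\s\d{3})*(?:[,.]\d+)?'

def to_decimal(raw):
    """Hypothetical helper: turn a matched string such as '6 123 456,345' into a Decimal."""
    compact = re.sub(r'\s', '', raw)           # drop the grouping whitespace
    return Decimal(compact.replace(',', '.'))  # assumption: ',' is a decimal separator

# Made-up sample input for illustration.
text = "5 000, 6 123 456,345 and 20 30"
values = [to_decimal(m) for m in re.findall(PATTERN, text)]
print(values)
```

Decimal is used here instead of float so that the digits after the separator are kept exactly; float(...) would work the same way if rounding is acceptable.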
Wiktor Stribiżew
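import re

number = '20 300  4 100   400  50'
# Collect the runs of digits and whitespace, then rejoin them into one string
res = re.findall(r'(\d*\s*)', number)
# Split on double spaces; a single space only groups digits inside a number
res = ''.join(res).split('  ')
# Remove the remaining spaces and convert each piece to int
print([int(x.replace(' ', '')) for x in res])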

Output:

[20300, 4100, 400, 50]

DevScheffer
  • This doesn't work for the OP's case of multiple numbers separated by multiple spaces, e.g. `number='20 3000 400'` should give `('20', '3000', '400')` but will give `'203000400'` – Jamie Deith Apr 24 '21 at 19:23
  • You'll need a further enhancement to cover the OP's case of a long number with digits separated by a single space and grouped in triples, e.g. `number= '20 300 4 100 400 50'` should give `['20 300', '4 100', '400', '50']` vs. your current answer that will give `['20', '300', '4', '100', '400', '50']`. (Although I suspect that `['20300', '4100', '400', '50']` would be acceptable too.) – Jamie Deith Apr 24 '21 at 20:38