-3

I used to have strings like this:

233.43 USD
634,233 EURO

and I used to extract numbers from those strings using this:

def extractNumbersFromString(value):  # Get the number from a string
    return re.search(r'(\d+(?:[.,]\d*)*)', value).group(1)

Now I also get strings like these:

2300 000 USD
430 000 EU

where there is a space between the numbers and the zeros on the right.

How can I adjust my code to extract the numbers from those strings?

Required output:

 2300000 
 430000 

My code currently returns only 2300 and 430 (i.e. without the zeros on the right).

Padraic Cunningham
Marco Dinatsoli

3 Answers

1

You just need rsplit to split off the trailing currency code and str.replace to remove the spaces:

s="""233.43 USD
634,233 EURO
2300 000 USD
430 000 EU
"""


for line in s.splitlines():
    # Split once from the right: number part, then currency code
    a, _ = line.rsplit(None, 1)
    print(a.replace(" ", ""))


233.43
634,233
2300000
430000

Or use translate, which may be slightly faster (this two-argument form works in Python 2 only):

for line in s.splitlines():
    a, _ = line.rsplit(None, 1)
    print(a.translate(None, " "))

If value is always a single line like those in your example:

def extractNumbersFromString(value):
    a, _ = value.rsplit(None, 1)
    return a.translate(None, " ")
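
For readers on Python 3, where `translate(None, " ")` raises a TypeError, the same idea might look roughly like the sketch below. The names `DELETE_SPACES` and `extract_number_py3` are made up for illustration, and the code assumes every input ends with a whitespace-separated currency code.

```python
# Python 3 sketch: str.translate takes a single table argument here.
# str.maketrans("", "", " ") builds a table that deletes every space.
DELETE_SPACES = str.maketrans("", "", " ")


def extract_number_py3(value):
    # Split off the trailing currency code, then drop spaces from the rest.
    number_part, _ = value.rsplit(None, 1)
    return number_part.translate(DELETE_SPACES)


for line in ["233.43 USD", "634,233 EURO", "2300 000 USD", "430 000 EU"]:
    print(extract_number_py3(line))
```

A line with no whitespace at all would make the tuple unpacking fail, so this only suits input shaped like the examples in the question.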

Or combine it with re (note that this version returns a list of all matches):

def extractNumbersFromString(value):  # Get the numbers from a string
    return [a.translate(None, " ") for a in re.findall(r'(\d+(?:[ .,]\d*)*)', value)]

You can also rstrip the letters:

from string import ascii_letters
for line in s.splitlines():
    print(line.rstrip(ascii_letters).translate(None, " "))
Padraic Cunningham
0

You could use the regex below.

>>> import re
>>> s = '''2300 000 USD
... 430 000 EU'''
>>> re.findall(r'\d+(?:[ ,.]\d+)*', s)
['2300 000', '430 000']
>>> [i.replace(' ','') for i in re.findall(r'\d+(?:[\s,.]\d+)*', s)]
['2300000', '430000']

Finally, use replace to remove the spaces inside the matched numbers.
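
As a rough sketch, the regex and the replace step could be wrapped into one function like this. The names `NUMBER_RE` and `extract_number` are hypothetical, and returning None when no number is found is an assumption about the desired behaviour, not something the question specifies.

```python
import re

# Digits, optionally followed by groups of digits joined by a space, comma or dot.
NUMBER_RE = re.compile(r'\d+(?:[ ,.]\d+)*')


def extract_number(value):
    match = NUMBER_RE.search(value)
    if match is None:
        # Assumed behaviour: report "no number" instead of raising.
        return None
    # Remove the spaces used as thousands separators.
    return match.group(0).replace(' ', '')


for line in ["233.43 USD", "634,233 EURO", "2300 000 USD", "430 000 EU"]:
    print(extract_number(line))
```

Because the separator must be followed by at least one digit, the match stops before the space that precedes the currency code.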

Avinash Raj
0

In Python 2, you can use str.translate with several characters to delete and no translation table:

txt='''\
233.43 USD
634,233 EURO
2300 000 USD
430 000 EU'''

import re

def extractNumbersFromString(value):
    # Delete spaces, commas and dots, then take the leading run of digits
    return re.search(r'^(\d+)', value.translate(None, " ,.")).group(1)

for line in txt.splitlines():
    print("{:>20}    =>{:>10}".format(line, extractNumbersFromString(line)))

Prints (note that the dot and comma are removed as well, so 233.43 becomes 23343):

      233.43 USD    =>     23343
    634,233 EURO    =>    634233
    2300 000 USD    =>   2300000
      430 000 EU    =>    430000

If you know you are only interested in one grouping of digits per line, you can simply filter out the non-digit characters instead:

def extractNumbersFromString(value):
    # In Python 2, filter on a str returns a str
    return filter(str.isdigit, value)
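
In Python 3, filter returns an iterator and not a string, so a rough equivalent would join the result. The name `extract_digits` is made up for this sketch.

```python
def extract_digits(value):
    # Keep only the digit characters and join them back into one string.
    return "".join(filter(str.isdigit, value))


for line in ["233.43 USD", "634,233 EURO", "2300 000 USD", "430 000 EU"]:
    print(extract_digits(line))
```

As with the translate version, decimal points and commas are dropped, so "233.43 USD" yields "23343".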
dawg