Convert all numeric strings with or without unit abbreviations. You must indicate that the source string is a decimal comma notation by parameter dec=',' Converting to floats as well as integer is possible. Default conversion is float, but set the parameter toInt=True and the result is an integer. Automatic recognition of unit abbreviations that can be edited in the md dictionary. The key is the unit abbreviation and the value is the multiplier. In this way, the applications of this function are endless. The result is always a number you can calculate with. This all in one function is not the fastest method, but you don't have to worry anymore and it always returns a reliable result.
import re
'''
units: gr=grams, K=thousands, M=millions, B=billions, ms=mili-seconds, mt= metric-tonnes
'''
md = {'gr': 0.001, '%': 0.01, 'K': 1000, 'M': 1000000, 'B': 1000000000, 'ms': 0.001, 'mt': 1000}
seps = {'.': True, ',': False}
kl = list(md.keys())
def to_Float_or_Int(strVal, toInt=None, dec=None):
toInt = False if toInt is None else toInt
dec = '.' if dec is None else dec
def chck_char_in_string(strVal):
rs = None
for el in kl:
if el in strVal:
rs = el
break
return rs
if dec in seps.keys():
dcp = seps[dec]
strVal = strVal.strip()
mpk = chck_char_in_string(strVal)
mp = 1 if mpk is None else md[mpk]
strVal = re.sub(r'[^\de.,-]+', '', strVal)
if dcp:
strVal = strVal.replace(',', '')
else:
strVal = strVal.replace('.', '')
strVal = strVal.replace(',', '.')
dcnm = float(strVal)
dcnm = dcnm * mp
dcnm = int(round(dcnm)) if toInt else dcnm
else:
print('wrong decimal separator')
dcnm = None
return dcnm
Call the function as follows:
pvals = ['-123,456', '-45,145.01 K', '753,159.456', '1,000,000', '985 ms' , '888 745.23', '1.753 e-04']
cvals = ['-123,456', '1,354852M', '+10.000,12 gr', '-87,24%', '10,2K', '985 ms', '(mt) 0,475', ' ,159']
print('decimal point strings')
for val in pvals:
result = to_Float_or_Int(val)
print(result)
print()
print('decimal comma strings')
for val in cvals:
result = to_Float_or_Int(val, dec=',')
print(result)
exit()
The output results:
decimal point strings
-123456.0
-45145010.0
753159.456
1000000.0
0.985
888745.23
0.0001753
decimal comma strings
-123.456
1354852.0
10.00012
-0.8724
10200.0
0.985
475.0
0.159