I need to set the global float display precision to the minimum value required. I also need the number of decimal places for each column: partly to compute that global precision, and partly because I want to display each column with as many decimal places as the user provided.
I get the data from a CSV. Initially I load all the cells as strings; after converting them to numbers, the columns can end up with different dtypes.
The integer columns (values without a '.') never contain NaN. So my idea is to keep a copy of the dataframe while it still holds strings and split each number on the '.' character, because once the cells hold floats I can no longer count decimal places reliably: 5.55 % 1 can give something like 0.550000000001. Python only prints a decimal approximation of the binary value the machine actually stores, so as far as I understand, the decimal digits cannot be recovered exactly from the float.
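The rounding issue described above can be seen directly with the standard-library decimal module, which exposes the exact binary value a float stores:

```python
from decimal import Decimal

# 5.55 cannot be represented exactly in binary floating point, so the
# fractional part recovered with % carries representation error.
print(Decimal(5.55))  # the exact value the machine actually stores
print(5.55 % 1)       # not exactly 0.55
```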
There are no columns where all the values are NaN.
import numpy as np
import pandas as pd
pd.set_option('precision', 15) # if > 15 the precision is not working well
df = pd.DataFrame({
'x':['5.111112222233', '5.111112222', '5.11111222223', '5.2227', '234', '4', '5.0'],
'y':['ÑKDFGÑKL', 'VBNVBN', 'GHJGHJ', 'GFGDF', 'SDFS', 'SDFASD', 'LKJ'],
'z':['5.0', '5.0', '5.0', '5.0', '3', '6', '5.0'],
'a':['5', '5', '5', '5', '3', '6', '9'],
'b':['5.0', '5.0', '5.0', '5.0', '3.8789', '6', np.nan],
})
df_str = df.copy(deep=True)
df = df.apply(lambda t: pd.to_numeric(t, errors='ignore', downcast='integer'))
precisions = {}
pd_precision = 0
# Float columns
for c in df.select_dtypes(include=['float64']):
    p = int(df_str[c].str.rsplit(pat='.', n=1, expand=True)[1].str.len().max())  # always has one '.'
    if p > pd_precision:
        pd_precision = p
    precisions[c] = p
# Integer columns
for c in df.select_dtypes(include=['int8', 'int16', 'int32', 'int64']):
    precisions[c] = 0
# String and mixed columns
for c in df.select_dtypes(include=['object']):  # or exclude=['int8', 'int16', 'int32', 'int64', 'float64']
    precisions[c] = False
if pd_precision > 15:
    pd_precision = 15
pd.set_option('precision', pd_precision) # pd_precision = 12
precisions # => {'x': 12, 'b': 4, 'z': 0, 'a': 0, 'y': False}
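For comparison, here is a sketch of the same string-based counting done with a single regex extraction instead of splitting. `decimal_places` is a name I made up, and the sketch assumes the column holds numeric strings (non-numeric columns would still need to be excluded first, e.g. with the `select_dtypes` calls above):

```python
import numpy as np
import pandas as pd

def decimal_places(s: pd.Series) -> int:
    # Capture the digits after a trailing '.'; integer strings and NaN
    # produce no match and are ignored by max().
    frac = s.str.extract(r'\.(\d+)$', expand=False)
    return int(frac.str.len().max()) if frac.notna().any() else 0

df_str = pd.DataFrame({
    'x': ['5.111112222233', '5.111112222', '5.11111222223', '5.2227', '234', '4', '5.0'],
    'a': ['5', '5', '5', '5', '3', '6', '9'],
    'b': ['5.0', '5.0', '5.0', '5.0', '3.8789', '6', np.nan],
})
precisions = {c: decimal_places(df_str[c]) for c in df_str}
```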
I know there is a Decimal class, but I believe I would lose the performance benefits of a pandas dataframe of floats.
Is there a better way to get the number of decimal places?