
I need to set the global float display precision to the minimum value possible.

I also need the precision of each column, partly to compute the global precision and partly because I would like to let the user choose how many decimal places to use for each column.

I get the data from a CSV. At first I load all the cells as strings; after converting them to numbers, the columns can end up with different dtypes.
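Loading every cell as text can be done directly in `read_csv` with `dtype=str`. A minimal sketch, assuming a small hypothetical CSV held in memory instead of a real file. Note that, by default, empty cells are still read as NaN rather than as empty strings.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the user's file.
csv_text = """x,y,a
5.111112222233,ÑKDFGÑKL,5
5.2227,VBNVBN,3
234,GHJGHJ,6
"""

# dtype=str keeps every cell as the literal text from the file,
# so '5.0' stays '5.0' instead of becoming the float 5.0.
df_str = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(df_str.loc[0, 'x'])  # 5.111112222233
```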

In the integer columns (those without a '.') there are no NaN values. So I thought I could make a copy of the dataframe while it still contains strings and split each number on the '.' character. Once the cells hold floats, I cannot get the number of decimal places reliably: `5.55 % 1`, for example, returns `0.5499999999999998` rather than `0.55`. Python only prints a decimal approximation of the true decimal value of the binary approximation stored by the machine, so I understand that it is not possible to recover the decimal places accurately from the floats.
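A quick check of the float problem described above, as a sketch: `Decimal(5.55)` reveals the exact value of the stored double, which is slightly below 5.55, while counting the digits in the original text gives the intended answer.

```python
from decimal import Decimal

# The double closest to 5.55 is slightly below 5.55,
# so its fractional part is not exactly 0.55.
print(5.55 % 1)       # 0.5499999999999998
print(Decimal(5.55))  # exact value of the stored binary double

# Counting digits in the original string avoids the problem entirely.
text = '5.55'
print(len(text.split('.', 1)[1]))  # 2
```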

There are no columns in which all the values are NaN.

```python
import numpy as np
import pandas as pd

pd.set_option('display.precision', 15)  # if > 15 the precision is not working well

df = pd.DataFrame({
    'x': ['5.111112222233', '5.111112222', '5.11111222223', '5.2227', '234', '4', '5.0'],
    'y': ['ÑKDFGÑKL', 'VBNVBN', 'GHJGHJ', 'GFGDF', 'SDFS', 'SDFASD', 'LKJ'],
    'z': ['5.0', '5.0', '5.0', '5.0', '3', '6', '5.0'],
    'a': ['5', '5', '5', '5', '3', '6', '9'],
    'b': ['5.0', '5.0', '5.0', '5.0', '3.8789', '6', np.nan],
})

# Keep the original strings: decimal places are counted on them, not on the floats
df_str = df.copy(deep=True)

def to_numeric_or_keep(col):
    # Same effect as pd.to_numeric(col, errors='ignore', downcast='integer'),
    # whose errors='ignore' is deprecated since pandas 2.2:
    # non-numeric columns are returned unchanged
    try:
        return pd.to_numeric(col, downcast='integer')
    except (ValueError, TypeError):
        return col

df = df.apply(to_numeric_or_keep)

precisions = {}
pd_precision = 0

# Float columns: count the digits after the last '.' in the original strings.
# Every float column has at least one '.', because integer columns never contain NaN
for c in df.select_dtypes(include=['float64']):
    p = int(df_str[c].str.rsplit(pat='.', n=1, expand=True)[1].str.len().max())
    pd_precision = max(pd_precision, p)
    precisions[c] = p

# Integer columns
for c in df.select_dtypes(include=['int8', 'int16', 'int32', 'int64']):
    precisions[c] = 0

# String and mixed columns
for c in df.select_dtypes(include=['object']):  # or exclude=['int8', 'int16', 'int32', 'int64', 'float64']
    precisions[c] = False

pd_precision = min(pd_precision, 15)

pd.set_option('display.precision', pd_precision)  # pd_precision = 12
print(precisions)  # => {'x': 12, 'b': 4, 'z': 0, 'a': 0, 'y': False}
```

I know there is a `Decimal` class, but I believe I would lose all the performance benefits of a pandas dataframe with float columns.
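`Decimal` does not have to be used for storage, though: it can serve only to parse the original strings, while the numeric columns stay `float64`. A sketch, assuming a hypothetical helper `decimal_places` that is applied only to numeric columns (`Decimal('abc')` raises `InvalidOperation`).

```python
from decimal import Decimal

import numpy as np
import pandas as pd


def decimal_places(cell):
    """Digits after the decimal point in a numeric string (hypothetical helper).

    Decimal is used only to parse the text; the dataframe column itself
    can stay float64.
    """
    if not isinstance(cell, str):  # NaN or other non-string cells
        return np.nan
    exponent = Decimal(cell).as_tuple().exponent  # '3.8789' -> -4, '6' -> 0
    return max(0, -exponent)


col = pd.Series(['5.0', '5.0', '3.8789', '6', np.nan])
print(col.map(decimal_places).max())  # 4.0
```

Unlike splitting on '.', this also copes with strings such as `'1E+2'` (exponent 2, so 0 decimal places).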

Is there a better way to get the number of decimal places?
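One possible alternative, still counting on the strings as in the code above: a regex with `str.extract` captures the digits after the final '.', so it does not depend on `expand=True` producing a second column. A sketch, assuming a hypothetical helper `max_decimals` and a cut-down version of the question's data; it returns 0 for a column with no '.' at all.

```python
import numpy as np
import pandas as pd

df_str = pd.DataFrame({
    'x': ['5.111112222233', '5.111112222', '5.2227', '234', '4', '5.0'],
    'b': ['5.0', '5.0', '3.8789', '6', np.nan, '5.0'],
})


def max_decimals(col):
    # Capture the digits after the last '.', anchored at the end of the string.
    # Cells without a '.' (and NaN cells) give NaN, which max() skips.
    digits = col.str.extract(r'\.(\d+)$', expand=False)
    n = digits.str.len().max()
    return 0 if pd.isna(n) else int(n)


precisions = {c: max_decimals(df_str[c]) for c in df_str.columns}
print(precisions)  # {'x': 12, 'b': 4}
```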

ChesuCR
  • _in part to get the global precision and on the other hand, I would like to use as many decimal places as the user wants for each column_ I'm curious, why is that? _Because if the cells already have float numbers I could not get the number of decimal places because I could get something like this: `5.55 % 1 >> 0.550000000001`_ I'm not sure I understand what you mean, what is the issue? – AMC Apr 27 '20 at 01:58
  • @AMC thanks for your interest. I mean that sometimes Python only prints a decimal approximation of the true decimal value of the binary approximation stored by the machine. Then I found that it is not possible to get the decimal values accurately. I could only achieve that with the "Decimal" module, and I was wondering whether pandas had something built in to get the number of decimal places without using that module – ChesuCR Apr 27 '20 at 09:48
  • @AMC and for the first question... I need to perform all the operations in my application with the maximum precision. I also want to tell the user how many decimal places each column of the current dataframe has (a precision I only get at the beginning, when the CSV is loaded). With that information the user will choose to export each column with more or fewer decimal places – ChesuCR Apr 27 '20 at 10:04
  • Count the decimal places in the strings. Do not convert them to floating-point numbers. – Eric Postpischil Apr 27 '20 at 10:21
  • _With that information the user will chose to export each column with more or with less decimal places_ Is it purely for cosmetic/stylistic reasons? – AMC Apr 29 '20 at 01:56
  • @AMC what matters is that I need to **show the user the current precision of each column** as used in the original CSV file. The user can then change it to export the column with the same or a different precision; this is the easy part, because I just need to `round` the column to that precision – ChesuCR Apr 29 '20 at 14:40
  • So I am afraid the only way is to check how many decimal places there are in the string of each cell, as I have already done – ChesuCR Apr 29 '20 at 14:42
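For the export step mentioned in the comments, `DataFrame.round` accepts a dict mapping column names to decimal places, so each column can be rounded to its own precision before `to_csv`. A sketch, assuming hypothetical per-column choices in `export_precisions`.

```python
import io

import pandas as pd

df = pd.DataFrame({'x': [5.111112222233, 5.2227, 234.0],
                   'b': [5.0, 3.8789, 6.0]})

# Hypothetical user choice: decimal places per column for the export.
export_precisions = {'x': 3, 'b': 2}

# Each column is rounded to its own number of decimal places.
rounded = df.round(export_precisions)

buf = io.StringIO()
rounded.to_csv(buf, index=False)
print(buf.getvalue())
```

Rounding only removes digits; it does not pad, so `5.0` is still written as `5.0`, not `5.00`. Fixed-width output would need string formatting of each column instead.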

0 Answers