15

I'm calculating some standard deviations which are giving FloatingPointErrors. I wanted to try converting the data series to Decimal (using https://docs.python.org/3/library/decimal.html), to see if this fixes my issue.

I can't seem to create a pandas Series of Decimal values.

How can I take a normal pd.Series of float64 and convert it to a pd.Series of Decimal, such that I can do:

Series.pct_change().ewm(span=35, min_periods=35).std()
ChesuCR
cjm2671

3 Answers

4
from decimal import Decimal

df['col_a'] = df['col_a'].apply(lambda x: Decimal(str(x)))
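As a minimal sketch of why the conversion goes through str(x), the example below applies the same lambda to a small Series. The sample values and the names s, dec, and total are made up for illustration. Passing the float directly to Decimal would carry over its binary rounding error, while the string form keeps the short decimal representation.

```python
from decimal import Decimal
import pandas as pd

s = pd.Series([0.1, 0.2, 0.3])  # hypothetical float64 data

# Convert each float via its string form so the Decimal holds "0.1", not the
# full binary expansion of the float.
dec = s.apply(lambda x: Decimal(str(x)))

# The resulting Series has dtype object and holds Decimal instances.
print(dec.dtype)

# Plain Python sum uses Decimal arithmetic on the elements.
total = sum(dec)
print(total)

# Decimal(0.1) keeps the float's binary error, so it differs from Decimal("0.1").
print(Decimal(0.1) == Decimal("0.1"))
```

The converted Series is an object column, so arithmetic on it runs element by element in Python rather than in vectorised NumPy code.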
David Wei
2

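Would something like this work?

from functools import partial
from pandas import Series

def column_round(decimals):
    # Build a function that rounds a Series to the given number of decimal places
    return partial(Series.round, decimals=decimals)

df.apply(column_round(2))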

Alternatively, use np.vectorize to apply the Decimal.quantize method for rounding; this leaves the values as Decimal instead of np.float64:

npquantize = np.vectorize(decimal.Decimal.quantize)
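A short sketch of how such a vectorised quantize might be called, assuming the input is already an object array of Decimal values. The sample values and the otypes=[object] argument are additions for illustration, not part of the original line; otypes is set so that NumPy keeps the results as Python objects.

```python
from decimal import Decimal
import numpy as np

# otypes=[object] keeps the results as Decimal objects rather than letting
# NumPy infer an output type.
npquantize = np.vectorize(Decimal.quantize, otypes=[object])

# Hypothetical sample data, stored as an object array of Decimal values.
values = np.array([Decimal("1.2345"), Decimal("2.3456")], dtype=object)

# The second argument is the exponent template: 0.01 means two decimal places.
rounded = npquantize(values, Decimal("0.01"))
print(rounded)
```

np.vectorize is a convenience loop, not a compiled one, so this is unlikely to be faster than a list comprehension over the same values.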

I have been looking into it, and this seems to solve the issue with pct_change:

ts.diff().div(ts.shift(1))
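The sketch below checks, on made-up float data, that this expression gives the same result as pct_change. It then computes the same ratio over Decimal values in plain Python, which is one possible way (an assumption here, not something this answer proposes) to keep the arithmetic in Decimal.

```python
from decimal import Decimal
import numpy as np
import pandas as pd

ts = pd.Series([100.0, 102.0, 99.96])  # hypothetical sample data

# Float version: the difference divided by the previous value matches pct_change.
manual = ts.diff().div(ts.shift(1))
builtin = ts.pct_change()
same = np.allclose(manual, builtin, equal_nan=True)
print(same)

# Decimal version, computed in plain Python so no float arithmetic is involved
# after the initial conversion from the string form.
vals = [Decimal(str(x)) for x in ts]
dec_changes = [(cur - prev) / prev for prev, cur in zip(vals, vals[1:])]
print(dec_changes)
```

The Decimal list has one element fewer than the Series because there is no previous value for the first observation.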
SerialDev
  • If I've understood correctly, this still uses floating point arithmetic; I want to enforce decimal arithmetic. – cjm2671 Jun 29 '16 at 09:10
  • Have you considered converting the series into a NumPy array and applying np.vectorize before converting to Decimal? – SerialDev Jun 29 '16 at 09:20
1

I think you can create the DataFrame directly with Decimal types and operate on the values:

import pandas as pd
import numpy as np
from decimal import Decimal

df = pd.DataFrame({
    'DECIMAL_1': [Decimal('2342.2345234'), Decimal('564.5678'), Decimal('76867.8923892')],
    'DECIMAL_2': [Decimal('67867.43534534323'), Decimal('67876.345345'), Decimal('234234.2345345')]
})
df['DECIMAL_3'] = df['DECIMAL_1'] + df['DECIMAL_2']
df.dtypes

The drawback is that the columns' dtype will be object, so performance will suffer. In any case, any operation on Decimal values requires more computation than the same operation on floats.

Maybe the best solution is to keep two copies of the DataFrame: one with floats and one with Decimal. If you need fast operations, use the float DataFrame; if you need to compare values or assign new values to cells with a specific precision, use the Decimal DataFrame.
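A rough sketch of the two-copy idea, assuming the Decimal frame is treated as the source of truth and the float frame is derived from it with astype(float). The column name and the variable names df_dec, df_float, fast_mean, and exact_match are hypothetical.

```python
from decimal import Decimal
import pandas as pd

# Exact copy: object columns holding Decimal values.
df_dec = pd.DataFrame({
    "DECIMAL_1": [Decimal("2342.2345234"), Decimal("564.5678"), Decimal("76867.8923892")],
})

# Fast copy: the same data as float64, for vectorised numeric work.
df_float = df_dec.astype(float)

# Use the float copy for speed-sensitive calculations.
fast_mean = df_float["DECIMAL_1"].mean()

# Use the Decimal copy where an exact comparison matters.
exact_match = df_dec["DECIMAL_1"].iloc[1] == Decimal("564.5678")

print(df_dec.dtypes, df_float.dtypes, fast_mean, exact_match)
```

The two frames have to be kept in sync by hand, so any update to one needs a matching update to the other.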

Tell me what you think about my suggestions.

Note: I made my example with a DataFrame, but a DataFrame is built from Series, so the same approach applies to a single Series.

ChesuCR