

I'm trying to round float series pulled from yfinance to two decimal places. Despite using round(2) and astype('float32'), the dataframe does not keep the rounding and shows extra decimal places in each series. What am I doing wrong?

def clean_df(df):
    return (df
        .stack(level=0)
        .swaplevel(0, 1, axis=0)
        .sort_index(axis=0, level=None, ascending=True)
        .reset_index(level=[0, 1])
        .rename(columns={"level_0": "Symbol"})
        .round({'Open': 2, 'High': 2, 'Low': 2, 'Close': 2, 'Volume': 0})
        .astype({'Symbol': 'string', 'Close': 'float32', 'Open': 'float32',
                 'High': 'float32', 'Low': 'float32', 'Volume': 'int32'})
    )
phuclv
greenguy
  • The only way to guarantee display with a certain number of decimal places is to convert to a string, using e.g. one of the methods described [here](https://stackoverflow.com/q/19986662/9473764) – Nick May 27 '23 at 01:54
  • @Nick that's not the only way. You can use the Decimal module to handle decimal values https://beepscore.com/website/2018/10/12/using-pandas-with-python-decimal.html – phuclv May 27 '23 at 02:26
  • Float32 is the wrong type to use, it can only hold about 5 digits before inaccuracy sets in. – Mark Ransom May 27 '23 at 02:53
  • I'm using float32 to preserve memory. Stock prices should only have 2 decimal places so I'm fine with the inaccuracy on that front. – greenguy May 27 '23 at 04:54
  • @greenguy, rather than count money to the dollar, count to the 0.01, so that all values are whole numbers. With intermediate calculations that might incur a small +/- fraction, round to the nearest whole number. _Display_ values with `x/100` to 2 places after the decimal point. – chux - Reinstate Monica May 30 '23 at 09:51
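
The integer-cents idea from the last comment can be sketched roughly as follows. This is only an illustration: the sample prices and the `Close_cents` column name are made up here, and in the question the data would come from yfinance instead.

```python
import pandas as pd

# Hypothetical sample prices; in the question these come from yfinance.
prices = pd.DataFrame({"Close": [106.73, 105.75, 0.128348]})

# Store each price as a whole number of cents.
# int32 takes 4 bytes per value, the same as float32.
prices["Close_cents"] = (prices["Close"] * 100).round().astype("int32")

# Convert back to dollars only when formatting for display;
# the stored integers stay exact.
display = prices["Close_cents"].map(lambda c: f"{c / 100:.2f}")
print(display.tolist())
```

The trade-off is that every consumer of the column has to remember the values are in cents, so this probably suits a storage layer better than an ad hoc analysis notebook.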

1 Answer


I think you just need to change the float display format option:

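import pandas as pd
import yfinance as yf

# Show floats with two decimal places (changes the display only, not the stored values)
pd.set_option("display.float_format", lambda x: "%.2f" % x)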

def clean_df(df):
    return (df
        .stack(level=1)
        .swaplevel(0, 1, axis=0)
        .sort_index(axis=0, level=None, ascending=True)
        .reset_index(level=[0, 1])
        .rename(columns={"level_0": "Symbol"})
    )

df = yf.download("AAPL XOM", start="1980-12-12", end="2023-05-27")

Output:

print(clean_df(df))

      Symbol       Date  Adj Close  Close   High    Low   Open     Volume
0       AAPL 1980-12-12       0.10   0.13   0.13   0.13   0.13  469033600
1       AAPL 1980-12-15       0.09   0.12   0.12   0.12   0.12  175884800
2       AAPL 1980-12-16       0.09   0.11   0.11   0.11   0.11  105728000
3       AAPL 1980-12-17       0.09   0.12   0.12   0.12   0.12   86441600
4       AAPL 1980-12-18       0.09   0.12   0.12   0.12   0.12   73449600
...      ...        ...        ...    ...    ...    ...    ...        ...
21403    XOM 2023-05-22     104.97 104.97 107.04 104.88 105.84   12882000
21404    XOM 2023-05-23     106.40 106.40 108.22 105.75 105.99   14394400
21405    XOM 2023-05-24     107.59 107.59 108.51 106.73 107.38   16340300
21406    XOM 2023-05-25     105.66 105.66 106.43 104.71 105.94   14316500
21407    XOM 2023-05-26     104.97 104.97 106.95 104.83 106.47   12367300

[21408 rows x 8 columns]
Timeless
  • This would just change the display of decimals though right? I want the numbers to be mutated. – greenguy May 27 '23 at 04:56
  • Then, add `.round(2)` as a last chain of the returned *DataFrame* in `clean_df`. – Timeless May 27 '23 at 04:59
  • For whatever reason, that doesn't work either. It just doesn't change. – greenguy May 27 '23 at 08:44
  • You can't create a float32 number like 106.73 exactly, because this number can't be represented as a float32. The closest is 106.73000335693359375, which is what you see. This is just how floating point numbers work. – Alexander Fasching May 27 '23 at 14:31
  • @greenguy Notice that 105.750000 displays exactly, because 0.75 is ¾ which is an exact power-of-two fraction. Notice that 106.730003 does not display exactly, because 0.73 is not an exact power-of-two fraction. See also [is floating point math broken?](https://stackoverflow.com/questions/588004) – Steve Summit May 27 '23 at 17:39
  • I'm referring to the rounding not working. I only want 2 decimal places in the data. Step 1: convert to float32 to reduce memory. Step 2: round, or just remove trailing decimals past 2, to completely remove the extra digits from my data frame. – greenguy May 28 '23 at 19:09
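
A small sketch of what the comments above describe, using two made-up prices instead of the yfinance data: after the cast to float32, the column holds the nearest representable value, and rounding it again lands on that same value, so no number of extra `.round(2)` calls can remove the trailing digits.

```python
import pandas as pd

# Hypothetical prices standing in for the yfinance columns.
s64 = pd.Series([106.73, 105.75])        # float64 by default
s32 = s64.round(2).astype("float32")     # same order of steps as in the question

# Widening each float32 back to a Python float exposes the value actually stored.
stored = [f"{float(v):.20f}" for v in s32]
print(stored)

# Rounding the float32 column again yields the same stored value,
# because it is already the closest float32 to 106.73.
rounded_again = s32.round(2)
print(bool(rounded_again.iloc[0] == s32.iloc[0]))

# One option if clean two-decimal values matter more than memory:
# stay in float64, where the rounded value prints as 106.73.
print(repr(float(s64.round(2).iloc[0])))
```

So the rounding itself does work. What cannot be done is storing 106.73 exactly in a binary float, so the choice comes down to formatting at display time, keeping float64, or storing integer cents.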