I have a numpy structured array that looks like this:

>>> arr
array([(b'00:59:59.785634', 60.87), (b'01:00:00.187634', 60.88),
    (b'01:00:00.188634', 60.88), ...,
    (b'23:59:58.668559', 60.93), (b'23:59:58.668559', 60.92),
    (b'23:59:58.668559', 60.93)],
    dtype=[('Date', 'S15'), ('Value', '<f4')])

When I convert it to a pandas DataFrame, the values in the Value column are displayed differently.

>>> df = pd.DataFrame(arr)
>>> df
                    Date      Value  
0       b'00:59:59.785634'  60.869999     
1       b'01:00:00.187634'  60.880001    
2       b'01:00:00.188634'  60.880001     
3       b'01:00:00.189634'  60.860001    
4       b'01:00:00.190634'  60.860001  

>>> df.Value
Name: Value, Length: 176195, dtype: float32    

That is still fine, because a single element prints the same value as in the original array.

>>> str( df['Value'][0] )
'60.87'

The problem appears once I have finished modifying the DataFrame and convert it back to an array.

>>> new_arr = df.values
>>> new_arr
array([[b'00:59:59.785634', 60.869998931884766],
    [b'01:00:00.187634', 60.880001068115234],
    [b'01:00:00.188634', 60.880001068115234],
    ...,
    [b'23:59:58.668559', 60.93000030517578],
    [b'23:59:58.668559', 60.91999816894531],
    [b'23:59:58.668559', 60.93000030517578]], dtype=object)

>>> str( new_arr[0][1] )
'60.869998931884766'        # != '60.87'

I think the original dtype (float32) is lost during the last conversion. How can I still get '60.87' after converting the DataFrame to an array with the .values attribute?

My question is why the two results differ ('60.87' versus '60.869998931884766') and how to keep the dtype. If the values survive the conversion from array to DataFrame, there should be a way to preserve them when converting back.
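
To make the discrepancy concrete, here is a minimal sketch that reproduces the two strings without pandas. It assumes a reasonably recent NumPy (1.14 or later), where scalars print as the shortest string that round-trips at their own precision; the names `x32` and `x64` are only illustrative.

```python
import numpy as np

x32 = np.float32(60.87)   # nearest float32 to the decimal 60.87
x64 = float(x32)          # same number, widened to a Python float (64-bit)

print(str(x32))    # shortest digits that round-trip as a float32
print(str(x64))    # shortest digits that round-trip as a float64
print(x32 == x64)  # the stored number itself did not change
```

If this runs as expected, the stored number is identical in both cases; only the precision used to print it differs.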

maynull
  • Floats are not stored as base-10 decimals. Therefore, a print representation may not match internal representation. – jpp Mar 17 '18 at 13:54
  • When I try this `df.dtypes` shows an `object` Date column and `float32` Value column. `df.values` is object dtype array. The elements are base Python strings and floats. Display of Python scalars is different from display for arrays. Compare `df.values[:,1]` with `df.values[:,1].astype('float32')` – hpaulj Mar 17 '18 at 16:13
  • Also look at `df['Value'].values`. – hpaulj Mar 17 '18 at 16:15
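
The suggestions in the comments can be sketched as follows. This is an illustration on a two-row sample of the array from the question, not a definitive fix; `mixed`, `col`, `recast`, and `recs` are hypothetical names, and `to_records(index=False)` is one possible way to get a structured array back, not something the comments proposed.

```python
import numpy as np
import pandas as pd

# Two-row sample with the same dtype as the array in the question.
arr = np.array([(b'00:59:59.785634', 60.87), (b'01:00:00.187634', 60.88)],
               dtype=[('Date', 'S15'), ('Value', '<f4')])
df = pd.DataFrame(arr)

mixed = df.values                         # one array for all columns: object dtype
col = df['Value'].values                  # a single column keeps its float32 dtype
recast = mixed[:, 1].astype('float32')    # cast the object column back to float32
recs = df.to_records(index=False)         # structured array with per-column dtypes

print(mixed.dtype, str(mixed[0][1]))
print(col.dtype, str(col[0]))
print(recast.dtype, str(recast[0]))
print(recs.dtype, str(recs['Value'][0]))
```

The idea is that `.values` on a frame with mixed column types has to pick one common dtype, which is `object`, while a single column or a record array can keep `float32` for Value.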
