I made a dataframe and set its column names using np.arange(). However, instead of exact numbers, the names (sometimes) come out as values like 0.30000000000000004.

I tried both rounding the entire dataframe and applying np.around() to the np.arange() output, but neither seems to work. I also tried adding these lines at the top:

np.set_printoptions(suppress=True)
np.set_printoptions(precision=3)

Here is the return statement of my function:

stepT = 0.1
# net is some numpy array
return pd.DataFrame(net, columns = np.arange(0,1+stepT, stepT),
                    index = np.around(np.arange(0,1+stepS,stepS),decimals = 3)).round(3)

Is there any function that will let me have these names as numbers with only one digit after the decimal point?

1 Answer

The apparent imprecision of floating point numbers comes up often.

In [689]: np.arange(0,1+stepT, stepT)                                                                  
Out[689]: array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
In [690]: _.tolist()                                                                                   
Out[690]: 
[0.0,
 0.1,
 0.2,
 0.30000000000000004,
 0.4,
 0.5,
 0.6000000000000001,
 0.7000000000000001,
 0.8,
 0.9,
 1.0]
In [691]: _689[3]
Out[691]: 0.30000000000000004

The numpy print options control how arrays are displayed, but they have no effect when individual values are printed.
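
As a minimal sketch of that difference (the variable names here are only for illustration, and np.printoptions is used as a temporary stand-in for the np.set_printoptions calls in the question), the array text honours the print options while a single value or a plain list does not:

```python
import numpy as np

grid = np.arange(0, 1.1, 0.1)

# Temporarily apply the same options the question sets globally.
with np.printoptions(precision=3, suppress=True):
    array_text = str(grid)               # array display: uses the print options
    scalar_text = repr(float(grid[3]))   # plain Python float: options are ignored
    list_text = str(grid.tolist())       # list of Python floats: options are ignored

print(array_text)
print(scalar_text)
print(list_text)
```

The stored values are identical in all three cases; only the array display shortens them.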

When I make a dataframe with this column specification I get a nice display. (_689 is ipython shorthand for the Out[689] array.) The headers are shown in shortened form, just like the array display, even though the underlying column values are unchanged:

In [699]: df = pd.DataFrame(np.arange(11)[None,:], columns=_689)                                       
In [700]: df                                                                                           
Out[700]: 
   0.0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0
0    0    1    2    3    4    5    6    7    8    9   10
In [701]: df.columns                                                                                   
Out[701]: 
Float64Index([                0.0,                 0.1,                 0.2,
              0.30000000000000004,                 0.4,                 0.5,
               0.6000000000000001,  0.7000000000000001,                 0.8,
                              0.9,                 1.0],
             dtype='float64')

But selecting columns with floats like this is tricky. Some work, some don't.

In [705]: df[0.4]                                                                                      
Out[705]: 
0    4
Name: 0.4, dtype: int64

In [707]: df[0.3]                                                                                      
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)

It looks like pandas is doing some sort of dictionary lookup, which needs an exact match on the key. Floats don't work well for that, because of their inherent imprecision.

An equality test on the arange values shows the same thing:

In [710]: _689[3]==0.3                                                                                 
Out[710]: False
In [711]: _689[4]==0.4                                                                                 
Out[711]: True
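
If the columns have to stay numeric, two workarounds are possible. The sketch below is not part of the original session and its names (grid, exact, mask) are made up for illustration: it selects a column with a tolerance-based comparison via np.isclose, and it builds the labels by dividing integers so that each one matches the float literal exactly.

```python
import numpy as np
import pandas as pd

step = 0.1
grid = np.arange(0, 1 + step, step)

# Exact equality fails for this value, a tolerance-based test succeeds.
print(grid[3] == 0.3)
print(np.isclose(grid[3], 0.3))

# Select a column by closeness instead of by exact label.
df = pd.DataFrame(np.arange(11)[None, :], columns=grid)
mask = np.isclose(df.columns.to_numpy(), 0.3)
close_col = df.loc[:, mask]
print(close_col)

# Building the labels as integer / 10 gives the same floats as the literals,
# so ordinary label lookup works.
exact = np.arange(11) / 10
df_exact = pd.DataFrame(np.arange(11)[None, :], columns=exact)
print(df_exact[0.3])
```

The integer-division approach assumes the step is a simple fraction such as 1/10; for other steps the tolerance-based selection is the more general option.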

I think you should create a list of properly formatted strings from the arange, and use that as column headers, not the floats themselves.

For example:

In [714]: alist = ['%.3f'%i for i in _689]                                                             
In [715]: alist                                                                                        
Out[715]: 
['0.000',
 '0.100',
 '0.200',
 '0.300',
 '0.400',
 '0.500',
 '0.600',
 '0.700',
 '0.800',
 '0.900',
 '1.000']
In [716]: df = pd.DataFrame(np.arange(11)[None,:], columns=alist)                                      
In [717]: df                                                                                           
Out[717]: 
   0.000  0.100  0.200  0.300  0.400  0.500  0.600  0.700  0.800  0.900  1.000
0      0      1      2      3      4      5      6      7      8      9     10
In [718]: df.columns                                                                                   
Out[718]: 
Index(['0.000', '0.100', '0.200', '0.300', '0.400', '0.500', '0.600', '0.700',
       '0.800', '0.900', '1.000'],
      dtype='object')
In [719]: df['0.300']                                                                                  
Out[719]: 
0    3
Name: 0.300, dtype: int64
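
Since the question asks for only one digit after the decimal point, the same idea can be written as a standalone script with a one-decimal format. This is a sketch rather than a tested drop-in for the original function; the placeholder data and the name labels are assumptions.

```python
import numpy as np
import pandas as pd

step = 0.1

# Format each value with exactly one digit after the decimal point.
labels = [f"{x:.1f}" for x in np.arange(0, 1 + step, step)]

# Placeholder data: a single row holding 0..10, as in the session above.
df = pd.DataFrame(np.arange(11)[None, :], columns=labels)

print(df.columns.tolist())
print(df["0.3"])
```

The columns are now strings, so they have to be selected with string keys such as "0.3" rather than the float 0.3.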
hpaulj