Are these large numbers realistic, and if so, how do you want to display them?
Copy and paste from your question:
In [1]: x=np.array([7.433148e+46,7.433148e+47])
The default numpy display adds a few decimal places:
In [2]: x
Out[2]: array([ 7.43314800e+46, 7.43314800e+47])
Changing the precision doesn't change much:
In [5]: np.set_printoptions(precision=6)
In [6]: np.set_printoptions(suppress=True)
In [7]: x
Out[7]: array([ 7.433148e+46, 7.433148e+47])
suppress does even less here. It suppresses scientific notation for small floating point values, not large ones. From the docs:
suppress : bool, optional
Whether or not suppress printing of small floating point values using
scientific notation (default False).
The default Python display for one of these numbers is also scientific:
In [8]: x[0]
Out[8]: 7.4331480000000002e+46
With a formatting command we can display it in its 46+ character glory (or gory detail):
In [9]: '%f'%x[0]
Out[9]: '74331480000000001782664341808476383296708673536.000000'
If that were a real value, I'd prefer to see the scientific notation:
In [11]: '%.6g'%x[0]
Out[11]: '7.43315e+46'
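If you want the same '%.6g' style applied to the whole array rather than to one element, one option is the formatter argument of np.array2string (np.set_printoptions accepts the same formatter argument). This is only a sketch of that idea, reusing the array from the question; the variable name formatted is mine.

```python
import numpy as np

x = np.array([7.433148e+46, 7.433148e+47])

# Apply the '%.6g' format to each element when rendering the array.
# 'float_kind' is the formatter key numpy uses for floating point types.
formatted = np.array2string(x, formatter={'float_kind': lambda v: '%.6g' % v})
print(formatted)
```

Each element should then be rendered the same way as in the '%.6g'%x[0] example above, without changing the global print options.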
To illustrate what suppress does, print the inverse of this array:
In [12]: 1/x
Out[12]: array([ 0., 0.])
In [13]: np.set_printoptions(suppress=False)
In [14]: 1/x
Out[14]: array([ 1.345325e-47, 1.345325e-48])
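The same comparison can be made without touching the global settings. The sketch below assumes NumPy 1.15 or later, where np.printoptions is available as a context manager, and captures the text with np.array2string so the three cases can be compared side by side. The variable names are mine.

```python
import numpy as np

x = np.array([7.433148e+46, 7.433148e+47])

# np.printoptions is a context manager; the settings revert on exit.
with np.printoptions(precision=6, suppress=True):
    large_suppressed = np.array2string(x)       # large values
    small_suppressed = np.array2string(1 / x)   # tiny values

with np.printoptions(precision=6, suppress=False):
    small_unsuppressed = np.array2string(1 / x)

print(large_suppressed)    # large values stay in scientific notation
print(small_suppressed)    # tiny values are displayed as zeros
print(small_unsuppressed)  # tiny values shown in scientific notation
```

The exact spacing of the printed arrays varies a little between numpy versions, but the pattern is the same: suppress only changes how the tiny values are shown.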
===============
I'm not that familiar with pandas, but I wonder if your mean calculation makes sense. What does pandas print for df.iloc[:,15]? For the mean to be this large, the original data has to have values of similar size. How does the source display them? I wonder if most of your values are smaller, normal values, and you have a few excessively large ones (outliers) that 'distort' the mean.
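To show what I mean by outliers distorting the mean, here is a small sketch with made-up data (these are not your values): four ordinary numbers and one huge one. Comparing the mean with the median and the maximum is a quick way to spot this situation.

```python
import numpy as np

# Made-up data: four ordinary values and one huge outlier.
values = np.array([1.0, 2.0, 3.0, 4.0, 7.433148e+47])

mean = values.mean()        # dominated by the single outlier
median = np.median(values)  # not affected by how large the outlier is

print('%.6g' % mean)
print(median)
print(values.max())
```

If the median of your column is a normal-sized number while the mean is around 1e46, a few extreme entries are probably responsible.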
I think you can simplify the array extraction with .values. Instead of
mean_vector = np.array(df.iloc[:,15],dtype=np.float64)
you can write (assuming the column already holds float64 values)
mean_vector = df.iloc[:,15].values
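As a quick check that the two extractions agree, here is a sketch using a hypothetical stand-in DataFrame (16 float columns, not your data). It also shows Series.to_numpy(), which newer pandas versions (0.24 and later) offer as an alternative to .values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's DataFrame: 3 rows, 16 float columns.
df = pd.DataFrame(np.arange(48, dtype=np.float64).reshape(3, 16))

via_array = np.array(df.iloc[:, 15], dtype=np.float64)  # original approach
via_values = df.iloc[:, 15].values                      # simplified approach
via_to_numpy = df.iloc[:, 15].to_numpy()                # newer pandas alternative

print(via_values.dtype)
print(np.array_equal(via_array, via_values))
print(np.array_equal(via_array, via_to_numpy))
```

Note that .values keeps whatever dtype the column has, so the explicit dtype=np.float64 only matters if the column is not already float64.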