
My goal is to convert my data into a NumPy array while preserving the number formats of the original list.


For example, this is my data in list format:

[[24.589888563639835, 13.899891781550952, 4478597, -1], [26.822224204095697, 14.670531752529088, 4644503, -1], [51.450405486761866, 54.770422572665254, 5570870, 0], [44.979065080591504, 54.998835550128852, 6500333, 0], [44.866399274880663, 55.757240813761534, 6513301, 0], [45.535380533604247, 57.790074517001365, 6593281, 0], [44.850372630818214, 54.720574554485822, 6605483, 0], [51.32738085400576, 55.118344981379266, 6641841, 0]]

When I convert it to a NumPy array,

data = np.asarray(data)

I get scientific (e) notation. How can I keep the same format in my output array?

[[  2.45898886e+01   1.38998918e+01   4.47859700e+06  -1.00000000e+00]
 [  2.68222242e+01   1.46705318e+01   4.64450300e+06  -1.00000000e+00]
 [  5.14504055e+01   5.47704226e+01   5.57087000e+06   0.00000000e+00]
 [  4.49790651e+01   5.49988356e+01   6.50033300e+06   0.00000000e+00]
 [  4.48663993e+01   5.57572408e+01   6.51330100e+06   0.00000000e+00]
 [  4.55353805e+01   5.77900745e+01   6.59328100e+06   0.00000000e+00]
 [  4.48503726e+01   5.47205746e+01   6.60548300e+06   0.00000000e+00]
 [  5.13273809e+01   5.51183450e+01   6.64184100e+06   0.00000000e+00]]

Update:

I tried:

np.set_printoptions(precision=6,suppress=True)

but I still get different numbers when I pass part of the data to another variable and then inspect it: the decimals have changed! Why does it change the decimals internally? Why can't it just hold them as they are?

jacky
  • Possible duplicate of [How to pretty-printing a numpy.array without scientific notation and with given precision?](http://stackoverflow.com/questions/2891790/how-to-pretty-printing-a-numpy-array-without-scientific-notation-and-with-given) – Felix Apr 06 '17 at 10:17
  • @Felix i have edited my question and it is not the same. please revise your decision; – jacky Apr 06 '17 at 12:12
  • Floats in numpy arrays (as you have already noticed) are displayed differently than regular floats and you can't expect them to look the same. So the most sensible option for you is to use numpy's options for number formatting, as I have suggested before. – Felix Apr 06 '17 at 12:26
  • BTW, it's important not to mix a number's formatting (how the number is displayed) and its content (as it is stored in memory). Two numbers may look different when you print them, but are the same in memory. – Felix Apr 06 '17 at 12:29
  • You have to create a structured array to store mixed dtypes. – hpaulj Apr 06 '17 at 15:10
  • If you need an array of heterogenous type, you should probably not be using bare numpy in the first place. – DSM Apr 06 '17 at 16:32

1 Answer


Simple array creation from the nested list:

In [133]: data = np.array(alist)
In [136]: data.shape
Out[136]: (8, 4)
In [137]: data.dtype
Out[137]: dtype('float64')

This is a 2d array with 8 'rows' and 4 'columns'; all elements are stored as float64, including the values that were integers in the list.
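As a minimal sketch of the difference between what is stored and what is displayed (assuming NumPy 1.15 or later for the `np.printoptions` context manager, and using only the first two rows of the list from the question), the stored values can be checked against the original Python floats:

```python
import numpy as np

# First two rows of the list from the question.
alist = [[24.589888563639835, 13.899891781550952, 4478597, -1],
         [26.822224204095697, 14.670531752529088, 4644503, -1]]

data = np.array(alist)

# The mixed list is promoted to a single dtype: float64.
print(data.dtype)

# The stored values are the same doubles as the Python floats;
# only the printed representation is shortened.
print(data[0, 0] == alist[0][0])
print(data.tolist() == [[float(v) for v in row] for row in alist])

# Print options change the display only, never the stored values.
with np.printoptions(precision=6, suppress=True):
    shown = np.array2string(data)
print(shown)
```

Assigning a slice of `data` to another variable gives the same stored numbers; any difference seen on screen comes from how that particular array is formatted.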

The list can also be loaded into a structured array that is defined to have a mix of float and integer fields. Here 'f' means a 32-bit float and 'i' a 32-bit integer, as the dtype display below shows. Note that I have to convert the 'rows' to tuples for this load.

In [139]: dt = np.dtype('f,f,i,i')
In [140]: dt
Out[140]: dtype([('f0', '<f4'), ('f1', '<f4'), ('f2', '<i4'), ('f3', '<i4')])
In [141]: data = np.array([tuple(row) for row in alist], dtype=dt)
In [142]: data.shape
Out[142]: (8,)
In [143]: data
Out[143]: 
array([( 24.58988762,  13.89989185, 4478597, -1),
       ( 26.82222366,  14.67053223, 4644503, -1),
       ( 51.45040512,  54.77042389, 5570870,  0),
       ( 44.97906494,  54.99883652, 6500333,  0),
       ( 44.86639786,  55.7572403 , 6513301,  0),
       ( 45.53538132,  57.79007339, 6593281,  0),
       ( 44.85037231,  54.72057343, 6605483,  0),
       ( 51.32738113,  55.11834335, 6641841,  0)], 
      dtype=[('f0', '<f4'), ('f1', '<f4'), ('f2', '<i4'), ('f3', '<i4')])

You access fields by name, not column number:

In [144]: data['f0']
Out[144]: 
array([ 24.58988762,  26.82222366,  51.45040512,  44.97906494,
        44.86639786,  45.53538132,  44.85037231,  51.32738113], dtype=float32)
In [145]: data['f3']
Out[145]: array([-1, -1,  0,  0,  0,  0,  0,  0], dtype=int32)

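Compare those values with the display of single columns from the 2d float array. The float32 fields above keep only about 7 significant digits, which is why they differ from these float64 values in the later decimals: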

In [146]: dataf = np.array(alist)
In [147]: dataf[:,0]
Out[147]: 
array([ 24.58988856,  26.8222242 ,  51.45040549,  44.97906508,
        44.86639927,  45.53538053,  44.85037263,  51.32738085])
In [148]: dataf[:,3]
Out[148]: array([-1., -1.,  0.,  0.,  0.,  0.,  0.,  0.])
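If the rounding in the float32 fields is a concern, one option is to declare 64-bit fields instead. The following is only a sketch under that assumption; the field names `x`, `y`, `count` and `flag` are hypothetical labels chosen for illustration, not names from the original data:

```python
import numpy as np

# First and third rows of the list from the question.
alist = [[24.589888563639835, 13.899891781550952, 4478597, -1],
         [51.450405486761866, 54.770422572665254, 5570870, 0]]

# Hypothetical field names; 'f8' is float64 and 'i8' is int64.
dt = np.dtype([('x', 'f8'), ('y', 'f8'), ('count', 'i8'), ('flag', 'i8')])

# Structured arrays are filled from a list of tuples, one tuple per record.
data = np.array([tuple(row) for row in alist], dtype=dt)

# The float fields now hold the same doubles as the original list.
print(data['x'][0] == alist[0][0])

# The integer fields stay integers.
print(data['count'].dtype, data['count'].tolist())
print(data['flag'].tolist())
```

The result is still a 1d array of records, so fields are selected by name rather than by column index.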

The use of a structured array makes more sense when there's a mix of floats, int, strings or other dtypes.

But to back up a bit - what is wrong with the pure float version? Why is it important to retain the integer identity of 2 columns?
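For integers of this size the pure float version loses nothing: float64 represents every integer up to 2**53 exactly, so the integer columns can be recovered with a cast when needed. A small sketch, again using two rows of the question's list:

```python
import numpy as np

alist = [[24.589888563639835, 13.899891781550952, 4478597, -1],
         [51.450405486761866, 54.770422572665254, 5570870, 0]]

dataf = np.array(alist)  # everything is stored as float64

# Cast the two integer-valued columns back to integers.
ids = dataf[:, 2].astype(np.int64)
flags = dataf[:, 3].astype(np.int64)

print(ids.tolist())
print(flags.tolist())
```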

hpaulj