I have data from a CSV file:

36849|17|4.7|20180118103240
36792|17|5.3|20180118103238

4.7 and 5.3 are floats.

But when I do this:

scores_data_train = pd.read_csv('../Dataset/TrainData//u.score.csv', sep='|')
scores_train = scores_data_train.as_matrix()
print(scores_train[:1, :])

The result:

[[3.68490000e+04 1.70000000e+01 4.70000000e+00 2.01801181e+13]]

Please help me. Thank you.

rafaelc
  • Why are you using `as_matrix`? What's the intention here? It looks like the result is exactly correct. What is the problem? – rafaelc Apr 01 '19 at 16:02

2 Answers

Apply the following settings after importing NumPy (see similar questions for details):

   import numpy as np
   np.set_printoptions(suppress=True,
                       formatter={'float_kind': '{:0.2f}'.format})
   # float_kind: print every float with 2 digits after the decimal point and no padding on the left
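To see what these options change, here is a small sketch that rebuilds the first row of the question as the float64 array `.as_matrix()` returns and prints it with and without the formatter. It uses `np.printoptions`, the context-manager form of `np.set_printoptions` (NumPy 1.15 or later), so the settings apply only inside the `with` block; the variable names `row` and `fmt` are just for illustration.

```python
import numpy as np

# First row from the question, as the single-dtype float64 array .as_matrix() returns
row = np.array([[36849, 17, 4.7, 20180118103240]], dtype=np.float64)

# Default printing: the wide range of values switches NumPy to scientific notation
print(row)

# Same formatter as the answer: two digits after the decimal point
fmt = {'float_kind': '{:0.2f}'.format}

# np.printoptions applies the options only inside the with-block,
# instead of changing them globally like np.set_printoptions does
with np.printoptions(suppress=True, formatter=fmt):
    print(row)
```

Only the display changes: the array still holds float64 values, so the integer columns are printed with a `.00` suffix.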

vrana95
  • The pandas option will not affect the output format of the numpy array returned by `.as_matrix()` - after that point, it's basically out of pandas's hands. – Christoph Burschka Apr 01 '19 at 15:35
  • The idea was to use it before converting to a matrix. Will that not work? – vrana95 Apr 01 '19 at 15:56
  • Unfortunately not - the option only takes effect when floats inside a pandas dataframe are formatted to strings. Calling `.as_matrix()` will turn the dataframe into an array of floats, so pandas display formatting doesn't come into play. – Christoph Burschka Apr 01 '19 at 15:58
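The point made in these comments can be checked with a short sketch: a pandas display option such as `display.float_format` is used when pandas formats a DataFrame, but the NumPy array obtained from it is formatted by NumPy, which ignores the option. The two-column frame below is hypothetical, built from the question's sample values, and `DataFrame.to_numpy()` stands in for `.as_matrix()`, which later pandas versions removed.

```python
import pandas as pd

# Hypothetical two-column frame built from the question's sample values
df = pd.DataFrame({'score': [4.7, 5.3],
                   'timestamp': [20180118103240, 20180118103238]})

with pd.option_context('display.float_format', '{:.2f}'.format):
    df_text = df.to_string()         # pandas formats the cells, so the option applies
    arr_text = str(df.to_numpy())    # NumPy formats the array, so the option is ignored

print(df_text)
print(arr_text)
```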

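The `as_matrix()` method turns the dataframe into a NumPy array, which by definition holds a single datatype. You can't have some of the elements treated as floats while others are integers, so every column is upcast to float64. Because the values then range from 4.7 to about 2 × 10^13, NumPy prints the whole array in scientific notation. (`as_matrix()` was deprecated in pandas 0.23 and removed in 1.0; `.to_numpy()` is the replacement.)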
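A minimal sketch of the upcasting, reading the two sample rows from the question through `io.StringIO`. It assumes the file has no header line (`header=None`), which may not match the real file, and uses `.to_numpy()` in place of `.as_matrix()`.

```python
import io
import pandas as pd

# The two sample rows from the question; header=None assumes the file has no
# header line (the real file may differ)
csv_text = "36849|17|4.7|20180118103240\n36792|17|5.3|20180118103238\n"
df = pd.read_csv(io.StringIO(csv_text), sep='|', header=None)

print(df.dtypes)     # each column keeps its own type: int64, int64, float64, int64

arr = df.to_numpy()  # stands in for the removed .as_matrix()
print(arr.dtype)     # float64: one dtype for the whole array, integers included
```

No precision is lost here, since all of these integers are small enough to be stored exactly in a float64; only the printed form looks different.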

As long as you do not call `.as_matrix()`, you will have a dataframe, which can have integer and float columns. The types for each column can be specified by calling `pd.read_csv(..., dtype={"colname": "int", "colname2": "float"})`.
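Here is a sketch of that approach on the question's sample rows. The column names (`user_id`, `item_id`, `score`, `timestamp`) are hypothetical, since the question does not say what the columns hold, and `header=None` assumes the file has no header line. Explicit `'int64'` is used instead of `"int"` so the integer width does not depend on the platform. Converting one column at a time keeps that column's dtype in the resulting NumPy array.

```python
import io
import pandas as pd

csv_text = "36849|17|4.7|20180118103240\n36792|17|5.3|20180118103238\n"

# Hypothetical column names: the question does not say what each column holds
names = ['user_id', 'item_id', 'score', 'timestamp']
df = pd.read_csv(io.StringIO(csv_text), sep='|', header=None, names=names,
                 dtype={'user_id': 'int64', 'item_id': 'int64',
                        'score': 'float64', 'timestamp': 'int64'})
print(df.dtypes)

# Converting one column at a time keeps that column's dtype in the NumPy array
scores = df['score'].to_numpy()          # float64
timestamps = df['timestamp'].to_numpy()  # int64
print(scores, timestamps)
```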

Christoph Burschka