I have a CSV file, test.csv, with a header row and data like this:

"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486

If I use numpy.loadtxt, I get an array with 3 rows and 7 columns, but the values are shown in exponential (scientific) notation.

r1 = numpy.loadtxt(open("test.csv","rb"),delimiter=",",skiprows=1)

I get

 [[  6.11882430e+02   9.08956010e+03   5.13300000e+03   8.64075140e+02
     1.71537476e+03   7.65227770e+02   1.29111196e+12]
  [  6.11882430e+02   9.08956010e+03   5.13300000e+03   8.64075140e+02
     1.71537476e+03   7.65227770e+02   1.29111311e+12]
  [  6.11882430e+02   9.08956010e+03   5.13300000e+03   8.64075140e+02
     1.71537476e+03   7.65227770e+02   1.29112065e+12]]

To avoid the exponential notation, I tried passing the dtype explicitly, but I still get the same exponential values:

 r1 = np.loadtxt(open("test.csv","rb"),delimiter=",", dtype=np.float64, skiprows=1)

Is there any way to avoid the exponential notation while creating the numpy array? I know I can format the values afterwards with numpy.savetxt(sys.stdout, r1, '%5.2f'), but I want this when the array is created, not after.

user2481422
  • 611.88243 *is* 6.11882430e+02, given the issues associated with floating point arithmetic). Don't you want to read in the values? Otherwise, what sort of result *are* you looking for? Also, what do you want to do with the last column of input, which is already in the exponential notation (1.291111964948E12)? – Joshua Taylor Jun 10 '14 at 17:20
  • Why does it matter? Is there any difference other than in how they are displayed? – Rob Watts Jun 10 '14 at 17:21
  • @JoshuaTaylor See my edit, there won't be any exponential value in csv file. – user2481422 Jun 10 '14 at 17:25
  • @RobWatts Yes, I want my matrix to look simple – user2481422 Jun 10 '14 at 17:25
  • @user2481422 Then it's just an issue of how you print it out - there is nothing wrong with the contents of your matrix. http://stackoverflow.com/questions/2891790/pretty-printing-of-numpy-array answers how to do that. – Rob Watts Jun 10 '14 at 17:30
  • @user2481422 I see your edit, but I don't see an example of the output that you want. – Joshua Taylor Jun 10 '14 at 17:47
  • @JoshuaTaylor I believe OP is saying even when there are no exponents in the input he still gets them when he prints out the array. – Rob Watts Jun 10 '14 at 17:53
  • I *think* that's the case, too, but I'm not sure. It really shouldn't be so hard though to copy the output that OP *is* getting, edit it in Notepad, and say, "this is what I'm trying to achieve." It would remove *a lot* of ambiguity. – Joshua Taylor Jun 10 '14 at 17:54
  • OK, is this *not* answered by the related question in the sidebar, [Pretty-printing of numpy.array](http://stackoverflow.com/q/2891790/1281433)? The asker there said, "If I want to print the numpy.array of floats, it prints several decimals, often in 'scientific' format, which is rather hard to read even for low-dimensional arrays. However, numpy.array apparently has to be printed as a string, i.e., with %s. Is there any solution ready for this purpose?" – Joshua Taylor Jun 10 '14 at 17:56
  • @user2481422: In the last column of the sample data, you've lost the scientific notation `E12`; compare to the data shown here: http://stackoverflow.com/questions/24143807/load-csv-file-to-numpy-and-access-columns-by-name – Warren Weckesser Jun 10 '14 at 18:30
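The point made in the comments (the notation is only a matter of display) can be checked directly. This is a minimal sketch that assumes the sample data is read from an in-memory string instead of test.csv, and that the timestamp column carries the E12 exponent mentioned in the last comment. It shows that loadtxt already returns float64 and that the parsed numbers are the same whichever notation is used to write them.

```python
import io

import numpy as np

# Stand-in for test.csv; io.StringIO keeps the sketch self-contained.
# Assumption: the timestamp column has the E12 exponent, as in the original data.
csv_text = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

r1 = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# loadtxt returns float64 by default, so dtype=np.float64 changes nothing.
print(r1.dtype)  # float64

# The stored value is identical no matter how it is written out.
print(r1[0, 0] == 6.11882430e+02)   # True
print(r1[0, 6] == 1291111964948.0)  # True
```

Only the printed representation differs; the contents of the array are the same.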


I hope the comments on the question make clear that this is purely a formatting question. As also pointed out in the comments, @unutbu gives a nice explanation of some of the formatting options for numpy arrays here: How to pretty-printing a numpy.array without scientific notation and with given precision?

An option not shown in that answer is the use of the formatter argument to np.set_printoptions. The argument was added to set_printoptions in numpy version 1.7.0. With the formatter argument, you can control how numpy prints the elements of arrays. Here's an example of using that argument to control the format of floating point numbers.

Here's how a (the array loaded from test.csv) is printed with the default settings:

In [30]: a
Out[30]: 
array([[  6.11882430e+02,   9.08956010e+03,   5.13300000e+03,
          8.64075140e+02,   1.71537476e+03,   7.65227770e+02,
          1.29111196e+12],
       [  6.11882430e+02,   9.08956010e+03,   5.13300000e+03,
          8.64075140e+02,   1.71537476e+03,   7.65227770e+02,
          1.29111311e+12],
       [  6.11882430e+02,   9.08956010e+03,   5.13300000e+03,
          8.64075140e+02,   1.71537476e+03,   7.65227770e+02,
          1.29112065e+12]])

Now override the default, and tell numpy to convert floating point values to strings using the format "%.5f". This format will not use scientific notation, and it will always show five digits after the decimal point.

In [31]: np.set_printoptions(formatter={'float': lambda x: "%.5f" % (x,)})

In [32]: a
Out[32]: 
array([[611.88243, 9089.56010, 5133.00000, 864.07514, 1715.37476,
        765.22777, 1291111964948.00000],
       [611.88243, 9089.56010, 5133.00000, 864.07514, 1715.37476,
        765.22777, 1291113113366.00000],
       [611.88243, 9089.56010, 5133.00000, 864.07514, 1715.37476,
        765.22777, 1291120650486.00000]])
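set_printoptions changes the options globally. If you only want the custom format in one place, the same formatter dictionary can be passed to np.array2string, which returns the formatted text as a string, or to np.printoptions, a context manager available in numpy 1.15 and later that restores the previous options when the with block ends. This is a sketch that uses a small stand-in array (a hypothetical subset of one row from test.csv) rather than the full data:

```python
import numpy as np

# Hypothetical stand-in: columns A, B and timestamp from one row of test.csv.
a = np.array([[611.88243, 9089.5601, 1291111964948.0]])

# Same formatter as above: fixed point, always five digits after the decimal point.
fixed5 = {'float': lambda x: "%.5f" % (x,)}

# array2string returns the formatted text and leaves the global options alone.
s = np.array2string(a, formatter=fixed5)
print(s)

# np.printoptions (numpy >= 1.15) applies the options only inside the with block.
with np.printoptions(formatter=fixed5):
    print(a)

# Outside the block the default formatting (scientific, for this data) is back.
print(a)
```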

You can add a call to rstrip to remove the trailing zeros:

In [53]: np.set_printoptions(formatter={'float': lambda x: ("%.5f" % (x,)).rstrip('0')})

In [54]: a
Out[54]: 
array([[611.88243, 9089.5601, 5133., 864.07514, 1715.37476, 765.22777,
        1291111964948.],
       [611.88243, 9089.5601, 5133., 864.07514, 1715.37476, 765.22777,
        1291113113366.],
       [611.88243, 9089.5601, 5133., 864.07514, 1715.37476, 765.22777,
        1291120650486.]])
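Because the formatter is an ordinary Python function, it can be tested on single values before it is handed to numpy. A small sketch: strip_trailing_zeros is the rstrip formatter from above given a name, and strip_trailing_zeros_and_dot is a hypothetical variant (not part of the answer) for when the dangling decimal point is also unwanted.

```python
import numpy as np

def strip_trailing_zeros(x):
    """The rstrip formatter above: five decimals, trailing zeros removed."""
    return ("%.5f" % (x,)).rstrip('0')

def strip_trailing_zeros_and_dot(x):
    """Hypothetical variant that also removes a trailing decimal point."""
    return strip_trailing_zeros(x).rstrip('.')

print(strip_trailing_zeros(5133.0))          # 5133.
print(strip_trailing_zeros_and_dot(5133.0))  # 5133

row = np.array([611.88243, 9089.5601, 5133.0])
print(np.array2string(row, formatter={'float': strip_trailing_zeros_and_dot}))
```

The rstrip('0') stops at the decimal point, so values such as 0.0 or 100.0 keep their integer digits in both variants.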

Note that above, I entered the name in IPython, which echoes back the repr of the object. You'll get the str representation if you explicitly print it:

In [55]: print(a)
[[611.88243 9089.5601 5133. 864.07514 1715.37476 765.22777 1291111964948.]
 [611.88243 9089.5601 5133. 864.07514 1715.37476 765.22777 1291113113366.]
 [611.88243 9089.5601 5133. 864.07514 1715.37476 765.22777 1291120650486.]]
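The same difference can be seen by calling repr and str directly. This sketch assumes a small stand-in array and the "%.5f" formatter from above; both representations honor the formatter, and only the surrounding syntax differs.

```python
import numpy as np

a = np.array([[611.88243, 5133.0]])  # stand-in for the array from test.csv

with np.printoptions(formatter={'float': lambda x: "%.5f" % (x,)}):
    r = repr(a)  # what the interactive prompt echoes: "array(...)" with commas
    t = str(a)   # what print() shows: bare brackets, space-separated values

print(r)
print(t)
```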
Warren Weckesser
  • Your answer gives this error `TypeError: set_printoptions() got an unexpected keyword argument 'formatter'`. – user2481422 Jun 11 '14 at 07:16
  • @user2481422: The `formatter` argument was added in numpy 1.7.0 (https://github.com/numpy/numpy/blob/master/doc/release/1.7.0-notes.rst#custom-formatter-for-printing-arrays). What version are you using? – Warren Weckesser Jun 11 '14 at 17:19
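On a numpy older than 1.7, where set_printoptions has no formatter argument, one version-independent workaround is to build the text yourself with ordinary string formatting, much like the savetxt call in the question. A sketch, assuming a 2-D float array; format_rows is a hypothetical helper, not a numpy function.

```python
import numpy as np

def format_rows(arr, fmt="%.5f"):
    """Hypothetical helper: render a 2-D array as text, one line per row,
    without relying on numpy's print options."""
    return "\n".join(" ".join(fmt % (v,) for v in row) for row in arr)

# The formatter argument of set_printoptions needs numpy 1.7.0 or later.
print(np.__version__)

r1 = np.array([[611.88243, 9089.5601, 1291111964948.0]])
print(format_rows(r1))
```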