Question: is my method of converting a NumPy array of numbers to a NumPy array of strings, with a specific number of decimal places AND trailing zeros removed, the 'best' way?

import numpy as np
x = np.array([1.12345, 1.2, 0.1, 0, 1.230000])
# format every element to 4 decimal places, then strip trailing zeros
print(np.char.rstrip(np.char.mod('%.4f', x), '0'))

outputs:

['1.1235' '1.2' '0.1' '0.' '1.23']

which is the desired result. (I am OK with the rounding issue)

Both 'rstrip' and 'mod' are NumPy functions, so this is fast, but is there a way to accomplish this with ONE built-in NumPy function (i.e. does 'mod' have an option that I couldn't find)? That would save the overhead of returning copies twice, which is slow-ish for very large arrays.

thanks!

user1269942
  • why don't you just use `print np.char.mod('%0.4f', x)`? – Dalek Aug 14 '14 at 19:32
  • @Dalek because that would not remove trailing zeros. I want to remove the zeros because it will make my files smaller. I am manually creating some ASCII GIS rasters and would prefer to keep the large files as small as possible. Speed-wise, the additional operation to remove trailing zeros is not a big deal, so I consider it worth it for the gain of having smaller files. It would be fine for a few files to be larger than needed, but I'm planning to do some quite large-scale stuff... it'll add up. So, I am OK with the speed of what I use, but I am curious if anyone has a slicker way. – user1269942 Aug 14 '14 at 20:14
  • If you are OK with 5 "significant digits" instead of 4 decimal places, you could use `np.char.mod("%.5g", x)`. – Warren Weckesser Aug 14 '14 at 20:28
  • Are you creating the files with `np.savetxt`? – Warren Weckesser Aug 14 '14 at 20:33
  • @WarrenWeckesser No, I need to have some headers at the top of the file for GIS. – user1269942 Aug 14 '14 at 20:39
  • savetxt could be good if one of two features existed: 1) a way to capture the output of the savetxt command so I can write it to my file of choice; 2) accepting a file handle, not just a filename. The latter would allow me to write my headers and then call savetxt to do the rest. – user1269942 Aug 14 '14 at 20:42
  • What version of numpy are you using? In the latest version of numpy, `savetxt` accepts a file handle: http://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html – Warren Weckesser Aug 14 '14 at 20:45
  • Related: http://stackoverflow.com/questions/24691755/how-to-format-in-numpy-savetxt-such-that-zeros-are-saved-only-as-0/ – Warren Weckesser Aug 14 '14 at 20:53
  • @WarrenWeckesser you are correct "filename or file handle". I totally missed that when I scanned the docs. I'm curious if it's faster...I will try and report back. – user1269942 Aug 14 '14 at 22:07
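
As a rough illustration of the `%.5g` suggestion in the comments above, the sketch below formats the array from the question in a single `np.char.mod` call. It assumes that 5 significant digits are acceptable in place of 4 decimal places; the variable name `compact` is made up for this example.

```python
import numpy as np

x = np.array([1.12345, 1.2, 0.1, 0, 1.230000])

# '%g' drops trailing zeros on its own, so no separate rstrip pass is needed.
# The precision (5) counts significant digits, not decimal places.
compact = np.char.mod('%.5g', x)
print(compact)
```

Note that `%g` switches to exponent notation for very large or very small magnitudes, so it is not a drop-in replacement for `%.4f` on every data set.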

1 Answer

Thanks to Warren Weckesser for providing valuable comments. Credit to him.

I converted my code to use:

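import numpy as np

# Integers are written as-is; floats get a fixed number of decimal places.
formatter = '%d'
if num_type == 'float':
    formatter = '%%.%df' % decimals  # e.g. decimals = 4 gives '%.4f'
np.savetxt(out, arr, fmt=formatter)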

where out is a file handle to which I had already written my headers. Alternatively, I could use the header= argument of np.savetxt. I have no clue how I didn't see those options in the documentation.
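
A minimal, self-contained sketch of that pattern follows. The array contents, the value of `decimals`, and the two header lines are placeholders invented for illustration, and an `io.StringIO` buffer stands in for the real output file.

```python
import io
import numpy as np

arr = np.array([[1.5, 2.0], [0.0, 3.25]])  # placeholder data
decimals = 4                               # placeholder precision

out = io.StringIO()                # stands in for an open text file handle
out.write("ncols 2\nnrows 2\n")    # placeholder header lines, written first

# savetxt appends the formatted rows to the already-open handle
np.savetxt(out, arr, fmt='%%.%df' % decimals)

text = out.getvalue()
print(text)
```

If the `header=` argument is used instead, each header line is prefixed with `# ` by default; passing `comments=''` suppresses that prefix.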

For a 1300 by 1300 NumPy array, building the output line by line as I did before (using np.core.defchararray.rstrip(np.char.mod('%.4f', x), '0')) took ~1.7 seconds, whereas np.savetxt takes 0.48 seconds.

So np.savetxt is a cleaner, more readable, and faster solution.

Note: I did try:

np.savetxt(out, arr, fmt='%.4g')

to avoid a switch based on number type, but it did not work as I had hoped.
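
The answer does not say what went wrong with `'%.4g'`. One plausible reason, offered here only as an assumption, is that the precision in `%g` counts significant digits rather than decimal places. The sample values below are made up to show the difference.

```python
import numpy as np

values = np.array([1234.5678, 0.00001, 2.5])  # made-up sample values

as_f = np.char.mod('%.4f', values)  # always 4 digits after the decimal point
as_g = np.char.mod('%.4g', values)  # at most 4 significant digits in total

print(as_f)
print(as_g)
```

With `%.4g`, large values lose their fractional part and tiny values switch to exponent notation, which may not suit a fixed-precision raster file.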

user1269942