OK, in a regular Python session (I usually use IPython instead), I set the print options and made a large array:
>>> import numpy as np
>>> np.set_printoptions(threshold=np.inf, suppress=True)
>>> x = np.random.rand(25000, 5)
When I execute the next line, it spends about 21 seconds formatting the array, and then writes the resulting string to the screen (with more lines than fit the terminal's window buffer).
>>> x
This is the same as
>>> print(repr(x))
The internal storage for x is a buffer of floats (which you can 'see' with x.tostring(), called x.tobytes() in newer NumPy versions). To print x, it has to format it: create a multiline string that contains a print representation of each number, all 125000 of them. The result of repr(x) is a string 1850000 characters long, spanning 25000 lines. This is what takes 21 seconds. Displaying that on the screen is just limited by the terminal scroll speed.
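To check where the time goes on your own machine, you can time repr by itself, without any terminal output. This is only a sketch: the 2000-row array is my own smaller stand-in so it runs quickly, and the timing will vary with the NumPy version and hardware.

```python
import sys
import time
import numpy as np

# Assumption: a smaller array than the 25000-row one above, so this runs quickly.
x = np.random.rand(2000, 5)

# sys.maxsize as the threshold disables summarization, like threshold=np.inf above.
with np.printoptions(threshold=sys.maxsize, suppress=True):
    start = time.perf_counter()
    s = repr(x)  # builds the full string; nothing is written to the screen
    elapsed = time.perf_counter() - start

n_lines = s.count("\n") + 1
print(f"{len(s)} chars, {n_lines} lines, formatted in {elapsed:.3f} s")
```

If the formatting step dominates, the time reported here should account for most of the delay you see before anything appears on the screen.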
I haven't looked at the details, but I think the numpy formatting is mostly written in Python, not compiled. It's designed more for flexibility than speed. It's normal to want to see 10-100 lines of an array. 25000 lines is an unusual case.
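For the usual case of looking at a handful of lines, either leave the threshold at a finite value so the repr is summarized, or format just a slice. A small sketch, assuming the default threshold of 1000 elements; the variable names are mine:

```python
import numpy as np

x = np.random.rand(25000, 5)

# With a finite threshold, an array this large is summarized:
# a few rows from each end with '...' in between.
with np.printoptions(threshold=1000, suppress=True):
    summary = repr(x)

# Or format only the rows of interest.
head = np.array2string(x[:10], suppress_small=True)

print(summary)
print(head)
```

Both of these format only a few dozen numbers, so they should return almost immediately.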
Somewhat curiously, writing this array as a csv is fast, with a minimal delay:
>>> np.savetxt('test.txt', x, fmt='%10f', delimiter=',')
And I know what savetxt does: it iterates over the rows, and does a file write for each one:
f.write(fmt % tuple(row))
Evidently all the bells and whistles of the regular repr are expensive. It can summarize, it can handle many dimensions, it can handle complicated dtypes, etc. Simply formatting each row with a known fixed format is not the time-consuming step.
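The row-by-row approach is easy to reproduce by hand. The following is a rough sketch of the idea, not savetxt's actual code: it builds one format string for a whole row and writes each row to an in-memory buffer, which stands in for the open file. The names row_fmt and buf are my own.

```python
import io
import numpy as np

x = np.random.rand(1000, 5)

# One '%10f' per column, joined by the delimiter, as with fmt='%10f', delimiter=','.
row_fmt = ",".join(["%10f"] * x.shape[1]) + "\n"

buf = io.StringIO()  # stands in for the open file
for row in x:
    buf.write(row_fmt % tuple(row))

text = buf.getvalue()
print(text[:120])
```

Each row costs a single % operation with a fixed format, so there is none of the per-array analysis that the general repr has to do.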
Actually that savetxt route might be more useful, as well as fast. You can control the display format, and you can view the resulting text file in an editor or terminal window at your leisure. You won't be limited by the scroll buffer of your terminal window. But how will this savetxt file be different from the original csv?