Speed of writing a numpy array to a text file

Question

I need to write a very "high" array (many rows, only two columns) to a text file, and it is very slow. If I reshape the array into a wider one, writing is much faster. For example:

import time
import numpy as np
dataMat1 = np.random.rand(1000,1000)
dataMat2 = np.random.rand(2,500000)
dataMat3 = np.random.rand(500000,2)
start = time.perf_counter()
with open('test1.txt','w') as f:
    np.savetxt(f,dataMat1,fmt='%g',delimiter=' ')
end = time.perf_counter()
print(end-start)

start = time.perf_counter()
with open('test2.txt','w') as f:
    np.savetxt(f,dataMat2,fmt='%g',delimiter=' ')
end = time.perf_counter()
print(end-start)

start = time.perf_counter()
with open('test3.txt','w') as f:
    np.savetxt(f,dataMat3,fmt='%g',delimiter=' ')
end = time.perf_counter()
print(end-start)

All three matrices contain the same number of elements, so why does the last one take so much longer to write than the other two? Is there any way to speed up writing a "high" data array?

You may want to check [this post](https://stackoverflow.com/questions/30329726/fastest-save-and-load-options-for-a-numpy-array) for efficient array I/O — Tarifazo, Dec 17 '18 at 19:35
Unfortunately, I need to write it as text rather than a binary file. — Sheldon, Dec 17 '18 at 22:05

unutbu · Accepted Answer · 2018-12-17T20:01:51.173

As hpaulj pointed out, savetxt is looping through the rows of X and formatting each row individually:

for row in X:
    try:
        v = format % tuple(row) + newline
    except TypeError:
        raise TypeError("Mismatch between array dtype ('%s') and "
                        "format specifier ('%s')"
                        % (str(X.dtype), format))
    fh.write(v)

I think the main time-killer here is all the string interpolation calls. If we pack all the string interpolation into one call, things go much faster:

with open('/tmp/test4.txt','w') as f:
    fmt = ' '.join(['%g']*dataMat3.shape[1])
    fmt = '\n'.join([fmt]*dataMat3.shape[0])
    data = fmt % tuple(dataMat3.ravel())
    f.write(data)

For comparison, the full benchmark

import io
import time
import numpy as np

dataMat1 = np.random.rand(1000,1000)
dataMat2 = np.random.rand(2,500000)
dataMat3 = np.random.rand(500000,2)
start = time.perf_counter()
with open('/tmp/test1.txt','w') as f:
    np.savetxt(f,dataMat1,fmt='%g',delimiter=' ')
end = time.perf_counter()
print(end-start)

start = time.perf_counter()
with open('/tmp/test2.txt','w') as f:
    np.savetxt(f,dataMat2,fmt='%g',delimiter=' ')
end = time.perf_counter()
print(end-start)

start = time.perf_counter()
with open('/tmp/test3.txt','w') as f:
    np.savetxt(f,dataMat3,fmt='%g',delimiter=' ')
end = time.perf_counter()
print(end-start)

start = time.perf_counter()
with open('/tmp/test4.txt','w') as f:
    fmt = ' '.join(['%g']*dataMat3.shape[1])
    fmt = '\n'.join([fmt]*dataMat3.shape[0])
    data = fmt % tuple(dataMat3.ravel())
    f.write(data)
end = time.perf_counter()
print(end-start)

reports

0.1604848340011813
0.17416274400056864
0.6634929459996783
0.16207673999997496
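
The same idea can be wrapped in a helper that formats the array in blocks of rows, so the format string and the argument tuple stay bounded for very large arrays. This is only a sketch: `savetxt_fast` and `chunk_rows` are hypothetical names, not part of NumPy, and it assumes a 2-D real-valued array with a single per-element format such as '%g'. Unlike the snippet above, it ends the last line with a newline, which is what savetxt does.

```python
import numpy as np

def savetxt_fast(path, arr, fmt='%g', delimiter=' ', newline='\n',
                 chunk_rows=100000):
    """Write a 2-D array as text, one string interpolation per block of rows.

    Hypothetical helper, not a NumPy function; assumes real-valued 2-D input.
    """
    arr = np.asarray(arr)
    if arr.ndim != 2:
        raise ValueError("expected a 2-D array")
    # Format for a single row, e.g. '%g %g\n' for two columns.
    row_fmt = delimiter.join([fmt] * arr.shape[1]) + newline
    with open(path, 'w') as f:
        for start in range(0, arr.shape[0], chunk_rows):
            block = arr[start:start + chunk_rows]
            # Repeat the row format once per row in this block, then
            # fill it in with a single interpolation call.
            block_fmt = row_fmt * block.shape[0]
            f.write(block_fmt % tuple(block.ravel().tolist()))
```

Calling `savetxt_fast('/tmp/test5.txt', dataMat3)` should produce the same text as the savetxt call in the benchmark. A smaller `chunk_rows` lowers peak memory at the cost of more interpolation calls.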

hpaulj · Answer 2 · 2018-12-17T18:27:28.027

savetxt is written in Python, so its source is easy to read. Basically it does a formatted write for each row/line. In effect it does

for row in arr:
    # one string interpolation and one write call per row
    f.write(fmt % tuple(row) + '\n')

where fmt is derived from your fmt and shape of the array, e.g.

'%g %g %g ...'

So it's doing a file write for each row of the array. The line format takes some time as well, but it's done in memory with Python code.
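
One way to check how much of the cost is formatting rather than disk access is to run both strategies against an in-memory `io.StringIO` buffer, which takes the file system out of the picture. The following is a minimal sketch: the function names are made up for illustration, the per-row loop only mimics what savetxt does, and timings will vary with machine and NumPy version.

```python
import io
import time
import numpy as np

def format_per_row(arr, fmt='%g', delimiter=' '):
    """Mimic the per-row loop: one interpolation and one write per row."""
    row_fmt = delimiter.join([fmt] * arr.shape[1])
    buf = io.StringIO()
    for row in arr:
        buf.write(row_fmt % tuple(row) + '\n')
    return buf.getvalue()

def format_all_at_once(arr, fmt='%g', delimiter=' '):
    """Format the whole array with a single interpolation call."""
    row_fmt = delimiter.join([fmt] * arr.shape[1]) + '\n'
    return (row_fmt * arr.shape[0]) % tuple(arr.ravel())

if __name__ == '__main__':
    arr = np.random.rand(200000, 2)
    for func in (format_per_row, format_all_at_once):
        start = time.perf_counter()
        text = func(arr)
        # Elapsed time covers formatting only; nothing touches the disk.
        print(func.__name__, time.perf_counter() - start)
```

Both functions return the same string, so any difference in the printed times comes from how the formatting is batched, not from file I/O.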

I expect loadtxt/genfromtxt will show the same time pattern - it takes longer to read many rows.

pandas has a faster csv load. I haven't seen any discussion of its write speed.

edited Dec 17 '18 at 18:27

answered Dec 17 '18 at 18:19

hpaulj


How does this answer the question asked? – Scott Hunter Dec 17 '18 at 18:20
@ScottHunter, it answers the `why` question, doesn't it? – hpaulj Dec 17 '18 at 18:21
It answers it *now*. – Scott Hunter Dec 17 '18 at 18:56
Just a comment, I find pandas write speed is actually slower than numpy. `np.save()` is an order of magnitude faster than either, if a csv format is not necessary. – kevinkayaks Mar 05 '19 at 00:19
