With this Python script:

    from __future__ import print_function
    import time
    import sys
    import platform

    if sys.version_info[0] == 2:
        range = xrange

    results = []
    t1 = time.time()
    t0 = t1
    tgt = 5000000000
    bucket = tgt // 10    # integer division so the modulo test works the same on Python 2 and 3
    width = len('{:,} '.format(tgt))
    with open('/tmp/disk_test.txt', 'w') as fout:
        for line in range(1, tgt + 1):
            fout.write('Line {:{w},}\n'.format(line, w=width))
            if line % bucket == 0:
                s = '{:15,} {:10.4f} secs'.format(line, time.time() - t1)
                results.append(s)
                print(s)
                t1 = time.time()
        else:   # for/else: runs once the loop completes normally
            info = [platform.system(), platform.release(), sys.version, tgt, time.time() - t0]
            s = '\n\nDone!\n{} {}\n{}\n\n{:,} lines written in {:10.3f} secs'.format(*info)
            fout.write('{}\n{}'.format(s, '\n'.join(results)))
            print(s)
Under Python 2 and OS X, prints:
500,000,000 475.9865 secs
1,000,000,000 484.6921 secs
1,500,000,000 463.2881 secs
2,000,000,000 460.7206 secs
2,500,000,000 456.8965 secs
3,000,000,000 455.3824 secs
3,500,000,000 453.9447 secs
4,000,000,000 454.0475 secs
4,500,000,000 454.1346 secs
5,000,000,000 454.9854 secs
Done!
Darwin 13.3.0
2.7.8 (default, Jul 2 2014, 10:14:46)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
5,000,000,000 lines written in 4614.091 secs
Under Python 3.4 and OS X:
500,000,000 632.9973 secs
1,000,000,000 633.0552 secs
1,500,000,000 682.8792 secs
2,000,000,000 743.6858 secs
2,500,000,000 654.4257 secs
3,000,000,000 653.4609 secs
3,500,000,000 654.4969 secs
4,000,000,000 652.9719 secs
4,500,000,000 657.9033 secs
5,000,000,000 667.0891 secs
Done!
Darwin 13.3.0
3.4.1 (default, May 19 2014, 13:10:29)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
5,000,000,000 lines written in 6632.965 secs
The resulting file is 139 GB. You can see that on a relatively empty disk (my /tmp path lives on a 3 TB volume) the times are linear.
My suspicion is that under Ubuntu, you are running into the OS trying to keep the growing file contiguous on an EXT4 disk.

Recall that both OS X's HFS+ and Linux's EXT4 file systems use allocate-on-flush (delayed) disk allocation schemes. Linux will also actively relocate file data so that allocations stay contiguous (unfragmented).
For Linux EXT4, you can preallocate larger files to reduce this effect: use fallocate as shown in this SO post, then rewind the file pointer in Python and overwrite in place.
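On Python 3.3+ the same system call is also exposed directly as os.posix_fallocate, so you can preallocate without shelling out. A minimal sketch (Linux/POSIX only; the 1 MiB size is just illustrative):

    import os
    import tempfile

    # Reserve disk blocks for a file up front via fallocate(2).
    # Requires Python 3.3+ on a POSIX system.
    size = 1024 * 1024  # 1 MiB, illustrative

    fd, path = tempfile.mkstemp()
    try:
        os.posix_fallocate(fd, 0, size)   # allocate bytes [0, size) on disk
        print(os.path.getsize(path))      # logical size now matches
    finally:
        os.close(fd)
        os.remove(path)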
You may be able to use Python's truncate method to create the file, but the results are platform dependent.
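To see why truncate alone may not be enough, you can compare the blocks actually allocated (st_blocks, in 512-byte units) for a truncated file versus a fallocated one. A sketch assuming Linux, where truncate typically produces a sparse file:

    import os
    import tempfile

    # Assumes a Linux/POSIX system: truncate grows the logical size but may
    # allocate no blocks, while posix_fallocate reserves blocks up front.
    size = 1024 * 1024

    fd, sparse = tempfile.mkstemp()
    os.close(fd)
    with open(sparse, 'w') as f:
        f.truncate(size)                  # logical size is now `size`...
    print(os.stat(sparse).st_blocks)      # ...but typically ~0 blocks on disk

    fd, dense = tempfile.mkstemp()
    os.posix_fallocate(fd, 0, size)       # real allocation via fallocate(2)
    os.close(fd)
    print(os.stat(dense).st_blocks)       # at least size // 512 blocks

    os.remove(sparse)
    os.remove(dense)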
Something similar to this (sketch):

    def preallocate_file(path, size):
        '''Preallocate a file at "path" of "size" bytes.'''
        # On Linux, prefer fallocate (or os.posix_fallocate) for a guaranteed
        # on-disk allocation; truncate alone may only create a sparse file.
        # The following works on BSD and OS X -- probably most *nix:
        with open(path, 'w') as f:
            f.truncate(size)

    preallocate_file(fn, size)
    with open(fn, 'r+') as f:
        f.seek(0)       # start at the beginning
        # write whatever
        f.truncate()    # trim the unused, preallocated tail