Major Update: Modified to use proper code for reading in a preprocessed array file (see the function using_preprocessed_file() below), which dramatically changed the results.
To determine which method is faster in Python (using only built-ins and the standard library), I created a script to benchmark (via timeit) the different techniques that could be used to do this. It's on the long side, so to avoid distraction, I'm only posting the code tested and the related results. (If there's sufficient interest in the methodology, I'll post the whole script.)
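As a rough illustration of the timing approach, here is a minimal sketch of how a "10 executions, best of 3 repetitions" measurement (the setting shown in the results header further down) can be taken with timeit. This is not the actual benchmark script; the names best_time and sample_workload are made up for the example.

```python
import timeit

def best_time(func, executions=10, repetitions=3):
    """Return the lowest total time for `executions` calls of func."""
    # timeit.repeat returns one total per repetition; the minimum is the
    # measurement least disturbed by other activity on the machine.
    timings = timeit.repeat(func, number=executions, repeat=repetitions)
    return min(timings)

def sample_workload():
    # Hypothetical stand-in for one of the file-reading functions.
    return sum(range(1000))

elapsed = best_time(sample_workload)
print('{:.5f} secs'.format(elapsed))
```

Taking the minimum rather than the mean is the usual choice with timeit, since slower repetitions mostly reflect interference from other processes rather than the code itself.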
Here are the snippets of code that were compared:
import array
import os
import struct
from functools import partial

# TESTCASE, fmt1 and test_filenames are defined in the rest of the
# benchmark script (not shown).

@TESTCASE('Read and construct piecemeal with struct')
def read_file_piecemeal():
    structures = []
    with open(test_filenames[0], 'rb') as inp:
        size = fmt1.size
        while True:
            buffer = inp.read(size)
            if len(buffer) != size:  # EOF?
                break
            structures.append(fmt1.unpack(buffer))
    return structures

@TESTCASE('Read all-at-once, then slice and struct')
def read_entire_file():
    offset, unpack, size = 0, fmt1.unpack, fmt1.size
    structures = []
    with open(test_filenames[0], 'rb') as inp:
        buffer = inp.read()  # read entire file
        while True:
            chunk = buffer[offset:offset+size]
            if len(chunk) != size:  # EOF?
                break
            structures.append(unpack(chunk))
            offset += size
    return structures

@TESTCASE('Convert to array (@randomir part 1)')
def convert_to_array():
    data = array.array('d')
    record_size_in_bytes = 9*4 + 16*8  # 9 ints + 16 doubles (standard sizes)
    with open(test_filenames[0], 'rb') as fin:
        for record in iter(partial(fin.read, record_size_in_bytes), b''):
            values = struct.unpack("<2i5d2idi3d2i3didi3d", record)
            data.extend(values)
    return data

@TESTCASE('Read array file (@randomir part 2)', setup='create_preprocessed_file')
def using_preprocessed_file():
    data = array.array('d')
    with open(test_filenames[1], 'rb') as fin:
        n = os.fstat(fin.fileno()).st_size // 8  # number of 8-byte doubles
        data.fromfile(fin, n)
    return data

def create_preprocessed_file():
    """ Save array created by convert_to_array() into a separate test file. """
    test_filename = test_filenames[1]
    if not os.path.isfile(test_filename):  # doesn't already exist?
        data = convert_to_array()
        with open(test_filename, 'wb') as file:
            data.tofile(file)
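Since the snippets above depend on names from the unposted script, here is a small self-contained sketch of the same round trip on a tiny sample file. It assumes fmt1 is a struct.Struct built from the format string used in convert_to_array(); the file names and the five-record sample are invented for the example.

```python
import array
import os
import struct
import tempfile

# Assumed definition of fmt1, matching the format string in convert_to_array().
fmt1 = struct.Struct('<2i5d2idi3d2i3didi3d')
FIELDS = 25          # 9 ints + 16 doubles per record
NUM_RECORDS = 5      # tiny sample; the benchmark file held 40,000

workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, 'records.bin')      # hypothetical names
array_path = os.path.join(workdir, 'records.array')

# Write a few packed records (ints are accepted for the 'd' fields too).
with open(raw_path, 'wb') as out:
    for i in range(NUM_RECORDS):
        out.write(fmt1.pack(*range(i * FIELDS, (i + 1) * FIELDS)))

# Slow path: unpack record by record into one flat array of doubles.
data = array.array('d')
with open(raw_path, 'rb') as fin:
    while True:
        record = fin.read(fmt1.size)
        if len(record) != fmt1.size:  # EOF
            break
        data.extend(fmt1.unpack(record))

# One-time preprocessing: dump the array's raw machine representation.
with open(array_path, 'wb') as out:
    data.tofile(out)

# Fast path: load all the doubles back in bulk.
loaded = array.array('d')
with open(array_path, 'rb') as fin:
    count = os.fstat(fin.fileno()).st_size // loaded.itemsize
    loaded.fromfile(fin, count)

print(loaded == data, len(loaded))
```

Two things are worth keeping in mind with this approach: every field, including the original ints, ends up stored as an 8-byte double (200 bytes per record here instead of 164), and tofile() writes values in the machine's native byte order, so the preprocessed file is not portable across architectures.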
And here are the results of running them on my system:
Fastest to slowest execution speeds using Python 3.6.1
(10 executions, best of 3 repetitions)
Size of structure: 164
Number of structures in test file: 40,000
file size: 6,560,000 bytes
      Read array file (@randomir part 2): 0.06430 secs, relative  1.00x (   0.00% slower)
 Read all-at-once, then slice and struct: 0.39634 secs, relative  6.16x ( 516.36% slower)
Read and construct piecemeal with struct: 0.43283 secs, relative  6.73x ( 573.09% slower)
     Convert to array (@randomir part 1): 1.38310 secs, relative 21.51x (2050.87% slower)
Interestingly, most of the snippets (all but the array conversion) are actually faster in Python 2:
Fastest to slowest execution speeds using Python 2.7.13
(10 executions, best of 3 repetitions)
Size of structure: 164
Number of structures in test file: 40,000
file size: 6,560,000 bytes
      Read array file (@randomir part 2): 0.03586 secs, relative  1.00x (   0.00% slower)
 Read all-at-once, then slice and struct: 0.27871 secs, relative  7.77x ( 677.17% slower)
Read and construct piecemeal with struct: 0.40804 secs, relative 11.38x (1037.81% slower)
     Convert to array (@randomir part 1): 1.45830 secs, relative 40.66x (3966.41% slower)
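For anyone wanting to check the "relative" and "% slower" columns, here is a minimal sketch of the arithmetic: each time is divided by the fastest one, and the percentage is that ratio minus one. The helper relative_speeds is made up for the example and the labels are shortened. Because it starts from the rounded timings printed above, the last digits of the percentages can differ slightly from the tables, which were presumably computed from unrounded values.

```python
def relative_speeds(timings):
    """Rank (label, seconds) pairs fastest first and add relative figures."""
    ranked = sorted(timings, key=lambda pair: pair[1])
    fastest = ranked[0][1]
    return [(label, secs, secs / fastest, (secs / fastest - 1.0) * 100.0)
            for label, secs in ranked]

# Rounded timings copied from the Python 3.6.1 run above.
timings = [
    ('Convert to array', 1.38310),
    ('Read piecemeal', 0.43283),
    ('Read all-at-once', 0.39634),
    ('Read array file', 0.06430),
]

results = relative_speeds(timings)
for label, secs, ratio, percent in results:
    print('{:>16}: {:.5f} secs, relative {:5.2f}x ({:7.2f}% slower)'.format(
        label, secs, ratio, percent))
```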