I have found a few similar questions here in Stack Overflow, but I believe I could benefit from advice specific for my case.
I must store around 80 thousand lists of real valued numbers in a file and read them back later.
First, I tried cPickle, but the reading time wasn't appealing:
>>> stmt = """
with open('pickled-data.dat', 'rb') as f:
    data = cPickle.load(f)
"""
>>> timeit.timeit(stmt, 'import cPickle', number=1)
3.8195440769195557
Then I found out that storing the numbers as plain text allows faster reading (makes sense, since cPickle must worry about a lot of things):
>>> stmt = """
data = []
with open('text-data.dat') as f:
    for line in f:
        data.append([float(x) for x in line.split()])
"""
>>> timeit.timeit(stmt, number=1)
1.712096929550171
This is a good improvement, but I think I could still optimize it somehow, since programs written in other languages can read similar data from files considerably faster.
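One direction I've been considering, though I haven't benchmarked it yet, is skipping text parsing entirely and storing the floats in binary with the standard-library array module, which reads and writes raw doubles via tofile/fromfile. This is just a sketch; the filename and the tiny data set below are placeholders for my real 80k lists:

```python
import array

# Toy stand-in for the real ~80k lists of floats.
data = [[1.5, 2.25, 3.0], [4.0, 5.5]]

# Write: row lengths first (so ragged rows can be reconstructed),
# then all values as one flat sequence of C doubles.
lengths = array.array('i', [len(row) for row in data])
flat = array.array('d', [x for row in data for x in row])
with open('binary-data.dat', 'wb') as f:
    lengths.tofile(f)
    flat.tofile(f)

# Read: recover the lengths, bulk-load the doubles, then slice into rows.
with open('binary-data.dat', 'rb') as f:
    read_lengths = array.array('i')
    read_lengths.fromfile(f, len(data))  # row count must be known or stored too
    read_flat = array.array('d')
    read_flat.fromfile(f, sum(read_lengths))

rows, pos = [], 0
for n in read_lengths:
    rows.append(list(read_flat[pos:pos + n]))
    pos += n
```

The idea is that fromfile does one bulk read per array instead of calling float() once per number, so I'd expect it to be closer to what compiled languages do, but I haven't timed it against the text version.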
Any ideas?