I posted this question because I was wondering whether I did something terribly wrong to get this result.
I have a medium-sized CSV file and I tried to load it with NumPy. For illustration, I generated the file in Python:
import timeit
import numpy as np
my_data = np.random.rand(1500000, 3)*10
np.savetxt('./test.csv', my_data, delimiter=',', fmt='%.2f')
Then I timed two functions, numpy.genfromtxt and numpy.loadtxt:
setup_stmt = 'import numpy as np'
stmt1 = """\
my_data = np.genfromtxt('./test.csv', delimiter=',')
"""
stmt2 = """\
my_data = np.loadtxt('./test.csv', delimiter=',')
"""
t1 = timeit.timeit(stmt=stmt1, setup=setup_stmt, number=3)
t2 = timeit.timeit(stmt=stmt2, setup=setup_stmt, number=3)
The result shows t1 = 32.159652940464184 and t2 = 52.00093725634724.
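For comparison, here is a hedged sketch that times pandas.read_csv on a file of the same format (assuming pandas is installed; `test_small.csv` and the row count `n` are made up for the example so the snippet runs on its own — scale `n` up to 1500000 to match the original test). pandas uses a C parser, which is typically much faster than the pure-Python parsing in genfromtxt/loadtxt:

```python
import timeit
import numpy as np

# Generate a smaller file of the same format so the sketch is self-contained.
n = 10000
np.savetxt('./test_small.csv', np.random.rand(n, 3) * 10,
           delimiter=',', fmt='%.2f')

# Time pandas.read_csv the same way as the numpy readers above.
setup_stmt = 'import pandas as pd'
stmt3 = "my_data = pd.read_csv('./test_small.csv', header=None).values"
t3 = timeit.timeit(stmt=stmt3, setup=setup_stmt, number=3)
print(t3)
```

`.values` converts the resulting DataFrame to a plain NumPy array, so the end result has the same type as with loadtxt/genfromtxt.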
However, when I tried MATLAB:
tic
for i = 1:3
my_data = dlmread('./test.csv');
end
toc
The result shows: Elapsed time is 3.196465 seconds.
I understand that there may be some difference in loading speed, but:
- This is much slower than I expected;
- Shouldn't np.loadtxt be faster than np.genfromtxt?
- I haven't tried the Python csv module yet, because loading CSV files is something I do very frequently, and with the csv module the code gets a bit verbose... But I'd be happy to try it if that's the only way. Right now I'm mainly concerned about whether I'm doing something wrong.
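For what it's worth, the csv-module version need not be very verbose. A minimal, self-contained sketch (it writes a tiny sample file first so it runs on its own; the filename `sample.csv` and the 5x3 size are just for illustration — in practice you would point it at `./test.csv`):

```python
import csv
import numpy as np

# Write a small file of the same format as test.csv.
sample = np.random.rand(5, 3) * 10
np.savetxt('./sample.csv', sample, delimiter=',', fmt='%.2f')

# Read it back with the stdlib csv module and convert to a NumPy array.
with open('./sample.csv') as f:
    my_data = np.array([[float(x) for x in row] for row in csv.reader(f)])

print(my_data.shape)  # (5, 3)
```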
Any input would be appreciated. Thanks a lot in advance!