I'm loading a CSV file (if you want the specific file, it's the training CSV from http://www.kaggle.com/c/loan-default-prediction). Loading the CSV with numpy takes dramatically longer than with pandas:
>>> timeit("genfromtxt('train_v2.csv', delimiter=',')", "from numpy import genfromtxt", number=1)
102.46608114242554
>>> timeit("pandas.io.parsers.read_csv('train_v2.csv')", "import pandas", number=1)
13.833590984344482
I'll also mention that numpy's memory usage fluctuates much more wildly during loading, peaks higher, and is significantly larger once loaded (2.49 GB for numpy vs. ~600 MB for pandas). All dtypes in pandas are 8 bytes, so differing dtypes can't account for the gap. I got nowhere near maxing out my memory, so the time difference cannot be ascribed to paging.
Any reason for this difference? Is genfromtxt just far less efficient? (And does it leak a bunch of memory?)
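For anyone who wants to reproduce the comparison without downloading the Kaggle file, here is a minimal sketch that times both parsers on a synthetic all-float CSV held in memory (the file name, shape, and timings are illustrative, not the real train_v2.csv):

```python
import io
import timeit

import numpy as np
import pandas as pd

# Build a small synthetic CSV in memory as a stand-in for train_v2.csv.
rows, cols = 10_000, 20
rng = np.random.default_rng(0)
data = rng.random((rows, cols))
csv_text = "\n".join(",".join(f"{x:.6f}" for x in row) for row in data)

def load_numpy():
    # genfromtxt parses in pure Python, inferring the dtype row by row.
    return np.genfromtxt(io.StringIO(csv_text), delimiter=",")

def load_pandas():
    # read_csv uses a compiled C parser; header=None since there is no header row.
    return pd.read_csv(io.StringIO(csv_text), header=None)

t_np = timeit.timeit(load_numpy, number=1)
t_pd = timeit.timeit(load_pandas, number=1)
print(f"genfromtxt: {t_np:.3f}s   read_csv: {t_pd:.3f}s")

# Both parsers recover the same values, so the timing gap is parser overhead.
assert np.allclose(load_numpy(), load_pandas().to_numpy())
```

On my machine the gap on synthetic data is of the same order as in the question; the exact ratio will depend on file size and your numpy/pandas versions.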
EDIT:
numpy version 1.8.0
pandas version 0.13.0-111-ge29c8e8