I am trying to find the most efficient way to write large data frames (250 MB+) to disk and read them back using Python/Pandas. I've tried all of the methods in Python for Data Analysis, but the performance has been very disappointing.
This is part of a larger project exploring a migration of our current analytics/data management environment from Stata to Python. In my tests, read/write times with Python and Pandas are typically more than 20 times longer than those I get with Stata.
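For anyone who wants to reproduce this kind of comparison, a minimal timing sketch might look like the following. It is not the exact test behind the numbers above: the frame size, the column names, the helper `time_round_trip`, and the choice of CSV and pickle as formats are all illustrative assumptions.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def time_round_trip(df, write, read, path):
    """Hypothetical helper: time one write and one read of df at path."""
    start = time.perf_counter()
    write(df, path)
    write_seconds = time.perf_counter() - start

    start = time.perf_counter()
    loaded = read(path)
    read_seconds = time.perf_counter() - start
    return write_seconds, read_seconds, loaded


# Illustrative all-float frame; a real 250 MB+ data set would be larger
# and would likely mix numeric and string columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((50_000, 10)),
    columns=[f"x{i}" for i in range(10)],
)

# Each format maps to a (writer, reader) pair. Both use only core pandas,
# so no optional dependencies are needed.
formats = {
    "csv": (lambda d, p: d.to_csv(p, index=False), pd.read_csv),
    "pickle": (lambda d, p: d.to_pickle(p), pd.read_pickle),
}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, (write, read) in formats.items():
        path = os.path.join(tmp, f"frame.{name}")
        w, r, loaded = time_round_trip(df, write, read, path)
        results[name] = (w, r, loaded.shape)
        print(f"{name}: write {w:.3f}s, read {r:.3f}s, shape {loaded.shape}")
```

Timings from a single run like this are noisy, so repeating each measurement a few times and comparing the best or median run would give a fairer picture.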
I strongly suspect that I am the problem, not Python or Pandas.
Any suggestions?