I have written some Python scripts that load CSV files with hundreds of thousands of rows into a database. They work great, but I was wondering: is it more memory efficient to use the csv module to extract the CSVs as a list of lists than to create a pandas DataFrame?
A pandas DataFrame is definitely more memory efficient than plain Python lists. You should use pandas.
Take a look at the slides from Jeffrey Tratner's talk, Pandas Under The Hood.
Here are a few key points comparing the pandas and list approaches:
- DataFrames have a flexible interface. If you choose the bare-bones Python list approach, you will need to write the necessary functions yourself.
- Many number-crunching routines in pandas are implemented in C or rely on specialized numerical libraries (NumPy), and will almost always be faster than equivalent code you write over lists.
- With large data, the memory layout of lists will degrade performance, whereas a DataFrame stores data in contiguous blocks of the same type.
- A pandas DataFrame has indexes, which let you easily look up, combine, and split data based on conditions you choose. Indexes are implemented in C and specialized for each data type.
- Pandas can easily read and write data in many different formats.
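The memory point is easy to check on a small synthetic CSV. This is a rough sketch: the data, row count, and column names are made up for illustration, and exact byte counts will vary by Python and pandas version.

```python
import csv
import io
import sys

import pandas as pd

# Synthetic CSV standing in for a real file (hypothetical data)
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10_000))

# csv module: every cell becomes a separate Python string object
rows = list(csv.reader(io.StringIO(csv_text)))
header, data = rows[0], rows[1:]
list_bytes = sys.getsizeof(data) + sum(
    sys.getsizeof(row) + sum(sys.getsizeof(cell) for cell in row)
    for row in data
)

# pandas: numeric columns are parsed into contiguous int64 blocks
df = pd.read_csv(io.StringIO(csv_text))
df_bytes = int(df.memory_usage(deep=True).sum())

print(f"list of lists: ~{list_bytes:,} bytes")
print(f"DataFrame:     ~{df_bytes:,} bytes")
```

For numeric data like this, the per-object overhead of millions of small Python strings dwarfs the two packed int64 columns the DataFrame holds.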
There are many more advantages that I probably don't even know about. The key point is: don't reinvent the wheel; use the right tools when you have them.
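For your original task of loading large CSVs into a database, pandas can also stream the file in chunks so only part of it sits in memory at once. A minimal sketch, assuming a SQLite target; the file contents, chunk size, and table name here are invented for illustration:

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content standing in for a large file on disk
csv_text = "id,value\n" + "\n".join(f"{i},{i % 7}" for i in range(1_000))

conn = sqlite3.connect(":memory:")

# read_csv(chunksize=...) yields DataFrames of at most `chunksize` rows,
# so the whole file never has to fit in memory at once
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250):
    chunk.to_sql("measurements", conn, if_exists="append", index=False)

count = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]
print(count)
```

With a real file you would pass its path to read_csv instead of a StringIO, and a SQLAlchemy engine instead of the raw sqlite3 connection for other databases.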

Kamil Niski