I am trying to use Python to turn binary files into pandas DataFrames for easy subsetting and data analysis. My package works, but only for small files ('small' meaning ~500 MB). A minimal working example of the final step of the code is shown below:
import pandas as pd
list_of_dicts = [{'a': 1, 'b': 2, 'c': 3}, {'a': 1, 'b': 2, 'c': 3}, {'a': 1, 'b': 2, 'c': 3}]
output = pd.DataFrame(list_of_dicts) # Memory error occurs here for large files
I can reduce the size of the DataFrame by about 40-50% using .astype('float32'), but I need the dtype to be set to float32 before the DataFrame is built, not afterwards, since the memory error occurs during the creation of the DataFrame.
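To make the after-the-fact conversion concrete, here is a minimal sketch. It assumes toy records in place of the real parsed data, and the names df64, df32, bytes64 and bytes32 are only illustrative. It compares the memory footprint before and after .astype('float32'):

```python
import pandas as pd

# Toy records standing in for the parsed binary data (illustrative values only)
list_of_dicts = [{'a': 1, 'b': 2, 'c': 3} for _ in range(1000)]

# Built with pandas' inferred dtype (int64 for these integer values)
df64 = pd.DataFrame(list_of_dicts)

# Converting afterwards halves the per-value storage, but the
# full-size frame above has already been allocated by this point
df32 = df64.astype('float32')

# Compare column data only (index excluded)
bytes64 = df64.memory_usage(index=False).sum()
bytes32 = df32.memory_usage(index=False).sum()
print(bytes64, bytes32)
```

The conversion only helps once the full-size DataFrame already exists, which is exactly the step that fails on large files.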
Is there a way of changing the default dtype of pd.DataFrame() to use float32 instead of float64 and int64?