
I am trying to use Python to turn binary files into pandas DataFrames for easy subsetting and data analysis. My package works, but only for small files ('small' meaning ~500 MB). A workable example of the final bit of the code is shown below:

import pandas as pd

list_of_dicts = [{'a': 1, 'b': 2, 'c': 3},{'a': 1, 'b': 2, 'c': 3},{'a': 1, 'b': 2, 'c': 3}]
output = pd.DataFrame(list_of_dicts)   # Memory error occurs here for large files

I can reduce the size of the DataFrame by about 40-50% using `.astype('float32')`, but I need the dtype to be set to float32 before the DataFrame is built, not afterwards, since the memory error occurs during the creation of the DataFrame. Is there a way of changing the default dtype of `pd.DataFrame()` to use float32 instead of float64 and int64?
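For reference, the post-hoc downcast I described looks like this (the data here is a stand-in for the real parsed rows):

```python
import pandas as pd

list_of_dicts = [{'a': 1.0, 'b': 2.0, 'c': 3.0},
                 {'a': 1.0, 'b': 2.0, 'c': 3.0},
                 {'a': 1.0, 'b': 2.0, 'c': 3.0}]

# Columns are first built as float64, then downcast afterwards.
# This roughly halves memory, but only after the float64 frame
# has already been allocated, which is too late for huge inputs.
df = pd.DataFrame(list_of_dicts).astype('float32')
```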

Sal
  • If you are hitting the memory limit, you should consider processing less data or not using pandas... Any further processing could raise the error. – Serge Ballesta Mar 16 '20 at 15:23
  • https://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas – AMC Mar 16 '20 at 21:09
  • How exactly are you reading the data? – AMC Mar 16 '20 at 21:09
  • Currently I'm opening the file with `read`, storing a line of information in a dict, appending the dict to a list, then looping to the next line. This gives a list of dicts where each key:value pair in the dict corresponds to a column name and value, and each member of the list corresponds to a row of data. I then call `pd.DataFrame()` on it and specify the columns I want. The reason I do this is that I need to be able to subset and modify columns of data, as well as extract subsets of columns for fitting and making graphs. – Sal Mar 17 '20 at 07:16
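The loop described in the last comment might look roughly like this. Note this is a sketch under assumptions: the real file is binary with a format-specific parser, whereas here the input is assumed to be plain text with three whitespace-separated numeric fields, and the column names `a`, `b`, `c` are illustrative:

```python
import pandas as pd

def load_rows(path):
    """Parse a file line by line into a list of dicts, then build a DataFrame."""
    rows = []
    with open(path) as f:
        for line in f:
            # Parsing is format-specific; three float columns are assumed here.
            a, b, c = line.split()
            rows.append({'a': float(a), 'b': float(b), 'c': float(c)})
    # Each dict becomes one row; keys become column names.
    return pd.DataFrame(rows, columns=['a', 'b', 'c'])
```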

0 Answers