I understand that one of the reasons why pandas can be relatively slow importing csv files is that it needs to scan the entire content of a column before guessing the type (see the discussions around the mostly deprecated low_memory option for pandas.read_csv). Is my understanding correct?
If it is, what would be a good format in which to store a dataframe, one that explicitly records the data type of each column so pandas doesn't have to guess (SQL is not an option for now)?
Any option in particular from those listed here?
My dataframes have floats, integers, dates, strings and Y/N, so formats supporting numeric values only won't do.
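
To make the question concrete, here is a minimal sketch of the kind of data I mean, together with the way I could spell the types out by hand when reading a csv. The column names and values are made up for illustration; only the pandas.read_csv parameters (dtype, parse_dates, true_values, false_values) are real. What I am after is a storage format that carries this information itself, so I don't have to repeat it on every read.

```python
import io

import pandas as pd

# Hypothetical sample data: column names and values are invented for illustration.
csv_text = """id,price,when,name,active
1,9.5,2015-01-31,foo,Y
2,10.25,2015-02-28,bar,N
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    # Types stated explicitly so pandas does not have to infer them.
    dtype={"id": "int64", "price": "float64", "name": "object"},
    # Dates are requested separately from dtype.
    parse_dates=["when"],
    # Map the Y/N flags to booleans.
    true_values=["Y"],
    false_values=["N"],
)

print(df.dtypes)
```

This works, but the type information lives in my code rather than in the file, which is exactly what I would like to avoid.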