There is a lot of documentation on the most efficient way to store pandas dataframes (e.g. How to store a dataframe using Pandas), but most of these resources focus on I/O speed rather than on-disk size.
I would like to save large pandas dataframes, which typically take up several GB of disk space in CSV format, in a more lightweight format without losing any information.
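To make "smaller on disk, without losing any information" concrete, the check could look like the following minimal sketch. The toy dataframe, its column names, and the gzip-compressed pickle are assumptions chosen purely for illustration; the pickle is only one possible candidate format, not a recommendation.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy frame standing in for the real data (hypothetical columns and sizes).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "id": np.arange(10_000, dtype="int64"),
    "value": rng.normal(size=10_000),
    "label": rng.choice(["a", "b", "c"], size=10_000),
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    pkl_path = os.path.join(tmp, "data.pkl.gz")

    # Baseline: plain CSV, as currently used.
    df.to_csv(csv_path, index=False)
    # Candidate: gzip-compressed pickle (an assumption, any format could go here).
    df.to_pickle(pkl_path, compression="gzip")

    # Disk footprint of each file, in bytes.
    csv_size = os.path.getsize(csv_path)
    pkl_size = os.path.getsize(pkl_path)

    # Reload the candidate file before the temporary directory is removed.
    restored = pd.read_pickle(pkl_path, compression="gzip")

# "Without losing any information": values, dtypes and index must all match.
pd.testing.assert_frame_equal(df, restored)
lossless = df.equals(restored)

print(f"CSV: {csv_size} bytes, compressed pickle: {pkl_size} bytes, lossless: {lossless}")
```

Any candidate format would need to pass the same round-trip check while producing a smaller file than the CSV baseline.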
The LightGBM Dataset looks promising, but I could not reload my data correctly from it.
Any suggestions?