I have got a huge dataframe (pandas): 42 columns, 19 millions rows and different dtypes. I load this dataframe from a csv file to JupyterLab. Afterwards I do some operations on it (adding more colums) and I write it back to a csv file. A lot of the columns are int64. In some of these columns many rows are empty. Do you know a technique / specific dtype which I can apply on int64 columns in order to reduce the size of the dataframe and write it to a csv file more effient saving memory capacity and reduce the size of the csv file?
Would you provide me with some example of code? [For columns containing strings only I changed the dtype to 'category'.] thank you