I have imported a 14 GB .csv file from Google Drive into Google Colab and used pandas to sort it and to delete some columns and rows.
After deleting about a third of the rows and about half of the columns, df_edited.shape
shows:
(27219355, 7)
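For scale, a rough in-memory size check with pandas' memory_usage (deep=True also counts string contents):

# total in-memory footprint of the edited frame, in GB
print(df_edited.memory_usage(deep=True).sum() / 1e9)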
To save the file, the best method I've been able to find is:
from google.colab import files
df_edited.to_csv('edited.csv')
files.download('edited.csv')
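Would writing straight to a mounted Google Drive folder sidestep whatever is going wrong with the browser download? A minimal sketch of what I have in mind, assuming Drive mounts at the usual /content/drive path (with the root folder showing up as MyDrive):

from google.colab import drive

# mount Google Drive into the Colab filesystem (prompts for authorization)
drive.mount('/content/drive')

# write directly to Drive instead of downloading through the browser;
# index=False skips the pandas row-index column
df_edited.to_csv('/content/drive/MyDrive/edited.csv', index=False)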
When I run this, after a long time (if it doesn't crash, which happens about one time in two), it opens a dialog box to save the file locally.
I accept the save and let it finish. However, what was originally a 14 GB .csv file, which my edits should have cut roughly in half to about 7 GB, ends up as a .csv of only about 100 MB.
When I open the file locally it launches Excel, and I see only about 358,000 rows instead of the roughly 27 million there should be. I know Excel only displays a limited number of rows, but 27 million rows of 7 columns should still run to gigabytes, not 100 MB, so it looks like a lot of data has been lost in the download process.
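To take Excel's row limit out of the picture, I can count the lines in the downloaded file directly (the total includes the header line):

# count lines in the downloaded file without loading it into memory
with open('edited.csv') as f:
    print(sum(1 for _ in f))  # should be around 27 million if nothing was lost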
Is there anything about the code above that would cause all this data to be lost? If not, what else could be causing it?
Thanks for any suggestions.