I have two CSV files in S3: one is around 60 GB and the other is around 70 GB. I need to load both CSV files into pandas DataFrames and perform operations such as joins and merges on the data.
I have an EC2 instance with enough memory to hold both DataFrames in memory at the same time.
What is the best way to read these huge files from S3 into pandas DataFrames?
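This is a minimal sketch of the approach I'm considering: pandas can read an `s3://` path directly when `s3fs` is installed, and a `chunksize` keeps peak memory lower during the read. The bucket and key names are placeholders, and the snippet below demonstrates on an in-memory CSV so it runs without AWS credentials:

```python
import io
import pandas as pd

# With s3fs installed and credentials configured, the same call works
# on an S3 path (bucket/key here are placeholders):
#   reader = pd.read_csv("s3://my-bucket/big-file.csv", chunksize=1_000_000)
# Self-contained stand-in: read an in-memory CSV in chunks of 2 rows.
csv_data = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n")
chunks = pd.read_csv(csv_data, chunksize=2)

# Reassemble the chunks into a single DataFrame.
df = pd.concat(chunks, ignore_index=True)
print(len(df))  # 4 rows reassembled
```

I'm unsure whether chunked reading like this is actually preferable to a single `pd.read_csv` call given that the instance has enough memory.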
Also, after I perform the required operations, the output DataFrame should be uploaded back to S3.
What is the best way to upload the huge CSV file to S3?
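For the upload, I was thinking `DataFrame.to_csv` also accepts an `s3://` path via `s3fs`. A sketch of that idea, writing to an in-memory buffer here so it runs without credentials (the bucket name is a placeholder):

```python
import io
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "value": [10, 20]})

# With s3fs installed and credentials configured, this writes straight
# to S3 (bucket/key are placeholders):
#   df.to_csv("s3://my-bucket/output.csv", index=False)
# Self-contained stand-in: write the CSV to an in-memory text buffer.
buf = io.StringIO()
df.to_csv(buf, index=False)
print(buf.getvalue().splitlines()[0])  # header row
```

Is a direct `to_csv` to S3 reasonable for a file this size, or should I be doing a multipart upload instead?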