Say I have a CSV file that is 20 TB.
Is there a way to load it into a data frame on a machine with only 16 GB of memory?
For example, what if I wanted to do:
data = pd.read_csv(csv_path)
data = data.drop_duplicates(subset=content_column_name)
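
I know pandas can read in chunks, so for context here is roughly the kind of thing I have in mind (a minimal sketch; csv_path, content_column_name and the output file are placeholder names, and the chunk size is arbitrary). My worry is the last step: the set of already-seen values still grows without bound, so I'm not sure this actually stays within 16 GB for a 20 TB file.

import hashlib
import pandas as pd

csv_path = "data.csv"              # placeholder path
content_column_name = "content"    # placeholder column name

seen = set()  # hashes of content values already written out
chunks = pd.read_csv(csv_path, chunksize=1_000_000)

with open("deduped.csv", "w") as out:  # placeholder output file
    for i, chunk in enumerate(chunks):
        # drop duplicates within this chunk first
        chunk = chunk.drop_duplicates(subset=content_column_name)
        # hash the content column so we don't keep the full strings around
        hashes = chunk[content_column_name].astype(str).map(
            lambda s: hashlib.md5(s.encode()).digest()
        )
        # keep only rows whose content hasn't appeared in earlier chunks
        mask = ~hashes.isin(seen)
        chunk[mask].to_csv(out, header=(i == 0), index=False)
        seen.update(hashes[mask])

Is there a standard way to do this kind of cross-chunk deduplication out of core, or a library that handles it for me?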