I have a CSV file of roughly 120 GB that I am reading into a Pandas DataFrame, and the read alone takes about 35 minutes. I found a module called cudf that provides a GPU-backed DataFrame, but it is only available for Linux. Is there something similar for Windows?
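
For reference, what I was hoping to do with cudf looks roughly like the sketch below. This is only an assumption on my part that its read_csv mirrors the basic pandas call, since I cannot run it on Windows:

import cudf  # GPU DataFrame library, currently Linux-only

# Sketch only: assumes cudf.read_csv takes the same basic arguments as pandas
gdf = cudf.read_csv('\\large_array.csv', header=None)
print(gdf.head())

And this is the pandas code I am currently using, reading the file in chunks: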
import pandas as pd
from tqdm import tqdm

chunk_list = []

# Read the CSV in 10,000-row chunks; tqdm shows progress as each chunk is parsed.
# (On pandas >= 2.0, error_bad_lines=False has been replaced by on_bad_lines='skip'.)
for chunk in tqdm(pd.read_csv('\\large_array.csv', header=None,
                              low_memory=False, error_bad_lines=False,
                              chunksize=10000)):
    print(' --- Complete')
    chunk_list.append(chunk)

# Reassemble the chunks into a single DataFrame.
array = pd.concat(chunk_list)
print(array)