I am trying to use multiprocessing to read a CSV file faster than a single read_csv call:
import pandas as pd
tp = pd.read_csv('review-1m.csv', chunksize=10000)
But what I get back (tp) is not a DataFrame; it is a pandas.io.parsers.TextFileReader. So I use
df = pd.concat(tp, ignore_index=True)
to convert it into a DataFrame. But this concatenation takes so much time that the overall result is not much faster than calling read_csv directly.
Does anyone know how to make this conversion into a DataFrame faster?
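
For reference, this is roughly what I mean by using multiprocessing. It is only a sketch, not working code I am happy with: the chunk size, worker count, and total row count below are placeholders I picked for illustration (the file name suggests about one million rows), and read_slice is just my own helper name.

import pandas as pd
from multiprocessing import Pool

PATH = 'review-1m.csv'
CHUNK = 250_000          # rows per worker (placeholder)
TOTAL_ROWS = 1_000_000   # assumed number of data rows in the file

def read_slice(start):
    # Keep the header row, skip the first `start` data rows,
    # then read at most CHUNK rows from that point.
    return pd.read_csv(PATH, skiprows=range(1, start + 1), nrows=CHUNK)

if __name__ == '__main__':
    with Pool(4) as pool:
        parts = pool.map(read_slice, range(0, TOTAL_ROWS, CHUNK))
    df = pd.concat(parts, ignore_index=True)

Even here the final pd.concat shows up, which is why I am asking how to make that step faster.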