I have a file with 200 million records, which is being read with pandas read_csv using a chunksize of 10000. Each chunk's dataframe is converted to a list object, and that list is passed to a function.
import sys
import pandas as pd

file_name = str(sys.argv[2])
df = pd.read_csv(file_name, na_filter=False, chunksize=10000)
for data in df:
    d = data.values.tolist()
    load_data(d)
Is there any way the load_data calls can be run in parallel, so that more than 10000 records can be processed at the same time?
I tried the solutions suggested in the questions below:
But these don't work for me, as I need to convert each dataframe into a list object first before calling the function.
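One way to sketch this, assuming the real load_data is a module-level (picklable) function, is multiprocessing.Pool.imap_unordered: the main process keeps doing the dataframe-to-list conversion, and the resulting lists are handed to worker processes. The load_data below is a hypothetical stand-in that just returns the row count so the flow can be demonstrated; the pool size of 4 is an arbitrary choice:

```python
import multiprocessing as mp

import pandas as pd

def load_data(rows):
    # Stand-in for the real load_data: it receives a list of rows
    # (one list per chunk) and here simply returns the row count.
    return len(rows)

def process_file(file_name, chunksize=10000, workers=4):
    with mp.Pool(processes=workers) as pool:
        reader = pd.read_csv(file_name, na_filter=False, chunksize=chunksize)
        # Convert each chunk to a list in the main process, then dispatch
        # the load_data calls to the worker processes. imap_unordered
        # consumes the generator lazily, so only a few chunks are in
        # flight at once rather than all 200 million rows.
        results = pool.imap_unordered(
            load_data, (chunk.values.tolist() for chunk in reader)
        )
        return sum(results)
```

On Windows (or any spawn start method) the entry point would need an `if __name__ == "__main__":` guard, and load_data must be importable by the workers. If load_data is mostly I/O-bound (e.g. inserting into a database), `multiprocessing.pool.ThreadPool` with the same interface avoids the pickling requirement entirely.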
Any help will be highly appreciated.