I am trying to parse a huge CSV file (around 50 million rows) using the pandas read_csv method.
Below is the code snippet I am using:
import pandas as pd

df_chunk = pd.read_csv(
    db_export_file,
    delimiter='~!#',        # multi-character delimiter, so the Python engine is required
    engine='python',
    header=None,
    keep_default_na=False,
    na_values=[''],
    chunksize=10 ** 6,
    iterator=True,
)
Thereafter, using the pd.concat method, I build the full DataFrame, which is used for further processing (essentially the step sketched below).
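For completeness, the concatenation step is essentially just this (a minimal sketch; the further processing is omitted):

df = pd.concat(df_chunk)   # stitch the million-row chunks back into one DataFrame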
Everything works fine, except that the read operation on that CSV file takes almost 6 minutes to create the DataFrame.
My question is: is there any other way to make this process faster using the same module and method?
Below is some sample data from the CSV file:
155487~!#-64721487465~!#A1_NUM~!#1.000
155487~!#-45875722734~!#A32_ENG~!#This is a good facility
458448~!#-14588001153~!#T12_Timing~!#23-02-2015 14:50:30
458448~!#-10741214586~!#Q2_56!#
Thanks in advance