I'm new to pandas and making some progress with self-learning, so I want the best and most efficient way to handle this:
I have 3 (sometimes more than 3) Excel files (".xlsx"); each one is about 100 MB, with at least 800K records and 200 columns per file.
The files share exactly the same columns; they are split because they were exported from a system that cannot handle them all combined.
I want to load the files into one dataframe by opening each one and then using concat or append. I know it will depend on the machine's memory, but I am looking for the best way to handle those files and work with them in a single frame.
This is what I have:
import glob
import timeit

import pandas as pd

start = timeit.default_timer()

# Read every workbook in ./data and append it to one combined frame
# (DataFrame.append was removed in pandas 2.0, so this needs an older version)
all_data = pd.DataFrame()
for f in glob.glob("./data/*.xlsx"):
    df = pd.read_excel(f)
    all_data = all_data.append(df, ignore_index=True)

stop = timeit.default_timer()
execution_time = stop - start
print(execution_time)
With append it took about 7 minutes to load the files into the all_data dataframe.
Is there a better way to load them in less time?
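
For reference, this is the concat variant I was considering instead of append (a minimal sketch, assuming the same ./data/*.xlsx layout as above): read each file into a list of dataframes and call pd.concat once at the end rather than appending inside the loop.

import glob
import timeit

import pandas as pd

start = timeit.default_timer()

# Collect one dataframe per workbook, then combine them with a single concat call
frames = [pd.read_excel(f) for f in glob.glob("./data/*.xlsx")]
all_data = pd.concat(frames, ignore_index=True)

stop = timeit.default_timer()
print(stop - start)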