This isn't easy to do with `read_excel`, since the function has no means of tracking progress, unlike `read_csv`, where you can use the `chunksize` parameter to get an iterator and then update your progress based on how many chunks you've loaded out of the total.
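For comparison, this is roughly what the `read_csv` version looks like (a sketch; 'data.csv' and the 1000-row chunk size are placeholders):

import pandas as pd
import PySimpleGUI as sg

# Count the data rows up front so we know the total number of chunks
with open('data.csv') as f:
    total_rows = sum(1 for _ in f) - 1  # subtract the header row
chunks = -(-total_rows // 1000)  # ceiling division

chunk_list = []
for i, chunk in enumerate(pd.read_csv('data.csv', chunksize=1000)):
    chunk_list.append(chunk)
    sg.OneLineProgressMeter('Loading csv file', i + 1, chunks)
df = pd.concat(chunk_list, ignore_index=True)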
That approach won't work for Excel files, so your only option is a combination of `nrows` and `skiprows`. To do this, we first need to find out how many rows the file has. There are some neat ways of doing this with little overhead (see the answers here), but they won't work for an Excel file, so instead you load a single column of the sheet and get the length that way:
import pandas as pd
import PySimpleGUI as sg

# Get the total number of rows by loading a single column
# ('file.xlsx' and 'blabla' are placeholders for your file and sheet names)
df_temp = pd.read_excel('file.xlsx', sheet_name='blabla', usecols=[0])
rows = df_temp.shape[0]

# Now load the file in chunks of 1000 rows at a time
chunks = rows // 1000 + 1
chunk_list = []
for i in range(chunks):
    # Skip the rows already read, but keep the header row (row 0)
    tmp = pd.read_excel('file.xlsx', sheet_name='blabla', nrows=1000,
                        skiprows=range(1, i * 1000 + 1))
    chunk_list.append(tmp)
    # Update progress
    sg.OneLineProgressMeter('Loading excel file', i + 1, chunks)
df = pd.concat(chunk_list, axis=0, ignore_index=True)
This should work, but it comes with significant overhead: each call to `read_excel` still has to parse the entire workbook before it can skip rows, so the total work grows roughly quadratically with the file size. Unfortunately, there's no easy way around this for Excel files, and you're better off using some other data format.
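If you load the same data repeatedly, one workaround (again a sketch; the file and sheet names are placeholders) is to pay the Excel parsing cost once by converting to CSV, after which the `chunksize` pattern above applies:

import pandas as pd

# One-time conversion: parse the workbook once, stream the CSV from then on
pd.read_excel('file.xlsx', sheet_name='blabla').to_csv('file.csv', index=False)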