0

background: python, pandas, excel

there are 100 filse and type is .xlsx, and every file has 10000 rows and 50 cols; I want to concat them into one excel;

I has tried to concat them by pandas.concat() As shown below:

org_dir = 'D:/soft/project/excel' 
out_filepath = 'D:/soft/project/excel/concat_file.xlsx'

res_df = []
for file in os.listdir(org_dir):
    cur_df = pandas.read_excel(os.path.join(org_dir, file), 
                               dtype=str)
    res_df.append(cur_df)

concat_df = pandas.concat(res_df, ignore_index=True)
wr_concat = pandas.ExcelWriter(out_filepath, engine='openpyxl')
concat_df.to_excel(wr_concat, index=False)
wr_concat.close()

but it need more than 20 minutes; so have one greater solution to solve it ?

  • Welcome to Stack Overflow. Please do not include code in your question in the form of an image or via anything that requires that the reader follow an external link. Instead, you should include your code directly in your question, in text form, properly formatted. – CryptoFool Dec 02 '22 at 04:12
  • https://meta.stackoverflow.com/questions/285551/why-should-i-not-upload-images-of-code-data-errors – BOOOO Dec 02 '22 at 04:12
  • If you just need to concat the file that do not have connect logical in it. why not use ```cat``` or ```paste``` command of linux shell? – ElapsedSoul Dec 02 '22 at 07:33
  • Have you checked where the bottleneck is? Is the writing of the entire dataframe or the appending? – DonPre Dec 02 '22 at 11:29
  • @DonPre There are two bottlenecks here. the first : pandas.concat() will need more time; the second: df.to_excel() also need more time – zhugq_2012 Apr 24 '23 at 03:53

0 Answers0