openpyxl to write a single xlsx files over 6306456 rows with multi-sheets

Question

import openpyxl

def read_excel():
    # Create a new workbook and add a worksheet
    workbook = openpyxl.Workbook()
    worksheet = workbook.create_sheet('Sheet1')

    # Use a while loop to generate and write the rows one at a time
    row = 1
    n = 1
    while row <= 6306456:  # 1048576*6+15000
        # If the row number is greater than 1,048,576, create a new worksheet
        # and reset the row number to 1

        c = row
        
        # if row > 1048576:
        if c > (1048576 * n):
            worksheet = workbook.create_sheet(f'Sheet{row // 1048576 + 1}')
            c = c - (1048576 * n)
            n = (row // 1048576) + 1

           
        # Write the row number to the first column of the worksheet
        worksheet.cell(row=c, column=1).value = row
        row += 1


        # Print the progress of the operation
        print(f'{row} of 6306456 rows written')

    # Save the workbook
    workbook.save('row_file.xlsx')

if __name__ == '__main__':
    # Call the read_excel function
    read_excel()

Above is the script,but it doesn't work,and it showed row is over 1048576.

I have 1048576*6+15000=6306456 rows of data to write in a single .xlsx file.

Could someome please help to fix the script when using openpyxl to create another new worksheets when a sheet reached 1048576 rows.

Therefore,the result is a .xlsx file with 7 worksheets to store more than 6306456 rows.

Thanks a lot.

Why are you trying to save 6M rows to an Excel file? Which *human* is going to use this? Excel isn't a database. You can save to multiple sheets but that makes sense when you want to save different data or you want to group the data by eg category, product, date. — Panagiotis Kanavos, Dec 07 '22 at 13:50
Where does the data come from and what does it contain? If you use Pandas, you can split the dataframe into chunks and save each to a different sheet. [These answers](https://stackoverflow.com/questions/44729727/pandas-slice-large-dataframe-into-chunks) show how to slice a big dataframe by size or fields. [This page from the XlsxWriter docs](https://xlsxwriter.readthedocs.io/example_pandas_multiple.html) shows how to save each slice to a different sheet — Panagiotis Kanavos, Dec 07 '22 at 13:55
I just wanna test openpyxl to split multi-sheets in a single xlsx file.Thanks — Stanley1, Dec 07 '22 at 13:58
Why? That's not "just". Excel files are used by humans, either to read data or for analysis. They aren't meant for data transfer. A human would have *real* trouble trying to combine 6 different sheets into a single pivot table. Applications and developers would hate such a file too, because they'd have to write extra code to iterate sheets — Panagiotis Kanavos, Dec 07 '22 at 13:58
The data source matters too. If the data source is Pandas, you can split with 1 line (`list_df = np.array_split(df, n)`)and save with 4 more lines, 1 to create an ExcelWriter, 2 to loop over `list_df` and write to the writer, and 1 to save the file. The sheet would still be very unwieldy. — Panagiotis Kanavos, Dec 07 '22 at 14:03
Dear Panagiotis Kanavos,could you please provide a example code for np.array_split? Thanks a lot. — Stanley1, Dec 07 '22 at 14:06
I just did - it's just that single line. Where *does* the data come from though? There are a *lot* of techniques that can be used. If you use Pandas, you could split the data while loading them by using `chunksize` for example. You wouldn't need `np.array_split`. Or you could group the dataframe by a specific column and get meaningful sheets — Panagiotis Kanavos, Dec 07 '22 at 14:08
But my script is first read excel with pandas,and then to_numpy,and when getting the data I want,then write it to new xlsx with openpyxl.chunksize has to be used when reading the original xlsx file. Not the result xlsx file? — Stanley1, Dec 07 '22 at 14:23
Could openpyxl writing cells be combined with pandas' chunksize function in the same time? — Stanley1, Dec 07 '22 at 14:29
Use a counter (`enumerate` springs to mind), together with the remainder operator (`%`) to create sheets and write to them. — Charlie Clark, Dec 07 '22 at 16:56

openpyxl to write a single xlsx files over 6306456 rows with multi-sheets

0 Answers0