0

I have a 9000 row x 30 column Excel file with content I would like to copy from one Excel workbook to another. The issue I'm facing, however, is that when I run my code, it never stops running. What's more is that the memory usage keeps going up until it's used 90%-98% of my laptops available memory (screenshot).

I tried using timeit to see how slow my code was, but even that doesn't execute. I refuse to believe it's the size of my Excel file, so it must be an issue with my code. I am unable to pinpoint exactly what it could be though as I have very limited experience with openpyxl and would therefore greatly appreciate any advice. I'm using openpyxl 3.0.4 and Python 3.8.7.

def copy(source, destination):
    # Open source workbook
    wb1 = load_workbook(source)
    ws1 = wb1.worksheets[0]

    # Open destination workbook
    wb2 = load_workbook(destination)
    ws2 = wb2.active

    # Get rows and columns of source
    max_rows = ws1.max_row
    max_columns = ws1.max_column

    # Copy cells from source to destination
    for i in range(1, max_rows + 1):
        for j in range(1, max_columns + 1):
            # Read from source
            c = ws1.cell(row=i, column=j)

            # Write to destination file
            ws2.cell(row=i, column=j).value = c.value

    # Save source
    wb2.save(str(destination))

copy_src = r"C:\Users\username\Documents\copy_from.xlsx"
copy_dst = r"C:\Users\username\Documents\copy_to.xlsx"
time = timeit.timeit('%s' % copy(copy_src, copy_dst))
print("Execution time for copy(): %f seconds" % time)
bartek15
  • 7
  • 3
  • You can visit this:- https://stackoverflow.com/questions/44593705/how-to-copy-over-an-excel-sheet-to-another-workbook-in-python – Kishan Yadav Jan 13 '21 at 09:27
  • I have tried the solution given in the answer, but it does not work. Same thing happens. Memory goes up, it never really finishes. – bartek15 Jan 13 '21 at 09:38
  • 9000x30 is a *small* sheet. A sheet can have up to 1M rows. What's the point of this code though? What does `ws1.max_row` return? Are you trying to copy 1M rows perhaps? Why not use `for row in ws1: for cell in row:` ? – Panagiotis Kanavos Jan 13 '21 at 09:39
  • I tried using the solution given above by @KishanYadav, which uses `for row in ws1: for cell in row:` but that didn't seem to fix my problem. As for what `ws1.max_row` returns, it seems like even simple code like this: https://pastebin.com/kEUXQDvG won't run. This suggests that there is something wrong with the Excel workbook, but I am unsure of what. I tried clearing the blanks like one of the answers suggested, but that didn't fix the issue. – bartek15 Jan 13 '21 at 10:11
  • I have tried downloading the following .csv file (converted it to .xlsx): https://data.cityofnewyork.us/Business/Legally-Operating-Businesses/w7w3-xahh And running the code posted above. It still takes a while to finish executing (still doesn't display max_rows), but despite that timeit returns `0.007179` which seems odd to me. – bartek15 Jan 13 '21 at 10:33
  • The following code (https://pastebin.com/iKvqDiar) works for the workbook downloaded above, but not for mine (which is only 9000 rows compared to the 250k+ rows in the downloaded one) and I'm unsure of why. – bartek15 Jan 13 '21 at 10:45

2 Answers2

1

The worksheet is not large. If the workbook is 261 MB (please include this information in the description) then it must include a number of very large objects which openpyxl is reading into memory. For copying only read-only mode should be sufficient.

Charlie Clark
  • 18,477
  • 4
  • 49
  • 55
0

How big is the Excel file? Not by row/column but by the size on disk? Whenever I've worked with openpyxl in the past I've always had issues with "empty, formatted cells", you may be able to see these more clearly by opening the excel file as XML: (How to view the XML form of an Excel file?).

Easiest way to fix is to open the source excel file (in Excel), select all blank cells and use the "Clear All" button in the tool bar or just hitting delete usually works as well.