
I have two problems using openpyxl:

  1. The spreadsheet reports 1048498 rows. Iterating over all of them hogs memory, so I added logic that checks whether the first five columns of a row are empty and breaks out of the loop (see the sketch after this list).

  2. The logic from point 1 works for me, and the code no longer iterates indefinitely over the spreadsheet's blank cells. I am using P4Python to delete this read-only file after I am done reading it. However, openpyxl is still using the file, and there is no method except save to close the archive it uses internally. Since my file is in read-only mode, I cannot save it. When P4 tries to delete the file, I get this error: "The process cannot access the file because it is being used by another process."
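
For reference, here is a minimal sketch of the point-1 workaround; the filename and the five-column threshold are placeholders for the real values:

```python
from openpyxl import load_workbook

# Open in read-only mode so cells are materialised lazily, row by row.
wb = load_workbook(filename="staff_info.xlsx", read_only=True)
ws = wb.active

for row in ws.iter_rows():
    # Treat a row whose first five cells are all empty as the end of the data.
    if all(cell.value is None for cell in row[:5]):
        break
    # ... process the row's cell values here ...
```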

Help is appreciated :)

Nick
  • Provide more information about memory use. – Charlie Clark Jun 27 '15 at 11:27
  • For your closing problem, see http://stackoverflow.com/questions/11214908/closing-files-in-openpyxl. Have you tried deleting the file with all the openpyxl code commented out? – Eric Levieil Jun 27 '15 at 11:38
  • For your problem with max rows, I presume your file has far fewer rows. This is a known issue: https://bitbucket.org/openpyxl/openpyxl/issue/278/get_highest_row-column-are-unreliable – Eric Levieil Jun 27 '15 at 11:40
  • For the closing problem, I did read the post you linked; it says that for reading (not saving), openpyxl closes the archive for you. However, in my case I am breaking out of the loop, and when P4Python tries to delete the file, it is still being used by the process spawned by openpyxl. I am not sure I correctly understand "deleting the file while commenting all openpyxl code". File deletion worked for me previously when I was working with just P4Python: after p4 sync "file depot path" I ran p4 sync "file depot path#0", and that did delete the file; at that time there was no openpyxl. – Nick Jun 29 '15 at 17:19
  • Yes, max rows is a known problem, which is why I added logic to manually check whether the first "X" columns are empty cells and break out of the reader loop. – Nick Jun 29 '15 at 17:24
  • @CharlieClark, memory is no longer being hogged: I detect the end of the file (the last row) by checking for empty values in the first 'X' columns. I even tried `ws.calculate_dimension(force=True)`; it still reports 1048498 rows, and running the row iterator over that many rows would hog memory, hence I break out of the loop. – Nick Jun 29 '15 at 17:27
  • @EricLevieil , I just tried commenting out all openpyxl code, and it deletes the file successfully. – Nick Jun 29 '15 at 17:40
  • Then see if it works outside Perforce (open the file in openpyxl, then delete it) ... – Eric Levieil Jun 29 '15 at 18:56
  • What are you doing with the cells when you get them? You're not running `ws.columns` by any chance? That **will** put all the cells in memory. – Charlie Clark Jun 29 '15 at 19:52
  • `wb = load_workbook(filename=CLIENT_ROOT, read_only=True, use_iterators=True)`; `staff_info = wb[ESCHER_STAFF_INFO_SHEET_NAME]`; `for row in staff_info.iter_rows(row_offset=1): row[0].value`. I am using iterators and read-only mode; once I get a cell via `iter_rows`, I am just accessing its value. I know that `ws.columns` hogs memory. – Nick Jun 29 '15 at 20:52
  • Can someone please remove the negative point on this question as it is a valid issue? – Nick Jul 23 '15 at 21:26

2 Answers


If you open the file in read-only mode then it will not hog memory: cells are created only when read. Memory use has been tested with huge files, but if you think this is a bug, please submit a bug report with a sample file.
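
A minimal sketch of the pattern this answer describes, with an illustrative filename; in read-only mode each row's cells are created as the row is read and can be discarded afterwards:

```python
from openpyxl import load_workbook

# read_only=True streams rows from the underlying XML instead of
# loading the whole worksheet into memory up front.
wb = load_workbook(filename="huge.xlsx", read_only=True)
ws = wb.active

count = 0
for row in ws.iter_rows():
    count += 1  # only the current row's cells are held in memory

# Avoid `ws.columns` here: per the comment thread above, it pulls
# every cell into memory at once.
print(count)
```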

Charlie Clark

This looks like an existing issue, or intended behavior, in openpyxl. If you have a read-only file (from a P4Python sync: `p4.run_sync(file_path_to_sync)`) and you read it with openpyxl, you will not be able to delete the file (P4Python: `p4.run_sync(file_path_to_sync + '#0')`, which removes it from the workspace) until you save it, which is not possible (or intended, in my case) since the file is read-only.
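
As a possible workaround, not confirmed by this answer and sketched under two assumptions (later openpyxl releases expose `close()` on read-only workbooks, and older releases keep the zip handle on an internal `_archive` attribute; verify both against your installed version), you could try releasing openpyxl's file handle by hand before asking Perforce to remove the file:

```python
from openpyxl import load_workbook

wb = load_workbook(filename="staff_info.xlsx", read_only=True)  # illustrative path
ws = wb.active
header = [cell.value for cell in next(ws.iter_rows())]  # read what you need

# Release openpyxl's handle on the file before Perforce touches it.
try:
    wb.close()           # assumption: available in later openpyxl releases
except AttributeError:
    wb._archive.close()  # assumption: internal zip handle in older releases

# With the handle released, the removal described above should no longer
# fail with "file is being used by another process":
# p4.run_sync(file_path_to_sync + '#0')
```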

Nick