
If we need to read/write some data from/to a large file each time before/after processing it, which of the following ways (with some demonstration Python code) is better?

  1. Open the file each time we need to read or write, and close it immediately afterwards. This seems safer, but slower, since we open and close the files many times:

         for i in processing_loop:
             with open(datafile) as f:
                 read_data(...)
             process_data(...)
             with open(resultfile, 'a') as f:
                 save_data(...)

     This looks awkward, but it seems to be the approach MATLAB takes in its .mat file I/O functions load and save: we call load and save directly, without an explicit open or close.

  2. Open the files once and close them only when all the work is finished. This is faster, but risks leaving the files open if the program raises an error, or corrupting them if the program is terminated unexpectedly:

         fr = open(datafile)
         fw = open(resultfile, 'a')
         for i in processing_loop:
             read_data(...)
             process_data(...)
             save_data(...)
         fr.close()
         fw.close()

     In fact, I have had several HDF5 files corrupted this way when the program was killed.

It seems people prefer the second approach, wrapping the loop in a with block:

 with open(...) as f:
     ...

or in a try/except block.
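For concreteness, a minimal sketch of that second approach (read_data, process_data and save_data are just the placeholder functions from the snippets above):

    with open(datafile) as fr, open(resultfile, 'a') as fw:
        for i in processing_loop:
            data = read_data(fr)          # read from the open input file
            result = process_data(data)
            save_data(fw, result)         # append to the open output file
    # both files are closed here, even if an exception was raised inside the loop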

I know both of these and I did use them, but my HDF5 files were still corrupted when the program was killed.

  • Once, while writing a huge array to an HDF5 file, the program got stuck for a long time, so I killed it; afterwards the file was corrupted.

  • Many other times, the program was terminated because the server suddenly went down or the running time exceeded the wall-time limit.

I didn't pay attention to whether the corruption only occurs when the program is terminated while it is writing data to the file. If so, it means the file structure is corrupted because the write was left incomplete. So I wonder whether it would help to flush the data after every write; that increases the I/O load, but it could reduce the chance of being terminated in the middle of a write.
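For example, with a plain Python file object, something like the following would force the data out after every write (a sketch; `flush()` empties Python's internal buffer and `os.fsync()` asks the OS to write its cache to disk; h5py `File` objects have their own `flush()` method):

    import os

    with open(resultfile, 'a') as fw:
        for i in processing_loop:
            save_data(fw, process_data(read_data(...)))
            fw.flush()              # push Python's internal buffer to the OS
            os.fsync(fw.fileno())   # push the OS cache to the physical disk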

I tried the first approach, accessing the files only when reading or writing is actually necessary, but it was obviously slower. What happens in the background when we open/close a file handle? Surely it's not just creating/destroying a pointer? Why do open/close operations cost so much?

wsdzbm
  • loading and saving .mat files is an altogether different beast. It's just saving variables from the workspace. The equivalent in python would be something like "shelve". (and in fact, scipy has a loadmat / savemat functionality too). I would understand reading and writing a file as discussed here to be more general than that. – Tasos Papastylianou Jul 21 '16 at 01:03
    I am not clear why these are mutually exclusive? Why can't you just put the `for` loop inside the `with` block? And you can open multiple files with a single `with` statement. – TheBlackCat Jul 21 '16 at 01:37
  • @TasosPapastylianou Yes, we are talking about more general situations; I just mentioned .mat files as a practical example of the first case. @TheBlackCat The real code can be fairly long, with lots of nested indentation, and I guess many people can't stand such an ugly style any more than I can. And `with` does not help in the case of unexpected termination. As I mentioned, I had HDF5 files corrupted this way when the server went down and my program was killed. – wsdzbm Jul 21 '16 at 12:02

2 Answers


You should wrap the code of solution 2 in a try/except/finally and always close the files in the finally block. That way, even if an error is raised, your files will still be closed.

EDIT: as someone else pointed out, you can use `with` to handle that for you.
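For example, something along these lines (a sketch using the placeholder functions from the question):

    fr = open(datafile)
    fw = open(resultfile, 'a')
    try:
        for i in processing_loop:
            data = read_data(fr)
            save_data(fw, process_data(data))
    finally:
        # runs whether the loop finished normally or raised an exception
        fr.close()
        fw.close()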

limbo
    Using a `with` block handles that automatically. – TheBlackCat Jul 21 '16 at 01:36
  • @TheBlackCat what's a `with` block, I haven't come across this before; is that like the "try-with-resources" statement in java? (i.e. essentially a try-catch block where resources opened are automatically closed?) – Tasos Papastylianou Jul 21 '16 at 12:10
  • @TasosPapastylianou `with` is python specific. I'm not sure if Java has it. It closes the resources when exits the block, no matter if there is an exception. – wsdzbm Jul 21 '16 at 12:34
  • @Lee yeah, so I guess it's the same concept (http://tutorials.jenkov.com/java-exception-handling/try-with-resources.html), making sure resources are closed properly. Except java forces you to use it in the context of a `try` block, whereas with `with` this seems to be optional. – Tasos Papastylianou Jul 21 '16 at 13:28
  • @TasosPapastylianou: It is similar, but not exactly the same. `with` blocks are used with something called a "context manager", which is a class that defines some specific startup and teardown methods. These methods are called reliably no matter what happens. They could open and close a file or other resource, but they don't have to. In practice, I see three big differences. First, the python context manager gets to see the exception (if any). Second, the setup does not have to be the same as the normal class setup. And third, no exceptions are suppressed by default. – TheBlackCat Jul 21 '16 at 15:48

If you are concerned about using multiple files within the `with` statement, you can open more than one file in a single compound statement, or nest the `with` blocks. This is detailed in the answer here:

How to open a file using the open with statement
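For example (a sketch reusing the placeholder names from the question):

    # single compound with-statement
    with open(datafile) as fr, open(resultfile, 'a') as fw:
        process_data(...)

    # equivalent nested form
    with open(datafile) as fr:
        with open(resultfile, 'a') as fw:
            process_data(...)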

As for what happens when the program raises errors, that's what try/except blocks are for. If you know which errors to expect, you can easily surround your process_data() calls with one. Again, a single except block can catch multiple exception types.

https://docs.python.org/3/tutorial/errors.html#handling-exceptions
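For instance (a sketch; the exception types listed are only illustrative):

    for i in processing_loop:
        try:
            result = process_data(read_data(...))
        except (ValueError, OSError) as err:   # one except block catching several exception types
            print('skipping this item:', err)
            continue
        save_data(result)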

Community
  • Hi @sir_snoopalot, I know `with` and `try..finally..`, but they don't always work perfectly. See my new edits for some explanations. BTW, the multi-file `with` in the link is new to me; it's better than nesting `with` blocks. – wsdzbm Jul 21 '16 at 12:35