
I'm sorting a text file from Python using a custom unix command that takes a filename as input (or reads from stdin) and writes to stdout. I'd like to sort myfile and keep the sorted version in its place. Is the best way to do this from Python to make a temporary file? My current solution is:

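import os
import subprocess

inputfile = "myfile"
# inputfile: filename to be sorted
tmpfile = "%s.tmp_file" % inputfile
cmd = "mysort %s > %s" % (inputfile, tmpfile)
# run the command through the shell so the redirection takes effect
subprocess.check_call(cmd, shell=True)
# rename the sorted file to the original filename
os.rename(tmpfile, inputfile)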

Is this the best solution? Thanks.

  • It's a good solution because [rename is atomic](http://en.wikipedia.org/wiki/Rename_%28computing%29). – Janus Troelsen Jan 19 '13 at 00:36
  • Your `mysort` custom command is not reading from stdin with that command line; it is taking the filename from its arguments instead. – Martijn Pieters Jan 19 '13 at 00:36
  • You can use the `subprocess` module to redirect the output from your command, so you use this output to override the contents of your initial file. No need to create temporary files then. – mmgp Jan 19 '13 at 00:37
  • @mmgp: but can you really redirect the output to the file as you're reading it? –  Jan 19 '13 at 00:37
  • `os.rename()` is portable. – Martijn Pieters Jan 19 '13 at 00:38
  • @user248237 it actually depends on how your command works. But I didn't mean to redirect the output to the initial file like that, first you collect the output. Then you write it over. – mmgp Jan 19 '13 at 00:39
  • @MartijnPieters: [os.replace](http://docs.python.org/3/library/os.html#os.replace) is more portable; os.rename fails on Windows when the destination already exists. It even sounds like os.replace is atomic on Windows too. – Janus Troelsen Jan 19 '13 at 00:39

4 Answers

4

If you don't want to create temporary files, you can use subprocess as in:

import sys
import subprocess

fname = sys.argv[1]
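proc = subprocess.Popen(['sort', fname], stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
if proc.returncode != 0:
    sys.exit(proc.returncode)  # leave the file untouched if sort failed
# communicate() returns bytes on Python 3, so write in binary mode
with open(fname, 'wb') as f:
    f.write(stdout)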
mmgp
1

You either create a temporary file, or you'll have to read the whole file into memory and pipe it to your command.
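
A minimal sketch of the in-memory option, assuming the command sorts whatever arrives on stdin. `STAND_IN_SORT` and `sort_via_stdin` are hypothetical names, and the Python one-liner is only a placeholder for the custom `mysort` command:

```python
import subprocess
import sys

# Hypothetical stand-in for the custom `mysort` command: a small Python
# one-liner that sorts the lines it reads from stdin.
STAND_IN_SORT = [sys.executable, "-c",
                 "import sys; sys.stdout.writelines(sorted(sys.stdin))"]


def sort_via_stdin(path, command=STAND_IN_SORT):
    """Feed the file at `path` to `command` on stdin, then overwrite the file."""
    with open(path, "rb") as f:
        data = f.read()  # the whole file is held in memory
    # check=True raises CalledProcessError before the file is touched
    result = subprocess.run(command, input=data,
                            stdout=subprocess.PIPE, check=True)
    with open(path, "wb") as f:
        f.write(result.stdout)
```

Both the input and the sorted output live in memory at the same time, so this only suits files that fit comfortably in RAM.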

Martijn Pieters
1

The best solution is to use os.replace because it would work on Windows too.

This is not really what I regard as "in-place sorting" though. Usually, in-place sorting means that you actually exchange single elements in the list without making copies. You are making a copy, since the sorted list has to be completely built before you can overwrite the original. If your files get very large, this obviously won't work anymore. You'd probably need to choose between atomicity and in-place-ity at that point.

If your Python is too old to have os.replace, there are lots of resources in the bug adding os.replace.

For other uses of temporary files, you can consider using the tempfile module, but I don't think it would gain you much in this case.
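
For reference, here is a rough sketch of the temporary-file approach combined with os.replace (Python 3.3+). `sort_file_atomically` and `STAND_IN_SORT` are hypothetical names, and the Python one-liner only stands in for a command that takes a filename argument and writes to stdout, as the question's `mysort` does:

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for `mysort <filename>`: sorts the lines of the file
# named by its first argument and prints them to stdout.
STAND_IN_SORT = [sys.executable, "-c",
                 "import sys; sys.stdout.writelines(sorted(open(sys.argv[1])))"]


def sort_file_atomically(path, command=STAND_IN_SORT):
    """Run `command` on `path`, capture stdout in a temp file, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the final rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            subprocess.check_call(command + [path], stdout=tmp)
        os.replace(tmp_path, path)  # overwrites an existing destination
    except BaseException:
        os.unlink(tmp_path)  # clean up the leftover temp file on any failure
        raise
```

One caveat with this sketch: `tempfile.mkstemp` creates the file readable and writable only by its owner, so the replaced file may not keep the original file's permissions.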

Janus Troelsen
  • Well, the in-place sorting should be done by his command. Since it doesn't, I took that part of the question as meaningless. – mmgp Jan 19 '13 at 00:50
0

You could try a truncate-write pattern:

with open(filename, 'r') as f:
   model.read(f)
model.process()
with open(filename, 'w') as f:
   model.write(f)

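Note that this is non-atomic: if the process dies after the file has been truncated but before the write finishes, the file is left empty or incomplete.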

This entry describes some pros/cons of updating files in Python: http://blog.gocept.com/2013/07/15/reliable-file-updates-with-python/
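
As a concrete instance of the pattern above, here is a small sketch that sorts the lines in Python itself instead of going through a `model` object. `sort_lines_in_place` is a hypothetical helper name:

```python
def sort_lines_in_place(filename):
    """Read every line, sort them in memory, then truncate and rewrite the file."""
    with open(filename, "r") as f:
        lines = f.readlines()
    lines.sort()
    # Opening with "w" truncates the file; a crash between here and the end
    # of the write loses data, which is why the pattern is non-atomic.
    with open(filename, "w") as f:
        f.writelines(lines)
```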

sk8asd123