1

Possible Duplicate:
Deleting a specific line in a file (python)

I need to delete the lines that contain the number '2' from the file f:

2 3
5 6
7 2
4 5
Community
graph

3 Answers

5

When you want to edit a file, you make a new file with the correct data and then rename the new file as the old file. This is what serious programs like your text editor probably do. (Some text editors actually do even weirder stuff, but there's no use going into that.) This is because in many filesystems the rename can be atomic, so that under no circumstances will you end up with the original file being corrupted.

This would lead to code to the effect of

import os

with open(orig_file) as f, open(working_file, "w") as working:
    # ^^^ 2.7+ form, 2.5+ use contextlib.nested
    for line in f:
        if '2' not in line: # Is this exactly the criterion you want?
                            # What if a line was "12 5"?
            working.write(line)

os.rename(working_file, orig_file)

You may want to use orig_file + '~' or the tempfile module for generating the working file.
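As a rough sketch of the tempfile variant mentioned above, something like the following could work. The function name `remove_lines_with_token` and its parameters are made up for illustration, and the whole-field comparison via `split()` is an assumption about the criterion the asker wants (it keeps a line such as "12 5"). It also assumes Python 3.3+ for `os.replace`.

```python
import os
import tempfile


def remove_lines_with_token(path, token):
    """Rewrite `path` without the lines that contain `token` as a whole field."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the working file next to the original so the final rename
    # stays on one filesystem.
    fd, working_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as working, open(path) as original:
            for line in original:
                # split() compares whole fields, so "12 5" is kept.
                if token not in line.split():
                    working.write(line)
        # os.replace overwrites an existing destination on every platform.
        os.replace(working_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial working file.
        if os.path.exists(working_path):
            os.remove(working_path)
        raise
```

Calling `remove_lines_with_token('f', '2')` on the file from the question would leave only the lines `5 6` and `4 5`. Note that `mkstemp` creates the working file with restrictive permissions, so this sketch does not preserve the original file's mode.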

Mike Graham
2
with open('f', 'r+') as f:
  data = ''.join(filter(lambda l: '2' not in l.strip().split(' '), f))
  f.seek(0)
  f.truncate(0)
  f.write(data)
phihag
  • data = ''.join(l in f if '2' not in l.split(' ')) ^ SyntaxError: invalid syntax – graph Sep 01 '11 at 21:08
  • @ammar Refresh the page ;). But indeed, that version misses a `for l in f`: `data = ''.join(l for l in f if '2' not in l.split(' '))`. – phihag Sep 01 '11 at 21:13
  • data = ''.join(lambda l: '2' not in l.strip().split(' '), f) TypeError: join() takes exactly one argument (2 given) – graph Sep 01 '11 at 21:36
  • @ammar I'm blaming the late hour here. Corrected, somehow I forgot the `filter` call when I copied it from my shell to stackoverflow. Sorry. – phihag Sep 01 '11 at 21:38
  • Note that this method allows the file to be corrupted due to unexpected termination. – Mike Graham Sep 01 '11 at 21:38
  • @Mike Graham Yup, +1 to your solution. However, if you really care about data security, you should [call `fsync`](http://stackoverflow.com/questions/705454/does-linux-guarantee-the-contents-of-a-file-is-flushed-to-disc-after-close). – phihag Sep 01 '11 at 21:42
  • @phihag, I think I disagree, but I'm not an expert on this issue. It's the OS's problem what is written to actual disk and what isn't, and `os.rename` should be fine so long as the file has been flushed, which it will be when it's closed, and hence the user part of maintaining data integrity is met. Any decent filesystem will be designed to ensure the data integrity is maintained when it writes data to disk. Is there a networked filesystem issue or something I'm not familiar with that is a threat? – Mike Graham Sep 01 '11 at 22:01
  • @Mike Graham `(...) flushed, which it will be when it's closed`. This assumption is false as described in the [question I linked](http://stackoverflow.com/questions/705454/does-linux-guarantee-the-contents-of-a-file-is-flushed-to-disc-after-close) – phihag Sep 01 '11 at 22:24
  • @phihag, The question you linked is not addressing the issue at hand. Python *calls* `flush` before closing a file, making the `os.rename` operation completely sensible. If there is a threat to the data integrity -- an opportunity for data to be corrupted -- in my example code, can you please point it out? – Mike Graham Sep 01 '11 at 22:39
  • @Mike Graham If Python calls `flush` (the Python method, which presumably calls `fsync`), your code is perfectly fine. However, I can't find any indication that it does. A strace with CPython 2.6 shows `open("fileout", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 close(3) = 0`. Can you cite a source or explain where in the CPython code Python *does* call `flush` before closing a file? – phihag Sep 02 '11 at 06:42
  • @phihag, http://docs.python.org/library/io.html#io.IOBase.close documents all filelike objects' `close` as flushing the file. `file.flush` calls `fflush` though, not `fsync`. `fsync` is a lower-level command that demands something about the physical disk. `fflush` just makes sure the OS has received the data and does not take any position on whether the data are in the OS cache or on the physical disk. – Mike Graham Sep 02 '11 at 11:51
  • @Mike Graham Oh, I see we talked about different things. User-space level flushing is of course always done on close - any system not doing that would have to use threads to get rid of the buffer otherwise. With "flushed", I meant flushed to the disk (or as close as possible), so that the data is still consistent even if the power fails. That's the reason for using a temporary file in the first place, isn't it? – phihag Sep 02 '11 at 11:58
  • @phihag, It's not normally any of the user's business if the file has been physically written to disk. It's the filesystem's responsibility to ensure that there is no corruption on the disk. It's the user's responsibility to ensure that there is no corruption on the filesystem. My code doesn't make the promise that when you're done, the file will be on disk, which user-level programs usually should not. It makes the promise that a half-processed file will not be written to the filesystem. It assures that the processing will be an all-or-nothing operation. – Mike Graham Sep 02 '11 at 12:55
  • @phihag, Even if I was calling `fsync`, that does nothing really to ensure that the data written to disk isn't corrupted by power or other failure. (For the syncing-to-disk operation the main danger is power failure, but for the problem I'm solving, there are any number of failure modes -- running into an unhandled exception, getting killed, etc. -- which do not lead to data corruption here but would when working in-place.) – Mike Graham Sep 02 '11 at 13:02
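The two kinds of "flushing" discussed in the comments above can be told apart in code. The following is only a sketch: the helper name `write_durably` is hypothetical, and whether the extra `os.fsync` call is needed at all is exactly what the thread disagrees about.

```python
import os


def write_durably(path, data):
    """Write `data` to `path`, then ask the OS to push it to the storage device."""
    with open(path, "w") as f:
        f.write(data)
        f.flush()             # user-space buffer -> operating system
        os.fsync(f.fileno())  # operating system cache -> storage device
```

Closing the file already performs the first step; only the `os.fsync` line asks for the data to reach the physical disk.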
-1
import fileinput

# inplace=1 redirects print output back into the file 'f'
for line in fileinput.input('f', inplace=1):
    line = line.strip()
    if '2' not in line:
        print(line)
agf
graph