This is really odd. I have a file temp.txt that has the following format:

   1        1:1        1:1         *0.9    0      0      0.1    0      0    
   2        1:1        1:1         *1      0      0      0      0      0    
   3        1:1        1:1         *1      0      0      0      0      0    
   4        1:1        2:2      +   0.2   *0.7    0.1    0      0      0    
   5        1:1        1:1         *1      0      0      0      0      0    
   6        1:1        1:1         *0.9    0      0      0.1    0      0    
   7        1:1        1:1         *1      0      0      0      0      0    
   8        1:1        1:1         *1      0      0      0      0      0    
   .        .          .           .       .      .      .      .      .
   .        .          .           .       .      .      .      .      .
   .        .          .           .       .      .      .      .      .
6593        1:1        1:1         *1      0      0      0      0      0    

The meaning of the numbers themselves is unimportant (it's WEKA output if anyone is curious). What I want is to take the number to the right of the second colon on each line and put them into a separate file classes.txt with one line for each number, as follows:

1
1
1
2
1
.
.
.

I wrote the following Python script to accomplish this:

initial = open('temp.txt')
final = open('classes.txt','w')
for line in initial:
    final.write(list(line.rsplit(':',1)[1])[0]+'\n') 

It works perfectly for the first 5462 lines, but for no apparent reason it stops there. The numbers from the remaining 1131 lines (5463 - 6593) are absent from classes.txt. I copied and pasted the skipped lines into a separate txt file and ran the script on THAT file but the resulting classes.txt was empty.

This problem is really stumping me because I can see no obvious difference between lines 5462 and 5463, shown below:

5461        1:1        1:1         *1      0      0      0      0      0    
5462        1:1        1:1         *1      0      0      0      0      0    
5463        1:1        4:4      +   0.3    0      0     *0.6    0.1    0    
5464        1:1        1:1         *0.8    0      0      0.2    0      0   

For the record, I altered the script to print the lines to the console and it did that just fine. The problem appears to be with writing those lines to the file. Any help would be greatly appreciated.

Steven Rumbalski
StumpedCoder

3 Answers

I discovered what was going on! I'm still not 100% sure why this fixed the issue (and why the issue occurred specifically at line 5462) but I added final.close() to the end of the script and then it wrote all 6593 numbers flawlessly. Strange.

StumpedCoder
  • 1
    I'm guessing your particular OS's implementation of file IO means that the file isn't completely written to disk until you call either `flush` or `close`. It's a performance thing; IO is kept in memory until a decent sized block can be written to disk, which is an expensive and slow operation, and works best in big chunks. – Morgan Harris Aug 09 '13 at 07:04
  • 5
    My assumption is that it's a buffering issue. Everything you write to a file is written through a buffer and will not end up in the file until that buffer is determined to be full (not entirely true, but you can think of it this way). You should really use the `with`-statement when dealing with files, which handles the `file.close`-call automatically. – msvalkon Aug 09 '13 at 07:04
  • 4
    This isn't strange; such behaviour is widely observed with files that are never closed. [See here](http://stackoverflow.com/questions/3167494/how-often-does-python-flush-to-a-file) and try using the `with ... as ...` construction. – rook Aug 09 '13 at 07:05
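To make the buffering explanation in the comments concrete, here is a minimal sketch that checks the on-disk size of a file before `flush`, after `flush`, and after `close`. The helper name `sizes_during_write` and the temporary file location are hypothetical, chosen only for illustration, and the exact point at which Python hands data to the OS depends on the buffer size of your platform and Python version.

```python
import os
import tempfile


def sizes_during_write(path, n_lines=100):
    """Return the on-disk size before flush, after flush, and after close.

    Hypothetical helper, for illustration only.
    """
    # newline="\n" keeps every written line at exactly 2 bytes on any OS.
    f = open(path, "w", newline="\n")
    for _ in range(n_lines):
        f.write("1\n")
    # With so little data, the text is normally still sitting in Python's buffer.
    before_flush = os.path.getsize(path)
    f.flush()  # push the buffered data to the operating system
    after_flush = os.path.getsize(path)
    f.close()  # close() flushes as well, then releases the file
    after_close = os.path.getsize(path)
    return before_flush, after_flush, after_close


with tempfile.TemporaryDirectory() as tmp:
    before, flushed, closed = sizes_during_write(os.path.join(tmp, "classes.txt"))
    print(before, flushed, closed)
```

With a small amount of data, the first number is normally smaller than the other two, which is the same effect as the missing tail of classes.txt: whatever was still in the buffer when the script ended without `close()` never reached the file.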

I can confirm this as well: calling close() on the file object (e.g. foo.close()) solves the issue.

user2659890

You can also do it this way:

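# Both files are closed automatically when the with block exits.
with open('temp.txt') as initial, open('classes.txt', 'w') as final:
    for line in initial:
        # Write the first character after the last colon on the line.
        final.write(line.rsplit(':', 1)[1][0] + '\n')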

Now you don't have to worry about closing the files: the with statement closes them (and flushes any buffered output) when the block exits.
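One thing to be aware of: the script above writes only the first character after the last colon, so a class label with two or more digits would be cut short, and a line without a colon would raise an IndexError. Below is a hedged sketch of a slightly more defensive variant. The function names `class_label` and `extract_classes` are hypothetical, and it assumes the label is always followed by whitespace, as in the sample lines in the question.

```python
def class_label(line):
    """Return the text between the last colon and the next whitespace, or None.

    Hypothetical helper; assumes the label is followed by whitespace.
    """
    if ":" not in line:
        return None
    tail = line.rsplit(":", 1)[1].split()
    return tail[0] if tail else None


def extract_classes(src, dst):
    """Copy one label per line from src to dst, skipping lines without one."""
    # Both files are closed (and the output flushed) when the block exits.
    with open(src) as initial, open(dst, "w") as final:
        for line in initial:
            label = class_label(line)
            if label is not None:
                final.write(label + "\n")


# Sample line taken from the question.
print(class_label("   4        1:1        2:2      +   0.2   *0.7    0.1"))
```

Calling `extract_classes('temp.txt', 'classes.txt')` would then do the same job as the original script for the data shown in the question.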

Joon