
I am writing data to a file inside a "for" loop. I run the same code on my computer and on a cluster. On my computer, data is written on every iteration, whereas on the cluster it is only written after the whole loop has finished. So the file size grows gradually on my computer, while on the cluster it remains zero until the loop is over. What could be the reason?


Sample code:

file1=open("data.dat", "w")
for i in range(10000):
    file1.write('{0}\t {1}\n'.format(x[i], y[i]))

file1.close() 
200_success
SUV
  • By "cluster", I am assuming you are using a distributed file system. It seems that the issue would be that the DFS only flushes the file over the network on file close. There might be DFS-specific settings/commands to work around the issue. Which DFS are you using? – minghan Jan 17 '16 at 10:35
  • Thanks, @200_success. It was indeed a duplicate of that question. The problem is now solved. – SUV Jan 17 '16 at 10:50
  • This is because you don't `flush()` the file after each write. Python's file object (and the OS beneath it) buffers your writes until enough data has accumulated before actually writing the data to disk. – Will Jan 17 '16 at 10:51
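The flushing suggested in the comments can be sketched as follows. This is a minimal illustration, not the asker's actual program: the helper name `write_pairs`, the placeholder contents of `x` and `y`, and the temporary output path are all assumptions made so the example runs on its own. The optional `os.fsync` call asks the OS to commit the data to storage; whether other nodes on a cluster's distributed file system then see the file grow still depends on that file system.

```python
import os
import tempfile


def write_pairs(path, xs, ys, sync=False):
    """Write tab-separated pairs, flushing after every line (hypothetical helper)."""
    with open(path, "w") as f:
        for a, b in zip(xs, ys):
            f.write('{0}\t {1}\n'.format(a, b))
            f.flush()  # hand Python's buffered data to the OS
            if sync:
                os.fsync(f.fileno())  # ask the OS to commit it to storage


# Placeholder data; the original question does not show how x and y are built.
x = list(range(5))
y = [v * v for v in x]

# Assumed output location so the sketch does not clobber a real data.dat.
out_path = os.path.join(tempfile.gettempdir(), "data.dat")
write_pairs(out_path, x, y)
```

An alternative is to open the file with `open(path, "w", buffering=1)`, which requests line buffering in text mode so each newline triggers a flush. Flushing on every line costs throughput, so for long loops it may be worth flushing only every N iterations.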

0 Answers