
So, I am in a situation where one process writes data to a file (overwriting it, not appending) every few seconds. The data is JSON. Another process has to read this file at regular intervals, so it could happen that the reader reads the file while the writer is in the middle of writing to it.

A solution to this problem that I can think of is for the writer process to also write a corresponding checksum file. The reader process would then read both the data file and the checksum file, and if the calculated checksum doesn't match, it would retry until it does. That way it would know it has read consistent data.
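
For illustration, here is a minimal sketch of the reader side of that scheme, assuming the writer publishes an FNV-1a hash of the file's bytes (in hex) in a sibling .sum file; the hash choice, paths, and buffer size are all made up here:

/* Reader side of the checksum scheme: keep re-reading until the data
 * matches the checksum the writer published alongside it. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* FNV-1a over the file's bytes; stands in for whatever checksum the writer uses */
static uint64_t fnv1a(const unsigned char *p, size_t n)
{
    uint64_t h = 1469598103934665603ULL;
    while (n--) { h ^= *p++; h *= 1099511628211ULL; }
    return h;
}

int read_verified(const char *data_path, const char *sum_path)
{
    for (;;) {
        FILE *d = fopen(data_path, "rb");
        FILE *s = fopen(sum_path, "r");
        int ok = 0;

        if (d && s) {
            unsigned char buf[65536];      /* assumes the JSON fits in one buffer */
            size_t n = fread(buf, 1, sizeof buf, d);
            uint64_t expected;
            ok = fscanf(s, "%" SCNx64, &expected) == 1
                 && fnv1a(buf, n) == expected;
            /* on success, buf[0..n) is a consistent snapshot to hand to the parser */
        }
        if (d) fclose(d);
        if (s) fclose(s);
        if (ok) return 0;
        usleep(100 * 1000);                /* brief back-off before retrying */
    }
}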

Or maybe a better solution is to read the file twice within a short interval (much shorter than the writer's interval) and check whether the two reads match.

The third way could be to write some magic data at the end of the file, so that the reading process knows it has read the whole file when it encounters that magic data at the end.
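
For what it's worth, the trailer check on the reader side could be as small as this; the sentinel value is hypothetical:

#include <stdio.h>
#include <string.h>

#define SENTINEL "#EOF#\n"   /* hypothetical marker the writer emits last */

/* Returns 1 if the buffer ends with the sentinel, i.e. the write finished */
static int is_complete(const char *buf, size_t len)
{
    size_t s = strlen(SENTINEL);
    return len >= s && memcmp(buf + len - s, SENTINEL, s) == 0;
}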

What do you think? Are these solutions viable, or are there better methods to achieve this?

MetallicPriest

2 Answers


If you want to guarantee that the reader always gets all the data, consider using a named pipe.

mkfifo ./jsonoutput

Then have one program write to, and the other read from, this file ./jsonoutput.

As long as the writer regularly closes and reopens the file after writing each JSON document, the reader will get an EOF and can process the input.
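
A minimal sketch of that close-and-reopen writer loop in C (the payload and the five-second interval are placeholders):

#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Writer: open the FIFO, emit one JSON document, close so the reader sees EOF */
int main(void)
{
    for (;;) {
        FILE *f = fopen("./jsonoutput", "w");   /* blocks until a reader opens it */
        if (!f) return 1;
        fprintf(f, "{\"ts\": %ld}\n", (long)time(NULL));
        fclose(f);                              /* reader now gets EOF */
        sleep(5);                               /* the "every few seconds" interval */
    }
}

The reader mirrors it: open the FIFO for reading, read until EOF, process, close, and repeat.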

However, if that isn't the case, the reader will just keep reading and the writer will just keep writing. If the programs aren't designed to handle continuous streams of data like that, they might never process anything and will simply hang.

If that's the case, you could write a relay program that reads from one named pipe until it has a complete JSON document and then flushes it through a second named pipe to the final program.
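
A sketch of such a relay, under the simplifying assumption that each JSON document is newline-delimited (the ./jsonraw FIFO name is made up):

#include <stdio.h>

/* Relay: forward one complete (newline-delimited) JSON document at a time
 * from the raw FIFO to the FIFO the final program reads. */
int main(void)
{
    FILE *in  = fopen("./jsonraw", "r");     /* hypothetical upstream FIFO */
    FILE *out = fopen("./jsonoutput", "w");
    char line[65536];

    if (!in || !out) return 1;
    while (fgets(line, sizeof line, in)) {
        fputs(line, out);
        fflush(out);                         /* push the complete document downstream */
    }
    return 0;
}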

AKstat

Create an entirely new file each time, and rename() the new file into place once it has been completely written:

If newpath already exists, it will be atomically replaced, so that there is no point at which another process attempting to access newpath will find it missing. ...

Some copy of the file will always be there, and it will always be complete and correct:

So, instead of

writeDataFile( "/path/to/data/file.json" );

and then trying to figure out what to do in the reader process(es), you simply do

writeDataFile( "/path/to/data/file.json.new" );
rename( "/path/to/data/file.json.new", "/path/to/data/file.json" );

No locking is necessary, and there's no reading the file, computing checksums, and hoping the data happens to be correct.

The only issue is that any reader process has to open() the file each time it needs the latest copy - it can't keep an open file descriptor on the file and try to read new contents, because the rename() call unlinks the original file and replaces it with an entirely new one.
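
For completeness, a minimal C sketch of that write-then-rename step; names and error handling are simplified:

#include <stdio.h>

/* Write the new snapshot under a scratch name, then atomically publish it.
 * Both paths must be on the same filesystem for rename() to be atomic. */
int publish_json(const char *json, size_t len)
{
    const char *tmp  = "/path/to/data/file.json.new";
    const char *path = "/path/to/data/file.json";

    FILE *f = fopen(tmp, "w");
    if (!f) return -1;
    if (fwrite(json, 1, len, f) != len) { fclose(f); return -1; }
    if (fclose(f) != 0) return -1;   /* an fsync() here would also force it to disk */

    return rename(tmp, path);        /* readers see either the old or the new file */
}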

Andrew Henle
  • That is a very clever solution :)! – MetallicPriest Jan 08 '19 at 11:41
  • @MetallicPriest Thanks. Just make **sure** the two files are in the same filesystem. Otherwise, if you're not doing the coding in C and making the `rename()` call directly, whatever does what's called `rename` or similar might actually *copy* the file contents, which is not what you want. You don't want to do what the Linux `mv` utility does, for example, which is try the `rename()` system call and if that fails, to fall back to copying the file contents. – Andrew Henle Jan 08 '19 at 11:46