
I'm writing C code that takes the tasks it has to do from a file (on Linux). There are multiple processes writing to this file in parallel (using `echo "COMMAND" >> "file_queue_input"`). Until now this logic has been used:

  rename (file_queue_input, file_queue_process);
  queue_file = fopen (file_queue_process,"r");
  while (!feof (queue_file))
  {
      <process file line by line>
  } // end of while reading queue file
  fclose (queue_file);
  remove (file_queue_process);

The idea was that by renaming the file, new writers would open descriptors against the original name, and everything already in the renamed file would be read. But in practice some commands are lost along the way.

My assumption: a write descriptor is opened before the rename, stays attached to the renamed file, but has not yet written its data; the file is read without that data, then the writer writes it, but because the read loop has already finished there is nobody left to read it and the file is deleted. How do I prevent this?

What is the correct way of reading a queue file that is also frequently written to? I'm quite sure I'm not the first person interested in this, but I cannot find the right question to ask Google to turn up something useful.

As requested in the comments, I added a minimal example here (please note that I have all the checks, logging, etc. in the real code, but I wanted to keep the example as short as possible, so everything that was not absolutely necessary was deleted): https://drive.google.com/open?id=1U9vh7DEUPopuyTJ5j4J8T8FqVj812AzV

It contains 2 files: Queue_read_test.c (the file I'm writing and whose behaviour I control), and bash_load_generator.sh, which simulates the writers, over which I have absolutely no control.

First, to check the numbers, we run the generator, and when the generator finishes we run the queue reader:

ice@center:/usr/src/Demo$ ./bash_load_generator.sh
All commands started waiting for finish
DONE
ice@center:/usr/src/Demo$ ./Queue_read_test # Stop with CTRL+C
^CFinished with 10000 queue items read: ice@center:/usr/src/Demo$

Then we run the queue reader first and start the generator in a 2nd screen; after everything has finished and the queue file is gone, we stop the queue reader (this is the real-life scenario):

ice@center:/usr/src/Demo$ ./Queue_read_test # Run in screen 1
ice@center:/usr/src/Demo$ ./bash_load_generator.sh # Run in screen 2
All commands started waiting for finish
DONE
ice@center:/usr/src/Demo$ ls -ltra /tmp/ | grep queue # Check all was processed (no file should be found) in screen 2
ice@center:/usr/src/Demo$
ice@center:/usr/src/Demo$ # return to screen 1 and CTRL+C
^CFinished with 9905 queue items read: ice@center:/usr/src/Demo$

And we see that 95 commands are lost. This illustrates the problem described above; my question is now, perhaps thanks to your comments, more precise: what can I, as the author of the queue reader, do to prevent the loss of commands (without any possibility of modifying the queue writers)? Can I check whether there are open file descriptors on the file? (I guess not, as I'm not root.)

  • Using a file lock would probably be easiest - unless you have some constraints that prevent you from using file locks ? Refer eg. [Locking files in linux with c/c++](https://stackoverflow.com/questions/2057784/locking-files-in-linux-with-c-c) – Sander De Dycker Mar 25 '20 at 12:45
  • _My Assumption:_ Why assume anything, it's never a good idea. Knowing the prototype of [fopen](https://www.tutorialspoint.com/c_standard_library/c_function_fopen.htm) should prompt you to test its output before using it. i.e. `if(queue_file){ while(...){...}, fclose (queue_file); }` It would be good if your post included a [mcve] – ryyker Mar 25 '20 at 12:54
  • ...also, [using feof to control a loop is not a good idea](https://faq.cprogramming.com/cgi-bin/smartfaq.cgi?answer=1046476070&id=1043284351) – ryyker Mar 25 '20 at 12:58
  • @ryyker : fwiw the op's assumption was my guess as well. It's very likely that that's indeed what's going on. If so, checking for errors won't help, since there won't be any (all calls will succeed just fine - they just happen to be racing since there's no inter-process synchronization). – Sander De Dycker Mar 25 '20 at 13:05
  • A filename is just a property of the file in the directory entry. It has nothing to do with file descriptors. A file could be renamed while open; however, the OS may prevent it because the file is in use. – Paul Ogilvie Mar 25 '20 at 13:06
  • @SanderDeDycker - No argument with that. Removing other weak spots in the code shown, though, will help to isolate any remaining issues. Using feof as a loop control without having checked to see that the file was open are two absolutes shown that can be easily removed. – ryyker Mar 25 '20 at 13:12
  • Posted a significant update with a demo of the problem (on my PC the demo shows 95 commands lost out of 10000) – Peter Novotny Mar 25 '20 at 14:23
  • You say you can't modify the writers, but are they at least using atomic writes when adding their commands to the "queue" ? Ie. there's no possibility of interleaved or partial commands ? If that's not the case, you can't achieve your goal. Secondly : commands appear to be split by newlines, so you can "tail" the file (without renaming it), and process commands line by line as you encounter newline characters. Obviously the file would then grow boundlessly, so some form of file rotation is still needed. Specify an upper bound to how long you keep waiting for new commands in old files. – Sander De Dycker Mar 25 '20 at 15:31
  • Oh, and "atomic writes" in my previous comment doesn't just imply a single `write` system call per command, but a re-open of the file (in append mode) for every write. – Sander De Dycker Mar 25 '20 at 15:34
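
The "tail"-style approach suggested in the last two comments can work without touching the writers: never rename the file, just keep reading it and only act on complete, newline-terminated lines. A minimal sketch of that idea, assuming a hypothetical queue path /tmp/file_queue_input and that each `echo "COMMAND" >>` lands as a single atomic append (stop it with CTRL+C, like the demo reader):

    /* tail-style queue reader - a sketch, not the poster's code */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define QUEUE_PATH "/tmp/file_queue_input"   /* hypothetical path */

    static void process_command(const char *cmd)
    {
        printf("got command: %s\n", cmd);        /* placeholder for real handling */
    }

    int main(void)
    {
        FILE *qf = fopen(QUEUE_PATH, "r");
        if (!qf) {
            perror("fopen");
            return 1;
        }

        char chunk[256];
        char line[4096] = "";                    /* accumulates one complete command */

        for (;;) {                               /* runs until CTRL+C */
            if (fgets(chunk, sizeof chunk, qf)) {
                /* append the chunk; a command is complete once we see '\n' */
                strncat(line, chunk, sizeof line - strlen(line) - 1);
                size_t len = strlen(line);
                if (len > 0 && line[len - 1] == '\n') {
                    line[len - 1] = '\0';
                    process_command(line);
                    line[0] = '\0';
                }
            } else {
                /* end of file for now: clear the EOF flag and poll again */
                clearerr(qf);
                usleep(100 * 1000);              /* 100 ms */
            }
        }
    }

Because the file is never rotated it grows without bound, so, as the comment notes, some rotation policy (and an upper bound on how long to keep reading an old file) would still be needed.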

1 Answer


The overall issue has less to do with the code shown, and more to do with architectural issues not covered sufficiently by the specific problems you have identified. Until the entire environment in which you are forced to work is understood and addressed, the problems will persist. And although fixing the issues you cite will bring improvement, it will not be sufficient to address these overarching problems.
Addressing the lower, code-level problems you cite: the issues include the multiple processes you mentioned accessing a non-reentrant function. The function shown has its own set of issues by itself, but the one contributing to the main problem is that it is not thread safe. It needs to be protected externally, either by thread-safe queues or tokens, or internally by making it reentrant, thus ensuring a thread-safe design. Plainly stated, without addressing the multi-threaded environment this function is working in, you are left attempting to read from, and write to, a single file simultaneously. This is a dangerous design.
Beyond this, and assuming your code will be modified to protect it in a thread-safe environment, consider also two additional points that will improve the function shown.

Regarding the code shown...
It is not clear what is contained in the variable `file_queue_process`. It should be a valid filespec (i.e. as described in the link), and the file it points to should exist.
Given that that is true, the next potential issue is that you are (1) entering a poorly controlled loop (2) without first checking that the `fopen()` call returned a valid file handle. I would suggest using `fgets()` as the loop control, a much safer approach:

char line[80] = {0};
queue_file = fopen (file_queue_process, "r");
if (queue_file)
{
    while (fgets (line, sizeof line, queue_file))
    {
        <process file line by line>
    } // end of while reading queue file
    fclose (queue_file);
}
remove (file_queue_process);
ryyker
  • This won't guarantee that `fgets()` doesn't return a partial line. Reading a file while it's being written is **hard**, and IMO a very, very, **VERY** bad design. Files are data stores, not message queues. – Andrew Henle Mar 25 '20 at 13:19
  • @AndrewHenle - I agree. I just addressed this at the top of this post. The problem really has to do with hardening the function into thread safety (re-entrance), or protecting it externally using tokens or thread-safe queues. – ryyker Mar 25 '20 at 13:37
  • @ryyker If it were me writing the commands to the text file then the solution would be easy with the queues you mentioned, file locking, a database... lots of improvements could be made. The problem is that I'm not in control of the writer processes. For me it is only possible to modify the reader process in a way that does not lose any lines from the commands file. – Peter Novotny Mar 25 '20 at 13:41
  • @AndrewHenle - If my edits have addressed your concern, could we please clean up the comments? Thanks. – ryyker Mar 25 '20 at 13:41
  • @PeterNovotny - The _writer process_ should have the ability to recognize that the file it is attempting to access is being temporarily blocked. If it does not, then this would be yet another architectural issue to handle. All you can do is look at the environment your function has to work in, (in this case multi-threaded access to a single resource) and provide code that will allow the resource to be shared _sequentially_, not _simultaneously_. – ryyker Mar 25 '20 at 13:45
  • @ryyker The problem is that there really is no way in this situation to prevent a partial read. After `fgets()` fails because it hits EOF, a writer process can append more data to the file. Even a `rename()` call can't prevent that. The only way to even begin to make this close to working is to do some sort of retry, and do it long enough that you can infer that no more data is going to appear in the file. Never mind using buffered IO calls such as `fgets()`. Even using low-level, OS-specific calls such as `open()` and `read()` isn't guaranteed to work. This is a **BAD** design. – Andrew Henle Mar 25 '20 at 13:49
  • And it's probably even worse. Even using `>>` so the writer processes append to the file doesn't ensure that the data from each process remains contiguous. Interspersed, mixed-up output is quite possible because the writer processes likely don't ensure that their output is done in single low-level write calls that would keep their output together. See https://stackoverflow.com/questions/32319031/zero-length-read-from-file for other issues. – Andrew Henle Mar 25 '20 at 13:55
  • @AndrewHenle - The _real_ problem here is stated in the OP's comment (6 up from this one) that explains he is _not in control of the writer process_. This means that the OP cannot define how the _writer process_ handles a denied request to access the shared resource. And although I do not disagree with your _bad design_ assertion, I would disagree if you said the problems are unsolvable. A well-written set of thread-safe functions, including selecting better functions, e.g. `fopen` to `fopen_s`, or using external thread-safe techniques, etc., can be used to make this work. – ryyker Mar 25 '20 at 15:42
  • ALL, thanks for the explanation. I added a delay between the move and the open command. It got better, but it's not resolved (as expected). However, this leads me to a question: is it possible in Linux to know whether there are any file descriptors open for a given file (as a non-root user)? I do not need to know by whom, I do not need to know the number, I only need a yes/no as a non-privileged user. This would significantly help to resolve the issue. Any ideas? – Peter Novotny Mar 30 '20 at 18:40
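
To make the retry idea from the comments concrete: a sketch, with hypothetical paths and an arbitrary two-second grace period, that renames the file and then keeps draining it until no new data has appeared for a while before deleting it. This narrows the race window rather than closing it: a writer that opened the file before the rename but only writes after the grace period is still lost, and partial lines would still need the accumulation trick from the earlier sketch.

    /* rename-then-drain queue reader - a sketch, not a guaranteed fix */
    #include <stdio.h>
    #include <unistd.h>

    #define QUEUE_INPUT   "/tmp/file_queue_input"    /* hypothetical */
    #define QUEUE_PROCESS "/tmp/file_queue_process"  /* hypothetical */
    #define GRACE_SECONDS 2                          /* arbitrary settle time */

    static void process_command(const char *cmd)
    {
        printf("got command: %s", cmd);              /* cmd still ends with '\n' */
    }

    int main(void)
    {
        /* new writers re-create the original name; old descriptors follow the rename */
        if (rename(QUEUE_INPUT, QUEUE_PROCESS) != 0) {
            perror("rename");
            return 1;
        }

        FILE *qf = fopen(QUEUE_PROCESS, "r");
        if (!qf) {
            perror("fopen");
            return 1;
        }

        char line[4096];
        int quiet = 0;                               /* consecutive seconds with no new data */

        while (quiet < GRACE_SECONDS) {
            if (fgets(line, sizeof line, qf)) {
                process_command(line);
                quiet = 0;                           /* data arrived, reset the timer */
            } else {
                /* end of file for now: a late writer may still hold a descriptor */
                clearerr(qf);
                sleep(1);
                quiet++;
            }
        }

        fclose(qf);
        remove(QUEUE_PROCESS);
        return 0;
    }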