I am looking at the source code of cat from the GNU coreutils, in particular the cycle detection. It compares device and inode, and that works fine. There is, however, an extra case where the output is allowed to be an input if the input is empty. Looking at the code, this must be the lseek (input_desc, 0, SEEK_CUR) < stat_buf.st_size part. I read the man pages and a discussion that I found via git blame, but I still cannot quite understand why this call to lseek is needed.
This is the gist of how cat detects whether it would exhaust the disk by copying a file onto itself indefinitely (some error checks have been removed for brevity; the full source code is linked above):
struct stat stat_buf;
fstat(STDOUT_FILENO, &stat_buf);
out_dev = stat_buf.st_dev;
out_ino = stat_buf.st_ino;
out_isreg = S_ISREG (stat_buf.st_mode) != 0;
// ...
// for <infile> in inputs {
    input_desc = open (infile, file_open_mode); // or STDIN_FILENO
    fstat(input_desc, &stat_buf);
    /* Don't copy a nonempty regular file to itself, as that would
       merely exhaust the output device. It's better to catch this
       error earlier rather than later. */
    if (out_isreg
        && stat_buf.st_dev == out_dev && stat_buf.st_ino == out_ino
        && lseek (input_desc, 0, SEEK_CUR) < stat_buf.st_size) // <--- This is the important line
    {
        // ...
    }
// } (end of for)
I have two possible explanations, but both seem kind of weird.
- A file could be "empty" according to some standard (POSIX) although it still contains some information (that is counted in st_size), and lseek or open respects that by applying some default offset. I wouldn't know why this would be the case, because empty means empty, right?
- This comparison is really a "clever" composition of two conditions. This made sense to me at first, because if input_desc were STDIN_FILENO and no file were piped to stdin, lseek would fail with ESPIPE (according to the man page) and return -1. Then the whole expression would be equivalent to lseek(...) == -1 || stat_buf.st_size > 0. But this cannot be true, because the check only happens if device and inode are the same, and that can only happen if a) stdin and stdout point to the same pty, but then out_isreg would be false, or b) stdin and stdout point to the same file, but then lseek cannot return -1, right?
I have also put together a small program that prints the return values and errno for the important parts, but nothing stood out to me:
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    struct stat out_stat;
    struct stat in_stat;
    if (fstat(STDOUT_FILENO, &out_stat) < 0)
        exit(1);
    printf("this is written to stdout / into the file\n");
    int fd;
    if (argc > 1)
        fd = open(argv[1], O_RDONLY);
    else
        fd = STDIN_FILENO;
    if (fd < 0 || fstat(fd, &in_stat) < 0)
        exit(1);
    errno = 0; // reset so a stale value is not mistaken for an lseek failure
    off_t res = lseek(fd, 0, SEEK_CUR); // off_t rather than int avoids truncation
    fprintf(stderr,
            "errno after lseek = %d, EBADF = %d, EINVAL = %d, EOVERFLOW = %d, "
            "ESPIPE = %d\n",
            errno, EBADF, EINVAL, EOVERFLOW, ESPIPE);
    // off_t has no portable printf conversion, so cast to long long
    fprintf(stderr, "input:\n\tlseek(...) = %lld\n\tst_size = %lld\n",
            (long long)res, (long long)in_stat.st_size);
    printf("outsize is %lld", (long long)out_stat.st_size);
    return 0;
}
$ touch empty
$ ./a.out < empty > empty
errno after lseek = 0, EBADF = 9, EINVAL = 22, EOVERFLOW = 75, ESPIPE = 29
input:
        lseek(...) = 0
        st_size = 0
$ echo x > empty
$ ./a.out < empty > empty
errno after lseek = 0, EBADF = 9, EINVAL = 22, EOVERFLOW = 75, ESPIPE = 29
input:
        lseek(...) = 0
        st_size = 0
So my ultimate question remains unanswered after my research: How does lseek help determine whether a file is empty in this example from the cat source code?