I’m new to programming and C and I'm currently working through K&R. Apologies in advance if this isn't the most succinct way of characterizing the problem.
For context, in section 8.6 of K&R (not the exercises but the actual chapter) they implement the function fsize() that prints out the size of files in a directory and its sub-directories recursively. The code in the book uses the syscall read() to implement a basic version of readdir(), which returns a pointer to the next entry in a directory.
Up until this section of K&R, all the source code has worked fine on my machine. However, the code in this chapter relies on calling read() on directories to get their contents, which, according to a source I found [1], doesn't work on Linux and many other modern systems.
There is, however, a syscall getdents() that seems to do roughly the same thing [2]. So, as an exercise, I tried to re-implement readdir() with it and came across the following problems:
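To make the failure concrete, a small check along these lines can be used. It is only a sketch: try_read_dir() is a name I made up for illustration, and the EISDIR result is what I would expect on Linux; other systems may behave differently.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// try_read_dir (hypothetical helper): open a directory and attempt read() on it.
// Returns 0 if read() succeeded, otherwise the errno of the call that failed.
int try_read_dir(const char *path)
{
    char buf[512];
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return errno;
    ssize_t n = read(fd, buf, sizeof buf);
    int err = (n == -1) ? errno : 0;  // save errno before close() can change it
    close(fd);
    return err;
}
```

On Linux, try_read_dir(".") should return EISDIR, which strerror() reports as "Is a directory".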
- read() on directories seems to know the size of each entry in advance, so readdir() can fetch one entry per call and leave it to read() to "remember" the location of the next entry.
- getdents(), on the other hand, doesn't know the size of each entry in advance, so I have to read a whole buffer of entries first and then loop through it in readdir() using the member d_reclen (I copied this from the example at the bottom of man getdents). This means my readdir() function now has to "remember" the location of the next entry in the stream every time it is called.
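For reference, the buffer-walking pattern I mean looks roughly like the sketch below. It is adapted from the man page example rather than taken from K&R. count_entries() is a hypothetical helper, and I used getdents64 here (with the struct layout the man page gives for it) because, as far as I can tell, SYS_getdents is not defined on every architecture.

```c
#define _GNU_SOURCE        // for syscall()
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

// Layout of one record returned by getdents64, as given in the man page.
struct linux_dirent64 {
    uint64_t       d_ino;     // inode number
    int64_t        d_off;     // offset to the next record
    unsigned short d_reclen;  // length of this record in bytes
    unsigned char  d_type;    // file type
    char           d_name[];  // null-terminated name
};

// count_entries (hypothetical helper): count the entries in a directory,
// "." and ".." included. Returns -1 on error.
long count_entries(const char *path)
{
    _Alignas(uint64_t) char buf[4096];
    long count = 0, nread;
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    // Each call fills buf with as many whole records as fit; 0 means end.
    while ((nread = syscall(SYS_getdents64, fd, buf, sizeof buf)) > 0) {
        for (long pos = 0; pos < nread; ) {
            struct linux_dirent64 *e = (struct linux_dirent64 *)(buf + pos);
            count++;
            pos += e->d_reclen;  // records are variable-length
        }
    }
    close(fd);
    return nread == -1 ? -1 : count;
}
```

Here the position inside the buffer (pos) lives only for the duration of one call, which is exactly the state a one-entry-per-call readdir() would have to keep somewhere between calls.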
So my questions are as follows:
- Am I correct in my understanding that getdents() cannot be made to behave like read(), in the sense of reading one entry at a time and handling the "remembering" of the next position itself?
- If getdents() indeed cannot behave like read(), what is the best way to implement "remembering the position", in particular if getdents() needs to be called multiple times on several sub-directories? I've shown an excerpt of what I tried below: using the file descriptor assigned by the system to index the results of getdents() in an array. However, this attempt seems to fail because of how opendir() and closedir() are implemented: once closedir() has been called, the system reuses the file descriptor when opendir() is called on the next subdirectory, and readdir() has no way of knowing that this happened.
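The descriptor reuse that breaks my array indexing can be seen in isolation with a sketch like this one. fd_is_reused() is a made-up helper for illustration and is not part of K&R or of my code.

```c
#include <fcntl.h>
#include <unistd.h>

// fd_is_reused (hypothetical helper): open a, close it, then open b.
// Returns 1 if both opens produced the same descriptor number,
// 0 if they differed, -1 if either open failed.
int fd_is_reused(const char *a, const char *b)
{
    int first = open(a, O_RDONLY);
    if (first == -1)
        return -1;
    close(first);
    int second = open(b, O_RDONLY);
    if (second == -1)
        return -1;
    close(second);
    return first == second;
}
```

Since open() returns the lowest unused descriptor, fd_is_reused(".", "/") returns 1 in a single-threaded program, so a stale readdents[fd] entry left over from the first directory would be picked up for the second one.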
Last note: I want my implementation of read_dir() to behave exactly like readdir() in K&R, meaning I shouldn't have to change any of the other functions or structures to make it work.
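// Headers needed by this excerpt (Dirent, _dir and DIRSIZ are defined elsewhere)
#include <errno.h>        // errno
#include <stdio.h>        // printf
#include <stdlib.h>       // malloc
#include <string.h>       // strncpy, strerror
#include <sys/syscall.h>  // SYS_getdents
#include <sys/types.h>    // off_t
#include <unistd.h>       // syscall

// NTD: _direct's structure needs to match how system implements directory
// entries. After reading from file descriptor into _direct, we then
// copy only the relevant elements (d_ino and d_name) to Dirent
struct _direct {                 // directory entry
    long d_ino;                  // inode number
    off_t d_off;                 // Not included in K&R
    unsigned short d_reclen;     // Not included in K&R
    char d_name[];               // name, null-terminated by getdents()
};

#define BUFSIZE 4096   // Size of buffer when reading from getdents()
#define MAXFILES 1024  // Max files that read_dir() can open

struct _streamdents {
    int pos;    // offset of the next unread entry in buf
    int nread;  // number of bytes getdents() placed in buf
    char *buf;  // raw entries returned by getdents()
};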
// read_dir: read directory entries in sequence
Dirent *read_dir(_dir *dp)
{
    struct _direct *dirbuf;  // local directory structure
    static Dirent d;         // return: portable structure
    static struct _streamdents *readdents[MAXFILES];

    if (dp->fd < 0 || dp->fd > MAXFILES - 1) {
        printf("Error in read_dir: Cannot continue reading, too many directories\n");
        return NULL;
    }

    // Check if directory has already been read; if not, create stream.
    // Important if fxn is called for a sub-directory and then needs
    // to return to a parent directory and continue reading.
    if (readdents[dp->fd] == NULL) {
        char *buf = malloc(BUFSIZE);
        int nread = syscall(SYS_getdents, dp->fd, buf, BUFSIZE);
        int pos = 0;
        struct _streamdents *newdent = malloc(sizeof(struct _streamdents));
        newdent->buf = buf;
        newdent->pos = pos;
        newdent->nread = nread;
        readdents[dp->fd] = newdent;
    }

    struct _streamdents *curdent = readdents[dp->fd];
    int nread = curdent->nread;
    char *buf = curdent->buf;

    while (curdent->pos < nread) {
        dirbuf = (struct _direct *) (buf + curdent->pos);
        curdent->pos += dirbuf->d_reclen;  // advance first so `continue` cannot loop forever
        if (dirbuf->d_ino == 0)            // slot not in use
            continue;
        d.ino = dirbuf->d_ino;
        strncpy(d.d_name, dirbuf->d_name, DIRSIZ);
        d.d_name[sizeof d.d_name - 1] = '\0';  // strncpy may not terminate long names
        return &d;
    }

    if (nread == -1) {
        printf("Error in getdents(): %s\n", strerror(errno));
    }
    return NULL;
}
Thank you